Java regex Pattern question


I have the following code:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;

public class getTag2Html {

    public static void main (String[] args) throws IOException {
        String tag = "<p>";

        URL web = new URL("http://www.insalfonscostafreda.cat/web/");
        // TODO: check that there are two input parameters

        System.out.println("Searching " + web + " for the p tag");
        // regexp search pattern
        String pattern = "<" + tag + ".*\\/?>";

        // Open the connection
        web.openConnection();

        BufferedReader in = new BufferedReader(new InputStreamReader(web.openStream()));

        File f = new File("eac2.xml");
        BufferedWriter bw;
        bw = new BufferedWriter(new FileWriter(f));
        String inputLine;
        while ((inputLine = in.readLine()) != null) {

            if (inputLine.contains("<p>")) {

                System.out.println(inputLine);
                bw.write(inputLine + "\n");
            }
        }
        bw.close();
        in.close();
    }
}

I don't understand how to use this part:

String pattern = "<" + tag + ".*\\/?>";

The program works fine for me, but I am required to use these imports:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

The task is that I have to pass two arguments (the web page and a tag, for example <p>) and display the information on the screen.

It does that correctly, but I don't know how to modify it to use regex.

Right now the program reads the web page I give it as an argument, but it only shows the lines that contain <p>.

But the exercise statement includes the line String pattern = "<" + tag + ".*\\/?>"; and I don't know how to use it in my program.

Can you help me?

Thanks

    
asked by Montse Mkd on 28.03.2018 at 19:45

1 answer


To summarize what I understood from the chat, the requirement is somewhat unusual:

  • Read the HTML content of the URL passed as a parameter, line by line.
  • For each line:
    • If it matches the regular expression <TAG.*\/?>, where TAG is any HTML tag passed as a parameter:
      • Print the complete line in which the match occurs.

The mistake you are making with the regular expression is assuming that TAG can be <p> instead of just p. The expression already contains the < and > symbols, so TAG must be only the name of the tag.
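To see the problem concretely, here is a small sketch that builds the pattern the way the statement does and tests it against a sample line. The class name TagPatternDemo, the helpers buildPattern and lineHasTag, and the sample line are hypothetical, chosen only for illustration:

```java
import java.util.regex.Pattern;

public class TagPatternDemo {

    // Builds the pattern string exactly as the exercise statement does.
    static String buildPattern(String tag) {
        return "<" + tag + ".*\\/?>";
    }

    // Compiles the pattern for the given tag and looks for it anywhere in the line.
    static boolean lineHasTag(String line, String tag) {
        return Pattern.compile(buildPattern(tag)).matcher(line).find();
    }

    public static void main(String[] args) {
        String line = "<p>Hola</p>";
        // prints: <<p>.*\/?> -> false  (the pattern requires "<<p>", which never appears)
        System.out.println(buildPattern("<p>") + " -> " + lineHasTag(line, "<p>"));
        // prints: <p.*\/?> -> true
        System.out.println(buildPattern("p") + " -> " + lineHasTag(line, "p"));
    }
}
```

Passing <p> produces a pattern that starts with two < characters, so it can never match ordinary HTML; passing p gives the intended pattern.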

To solve this, you can indeed use Pattern and Matcher. These classes are used as follows:

Pattern pattern = Pattern.compile(expresion);

Here, expresion is the regular expression as a Java String. You only need to compile it once, because the expression does not change.

Then, to search for matches in a piece of text, use:

Matcher matcher = pattern.matcher(texto);
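A minimal sketch of this compile-once, match-many usage follows. It assumes a hardcoded p tag and a few sample strings instead of the real web page; CompileOnceDemo and textsWithMatch are illustrative names, not part of the exercise:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CompileOnceDemo {

    // Returns the texts in which the (already compiled) pattern finds a match.
    // The same Pattern is reused; only a new Matcher is created for each text.
    static List<String> textsWithMatch(Pattern pattern, List<String> texts) {
        List<String> result = new ArrayList<>();
        for (String texto : texts) {
            Matcher matcher = pattern.matcher(texto);
            if (matcher.find()) {
                result.add(texto);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Compile the expression a single time, outside any loop (tag "p" assumed).
        Pattern pattern = Pattern.compile("<p.*\\/?>");
        List<String> texts = List.of("<div>menu</div>", "<p class=\"intro\">Hola</p>", "plain text");
        System.out.println(textsWithMatch(pattern, texts));
    }
}
```

Pattern objects are immutable, so one instance can be shared; a Matcher holds per-text state, so a new one is created for each text.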

With that, you can ask the matcher whether each line matches, as follows:

  • matcher.matches(): returns true if the entire text matches the regular expression.
  • matcher.find(): returns true if it finds a match anywhere in the text.
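The difference between the two methods shows up with a line that has extra characters around the tag. This is a small hypothetical demo (MatchesVsFind, whole and anywhere are illustrative names) with the pattern for the p tag hardcoded:

```java
import java.util.regex.Pattern;

public class MatchesVsFind {

    static final Pattern P = Pattern.compile("<p.*\\/?>");

    // true only if the WHOLE string fits the pattern
    static boolean whole(String s) {
        return P.matcher(s).matches();
    }

    // true if the pattern occurs ANYWHERE in the string
    static boolean anywhere(String s) {
        return P.matcher(s).find();
    }

    public static void main(String[] args) {
        String line = "  <p>Hola</p>  ";
        System.out.println(whole(line));    // false: the surrounding spaces are not part of the pattern
        System.out.println(anywhere(line)); // true: "<p>Hola</p>" is found inside the line
    }
}
```

Since HTML lines usually contain indentation or other markup around the tag, find() is the one that fits this exercise.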

Calling these methods has other effects too (for example, the matcher records where the match was found), but for your particular problem this is enough.

Then, instead of using:

if(inputLine.contains("<p>")) { ... }

Use:

Matcher matcher = pattern.matcher(inputLine);
if(matcher.find()) { ... }
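Putting it together, a sketch of the whole program might look like the following. It assumes the two arguments are the URL and the bare tag name (for example p), keeps the eac2.xml output file from the question, and moves the loop into a hypothetical matchingLines method so it works with any BufferedReader:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GetTagWithRegex {

    // Returns every line of the input in which the tag pattern finds a match.
    static List<String> matchingLines(BufferedReader in, String tag) throws IOException {
        // Built as in the exercise statement; tag must be the bare name ("p", not "<p>").
        Pattern pattern = Pattern.compile("<" + tag + ".*\\/?>"); // compiled once
        List<String> result = new ArrayList<>();
        String inputLine;
        while ((inputLine = in.readLine()) != null) {
            Matcher matcher = pattern.matcher(inputLine);
            if (matcher.find()) {
                result.add(inputLine);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java GetTagWithRegex <url> <tag>");
            return;
        }
        URL web = URI.create(args[0]).toURL();
        String tag = args[1];
        // try-with-resources closes both the reader and the writer
        try (BufferedReader in = new BufferedReader(new InputStreamReader(web.openStream()));
             BufferedWriter bw = new BufferedWriter(new FileWriter("eac2.xml"))) {
            for (String line : matchingLines(in, tag)) {
                System.out.println(line);
                bw.write(line);
                bw.newLine();
            }
        }
    }
}
```

It could be run as java GetTagWithRegex http://www.insalfonscostafreda.cat/web/ p. Note that the statement's pattern also matches any tag whose name merely starts with the given text (for example <pre> when TAG is p), and it only detects tags whose closing > is on the same line.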

Now, is it necessary to use Pattern and Matcher for something simple like that?

No, it isn't. But it is more efficient than using, for example:

if (inputLine.matches(".*<" + tag + ".*\\/?>.*")) { ... }

That is because String.matches compiles the regular expression again for every line.
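Both versions give the same answer for a single line, because readLine() strips the line terminator, so the leading and trailing .* can cover the rest of the line. The difference is only how often the pattern is compiled. A small comparison sketch (MatchesVsPrecompiled and its methods are hypothetical names):

```java
import java.util.regex.Pattern;

public class MatchesVsPrecompiled {

    // Recompiles the regex on every call (String.matches delegates to Pattern.matches).
    static boolean viaStringMatches(String line, String tag) {
        return line.matches(".*<" + tag + ".*\\/?>.*");
    }

    // Uses a Pattern compiled once by the caller; only a Matcher is created per line.
    static boolean viaPrecompiled(String line, Pattern pattern) {
        return pattern.matcher(line).find();
    }

    public static void main(String[] args) {
        String tag = "p";
        Pattern pattern = Pattern.compile("<" + tag + ".*\\/?>"); // compiled once
        String[] lines = { "<p>Hola</p>", "<div>menu</div>", "text <p class=\"x\"> text" };
        for (String line : lines) {
            // Both approaches agree for each single line.
            System.out.println(viaStringMatches(line, tag) + " " + viaPrecompiled(line, pattern));
        }
    }
}
```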

    
answered by 28.03.2018 / 21:52