Parse Geo URI in Java


I need to parse a geo URI.

Samples:

  • geo:79.786971,-124.399677
  • geo:42.374260,-71.120824?z=16
    For the first sample, I have a regular expression that checks that its structure is correct:

    ^geo:.(\-?\d+(\.\d+)?),\s*(\-?\d+(\.\d+)?)$
    

    For the second one, which adds ?z=, I can't get it to work. The zoom values are integers from 0 to 99.

    To extract the data, since I only need the numeric values, I use:

    String s = "geo:79.786971,-124.399677?z=16";
    Pattern p = Pattern.compile("[-+]?[0-9]*\\.?[0-9]+");
    Matcher m = p.matcher(s);
    
    while(m.find()){
        System.out.println(">> "+ m.group());
    }
    

    Output:

    >> 79.786971
    >> -124.399677
    >> 16
    

    In summary:

    • Detect whether it is geo:lat,lon or geo:lat,lon?z=16
    • Get the variables lat and lon and, if ?z= is present, the zoom
    asked by Webserveis 15.11.2017 at 19:32

    1 answer


    Regular expression

    You can validate and capture everything with a single regex:

    ^geo:([-+]?\d+(?:\.\d+)?),([-+]?\d+(?:\.\d+)?)(?:\?z=(\d{1,2}))?$
    

    Demo on RegexPlanet.com
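    If you want to try the regex locally instead, a small harness like the following checks it against the two samples from the question and a few malformed inputs. It is a sketch: the class name GeoUriValidation and the extra inputs are made up for illustration, and the backslashes are doubled because the pattern lives in a Java string literal. It uses Matcher.matches(), which requires the whole input to match, just like the ^...$ anchors.

```java
import java.util.regex.Pattern;

class GeoUriValidation {
    // The regex from this answer, with backslashes doubled for a Java string literal.
    static final Pattern GEO = Pattern.compile(
        "^geo:([-+]?\\d+(?:\\.\\d+)?),([-+]?\\d+(?:\\.\\d+)?)(?:\\?z=(\\d{1,2}))?$");

    // Hypothetical helper: true if the whole string has the geo:lat,lon[?z=N] shape.
    static boolean isGeoUri(String s) {
        return GEO.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "geo:79.786971,-124.399677",      // sample 1 from the question
            "geo:42.374260,-71.120824?z=16",  // sample 2 from the question
            "geo:42.374260,-71.120824?z=100", // zoom with 3 digits
            "geo:42.374260",                  // longitude missing
            "42.374260,-71.120824"            // no geo: prefix
        };
        for (String s : inputs) {
            System.out.println(isGeoUri(s) + "  " + s);
        }
    }
}
```

    Only the first two inputs print true; the three-digit zoom is rejected because \d{1,2} followed by $ cannot consume "100".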


    Description

    The only important concept to understand here is groups.
    In regex, plain parentheses (also called capturing groups) have 2 functions: grouping and capturing.

  • Group.
    They group a subpattern, which lets you delimit a construction or repeat a specific part.

    For example, (subpattern){3} repeats the subpattern 3 times.

  • Capture.
    The text matched by a group is captured and made available after the match. In Java, the capture of each group is obtained with Matcher.group(int group).

    After a successful call to Matcher.find(), the text captured by the first set of parentheses is obtained with Matcher.group(1), that of the second group with Matcher.group(2), and so on. In addition, the special case Matcher.group(0) (or Matcher.group() with no argument) returns the match of the entire regular expression.

    For example, if we use the regex from (\d+) to (\d+) and it matches, Matcher.group() contains all the matched text, and we get each of the 2 numbers with Matcher.group(1) and Matcher.group(2).

    Groups are always counted from left to right: group N is the one opened by the N-th ( in the regex.


  • On the other hand, there are also non-capturing groups, whose syntax is (?:subpattern).
    They group just like plain parentheses, but they do not capture the text.

    For example, the regex from (\d+(?:\.\d+)?) to (\d+(?:\.\d+)?) lets you match the two numbers with an optional decimal part (the decimal part is grouped and repeated 0 or 1 times with ?). However, there are still only 2 capturing groups, and each number is returned with Matcher.group(1) and Matcher.group(2).

    A good practice is to use non-capturing groups whenever you don't need that part of the text, to avoid unnecessary memory use. It also makes the regex easier for others to read, because it is clear that you are grouping but are not interested in retrieving that part.
    I know that at first those extra ? and : are hard to read (here the ? is not a quantifier), but I assure you that you will soon read them fluently and understand them automatically.
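    To see the difference in Java, the sketch below prints every group of the first match for the two-number example above (written here with the English words from and to), once with capturing groups for the decimal part and once with non-capturing ones, plus the {3} repetition example. The names GroupsDemo and groups are hypothetical; the arrays in the comments are what Matcher.group(i) returns for i = 0..groupCount().

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupsDemo {
    // Hypothetical helper: group(0)..group(groupCount()) of the first match, or null if none.
    static String[] groups(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return null;
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i); // null for a group that did not participate
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "from 3.5 to 12";

        // Every (...) captures: 4 groups, and the second number ends up in group 3.
        System.out.println(Arrays.toString(
            groups("from (\\d+(\\.\\d+)?) to (\\d+(\\.\\d+)?)", text)));
        // [from 3.5 to 12, 3.5, .5, 12, null]

        // (?:...) only groups: 2 groups, the numbers are in groups 1 and 2.
        System.out.println(Arrays.toString(
            groups("from (\\d+(?:\\.\\d+)?) to (\\d+(?:\\.\\d+)?)", text)));
        // [from 3.5 to 12, 3.5, 12]

        // Grouping for repetition: a repeated group keeps only its last repetition.
        System.out.println(Arrays.toString(groups("(ab){3}", "ababab")));
        // [ababab, ab]
    }
}
```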


    Once this is understood, it is very easy to interpret the regex of this answer.

    ^geo:([-+]?\d+(?:\.\d+)?),([-+]?\d+(?:\.\d+)?)(?:\?z=(\d{1,2}))?$

    • ^ ::: Matches the initial position of the string.

    • geo: ::: Literal.

    • ([-+]?\d+(?:\.\d+)?) ::: Group 1 - A real number.

      • [-+]? ::: optional sign, - or +
      • \d+ ::: integer part
      • (?:\.\d+)? ::: optional decimal part (it is grouped and has the quantifier ?).
    • , ::: Literal.

    • ([-+]?\d+(?:\.\d+)?) ::: Group 2 - Another real.

    • (?:\?z=(\d{1,2}))? ::: optional non-capturing group: the zoom.

      • \?z= ::: literal ?z=
      • (\d{1,2}) ::: Group 3 - integer from 0 to 99 (one or two digits)

        Since this group is inside another optional group, when there is no zoom, Matcher.group(3) returns null.

    • $ ::: Matches the final position of the string.
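    If you prefer to keep this breakdown next to the pattern itself, Java's Pattern.COMMENTS flag ignores unescaped whitespace and treats # as the start of a comment that runs to the end of the line. The sketch below is the same regex written that way; the class name VerboseGeoRegex and its parse helper are hypothetical, and this is an optional style choice, not something the regex requires.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VerboseGeoRegex {
    // Same regex as above, in COMMENTS mode: whitespace is ignored and
    // '#' starts a comment, so each line needs its own "\n" terminator.
    static final Pattern GEO = Pattern.compile(
          "^geo:                    # literal prefix\n"
        + "([-+]?\\d+(?:\\.\\d+)?)  # group 1: latitude\n"
        + ",                        # literal comma\n"
        + "([-+]?\\d+(?:\\.\\d+)?)  # group 2: longitude\n"
        + "(?:\\?z=(\\d{1,2}))?     # optional zoom; group 3 is null without it\n"
        + "$                        # end of input",
        Pattern.COMMENTS);

    // Hypothetical helper: {latitude, longitude, zoom} as strings, or null if no match.
    static String[] parse(String s) {
        Matcher m = GEO.matcher(s);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("geo:79.786971,-124.399677")));
        // [79.786971, -124.399677, null]
        System.out.println(Arrays.toString(parse("geo:42.374260,-71.120824?z=16")));
        // [42.374260, -71.120824, 16]
    }
}
```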


    Code

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    
    public class GeoUriDemo
    {
        public static void main(String[] args)
        {
            final String  s = "geo:42.374260,-71.120824?z=16";
            // Backslashes are doubled because the regex is inside a Java string literal
            final String  r = "^geo:([-+]?\\d+(?:\\.\\d+)?),([-+]?\\d+(?:\\.\\d+)?)(?:\\?z=(\\d{1,2}))?$";
            final Pattern p = Pattern.compile(r);
            final Matcher m = p.matcher(s);
    
            if (m.find()) // Matches the regex
            {
                String lat  = m.group(1),
                       lon  = m.group(2),
                       zoom = m.group(3);
    
                if (zoom == null) // No zoom: the 3rd group of the regex captured nothing
                {
                    zoom = "none";
                }
    
                System.out.format(
                    "URI:       %s%nLatitude:  %s%nLongitude: %s%nZoom:      %s%n%n",
                    s, lat, lon, zoom
                );
            }
            else
            {
                System.out.format("URI:       %s%n is not a geo URI%n%n", s);
            }
        }
    }
    

    Demo at ideone.com
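    Since the question only needs the numeric values, one possible next step is to convert the captured strings into numbers. This is a minimal sketch under that assumption: GeoPoint and parse are hypothetical names, the zoom is kept as a nullable Integer to mirror Matcher.group(3) returning null, and no latitude/longitude range checks are done.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value type for the numeric parts of a geo URI.
final class GeoPoint {
    final double lat;
    final double lon;
    final Integer zoom; // null when the URI has no ?z= part

    GeoPoint(double lat, double lon, Integer zoom) {
        this.lat = lat;
        this.lon = lon;
        this.zoom = zoom;
    }

    private static final Pattern GEO = Pattern.compile(
        "^geo:([-+]?\\d+(?:\\.\\d+)?),([-+]?\\d+(?:\\.\\d+)?)(?:\\?z=(\\d{1,2}))?$");

    // Hypothetical helper: empty if s is not a geo URI in this format.
    static Optional<GeoPoint> parse(String s) {
        Matcher m = GEO.matcher(s);
        if (!m.matches()) {
            return Optional.empty();
        }
        // The regex only lets through ASCII digits, an optional sign and an
        // optional decimal part, so these conversions do not throw.
        double lat = Double.parseDouble(m.group(1));
        double lon = Double.parseDouble(m.group(2));
        Integer zoom = (m.group(3) == null) ? null : Integer.valueOf(m.group(3));
        return Optional.of(new GeoPoint(lat, lon, zoom));
    }

    @Override
    public String toString() {
        return "lat=" + lat + ", lon=" + lon + ", zoom=" + (zoom == null ? "none" : zoom);
    }

    public static void main(String[] args) {
        System.out.println(parse("geo:42.374260,-71.120824?z=16").get());
        System.out.println(parse("geo:79.786971,-124.399677").get());
        System.out.println(parse("not a geo uri").isPresent()); // false
    }
}
```

    Returning Optional makes the "not a geo URI" case explicit for the caller instead of signalling it with null or an exception.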

    answered 15.11.2017 at 19:59