Parse Geo URI in Java

Question

I need to parse a geo URI.

Samples:

geo:79.786971,-124.399677

geo:42.374260,-71.120824?z=16

For the first sample, I have a regular expression that checks that its structure is correct:

^geo:.(\-?\d+(\.\d+)?),\s*(\-?\d+(\.\d+)?)$

For the second one, which adds ?z=, I can't get it to work. The zoom values are integers from 0 to 99.

To extract the data, since I only need the numeric values, I use:

String s = "geo:79.786971,-124.399677?z=16";
Pattern p = Pattern.compile("[-+]?[0-9]*\\.?[0-9]+"); // "\\." is needed in a Java string literal
Matcher m = p.matcher(s);

while (m.find()) {
    System.out.println(">> " + m.group());
}

Output:

>> 79.786971
>> -124.399677
>> 16

In summary, I need to:

detect whether it is geo:lat,lon or geo:lat,lon?z=16
get the lat and lon variables and, if ?z= is present, the zoom

java regex

asked by Webserveis on 15.11.2017 at 18:32

1 answer

score 3 · Accepted Answer

Regular expression

You can validate and capture everything with a single regex:

^geo:([-+]?\d+(?:\.\d+)?),([-+]?\d+(?:\.\d+)?)(?:\?z=(\d{1,2}))?$

Demo on RegexPlanet.com

Description

The only concept that really needs explaining here is groups.
In a regex, plain parentheses (also called capturing groups) serve two purposes: grouping and capturing.

Grouping.
They group a subpattern, which lets you delimit a construct (such as an alternation) or repeat a specific part.

For example, (subpattern){3} repeats the subpattern 3 times.
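A minimal sketch of the grouping role, using only java.util.regex (the class GroupRepeatDemo and its method names are only illustrative): the quantifier {3} repeats the whole group (ab), whereas without parentheses it applies only to the preceding character.

```java
import java.util.regex.Pattern;

public class GroupRepeatDemo {
    // {3} applies to the whole group (ab): matches "ababab"
    public static boolean groupRepeated(String s) {
        return Pattern.matches("(ab){3}", s);
    }

    // Without parentheses, {3} applies only to 'b': matches "abbb"
    public static boolean lastCharRepeated(String s) {
        return Pattern.matches("ab{3}", s);
    }

    public static void main(String[] args) {
        System.out.println(groupRepeated("ababab"));    // true
        System.out.println(groupRepeated("abbb"));      // false
        System.out.println(lastCharRepeated("abbb"));   // true
        System.out.println(lastCharRepeated("ababab")); // false
    }
}
```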

Capturing.
The text that a group matches is captured and made available after the match. In Java, the text captured by each group is obtained with Matcher.group(int group).

After calling Matcher.find(), the text captured by the first pair of parentheses is obtained with Matcher.group(1), the text captured by the second with Matcher.group(2), and so on. In addition, the special case Matcher.group(0) (or Matcher.group() with no argument) returns the match of the entire regular expression.

For example, if we use the regex from (\d+) to (\d+) and it matches, Matcher.group() contains all the matched text, and we get each of the two numbers from Matcher.group(1) and Matcher.group(2).
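A sketch of that example in Java, where CaptureDemo and captures are hypothetical names chosen for illustration. It also shows that Matcher.group() returns only the matched portion, not the whole input string:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureDemo {
    // Returns {group(0), group(1), group(2)} for the first match, or null if there is none
    public static String[] captures(String text) {
        Matcher m = Pattern.compile("from (\\d+) to (\\d+)").matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] g = captures("Open from 9 to 17, Monday to Friday");
        System.out.println(g[0]); // from 9 to 17
        System.out.println(g[1]); // 9
        System.out.println(g[2]); // 17
    }
}
```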

Groups are always numbered from left to right:
group N is the one opened by the N-th ( in the regex.
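To illustrate the numbering with nested groups, here is a small sketch using a date pattern of my own choosing (GroupNumberingDemo and groups are illustrative names). The outer group is number 1 because its ( comes first, even though it contains groups 2 and 3:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumberingDemo {
    // Numbers follow the order of the opening parentheses:
    // 1 = ((\d{4})-(\d{2})), 2 = (\d{4}), 3 = first (\d{2}), 4 = last (\d{2})
    private static final Pattern DATE = Pattern.compile("((\\d{4})-(\\d{2}))-(\\d{2})");

    // Returns group(0)..group(n) for a full match, or null if the text does not match
    public static String[] groups(String text) {
        Matcher m = DATE.matcher(text);
        if (!m.matches()) {
            return null;
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("2017-11-15");
        for (int i = 0; i < g.length; i++) {
            System.out.println(i + ": " + g[i]);
        }
    }
}
```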

On the other hand, there are also non-capturing groups, whose syntax is (?:subpattern).
They have the same grouping function as plain parentheses, but they do not capture the text.

For example, the regex from (\d+(?:\.\d+)?) to (\d+(?:\.\d+)?) lets you match the two numbers with an optional decimal part (the decimal part is grouped and repeated 0 or 1 times with ?). However, there are still only 2 capturing groups, and each number is returned by Matcher.group(1) and Matcher.group(2).
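A sketch that makes this visible with Matcher.groupCount(), which reports how many capturing groups a pattern has (NonCapturingDemo and its members are hypothetical names). The plain-parentheses version is included only for comparison: there, the decimal parts become extra groups and the second number moves to group 3.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonCapturingDemo {
    // Decimal parts grouped with (?: ... ): only the two numbers are captured
    public static final Pattern NON_CAPTURING =
        Pattern.compile("from (\\d+(?:\\.\\d+)?) to (\\d+(?:\\.\\d+)?)");
    // Same pattern with plain parentheses: the decimal parts become groups 2 and 4
    public static final Pattern CAPTURING =
        Pattern.compile("from (\\d+(\\.\\d+)?) to (\\d+(\\.\\d+)?)");

    // Number of capturing groups in a pattern (no match is needed for this)
    public static int capturingGroups(Pattern p) {
        return p.matcher("").groupCount();
    }

    // Returns {group(1), group(2)} using the non-capturing version, or null if no match
    public static String[] numbers(String text) {
        Matcher m = NON_CAPTURING.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        System.out.println(capturingGroups(NON_CAPTURING)); // 2
        System.out.println(capturingGroups(CAPTURING));     // 4
        String[] n = numbers("from 1.5 to 20");
        System.out.println(n[0] + " / " + n[1]);            // 1.5 / 20
    }
}
```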

It is good practice to use non-capturing groups whenever you do not need to retrieve that part of the text, to avoid using memory unnecessarily. It also makes the regex easier to read, because it is clear that you are grouping but are not interested in retrieving that part.
- I know: at first those extra ? and : are awkward to read (here the ? is not a quantifier), but I assure you that before long you will read them fluently and understand them automatically.

Once you understand this, the regex in this answer is easy to interpret.

^geo:([-+]?\d+(?:\.\d+)?),([-+]?\d+(?:\.\d+)?)(?:\?z=(\d{1,2}))?$

^ ::: Matches the initial position of the string.
geo: ::: Literal.
([-+]?\d+(?:\.\d+)?) ::: Group 1 - A real number.
- [-+]? ::: optional sign, - or +
- \d+ ::: integer part
- (?:\.\d+)? ::: optional decimal part (grouped, with the quantifier ?).
, ::: Literal.
([-+]?\d+(?:\.\d+)?) ::: Group 2 - Another real number.
(?:\?z=(\d{1,2}))? ::: Optional non-capturing group - the zoom.
- \?z= ::: literal ?z=
- (\d{1,2}) ::: Group 3 - an integer from 0 to 99
  
  Because this group is inside an optional group, when there is no zoom, Matcher.group(3) returns null.
$ ::: Matches the final position of the string.
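A quick sketch to exercise the regex against a few edge cases. GeoRegexCheck and describe are hypothetical names, and the extra sample inputs are my own, not from the question. Note that every backslash has to be doubled inside a Java string literal:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GeoRegexCheck {
    private static final Pattern GEO = Pattern.compile(
        "^geo:([-+]?\\d+(?:\\.\\d+)?),([-+]?\\d+(?:\\.\\d+)?)(?:\\?z=(\\d{1,2}))?$");

    // Returns "lat|lon|zoom" (zoom prints as "null" when absent), or null if there is no match
    public static String describe(String uri) {
        Matcher m = GEO.matcher(uri);
        if (!m.find()) {
            return null;
        }
        return m.group(1) + "|" + m.group(2) + "|" + m.group(3);
    }

    public static void main(String[] args) {
        String[] samples = {
            "geo:79.786971,-124.399677",      // no zoom: group 3 is null
            "geo:42.374260,-71.120824?z=16",  // zoom captured in group 3
            "geo:42.374260,-71.120824?z=100", // 3-digit zoom: rejected
            "geo:42.374260",                  // missing longitude: rejected
            "geo:1,2?z="                      // empty zoom: rejected
        };
        for (String s : samples) {
            System.out.println(s + " -> " + describe(s));
        }
    }
}
```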

Code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main
{
    public static void main(String[] args)
    {
        final String  s = "geo:42.374260,-71.120824?z=16";
        // Same regex as above, with each backslash doubled for the Java string literal
        final String  r = "^geo:([-+]?\\d+(?:\\.\\d+)?),([-+]?\\d+(?:\\.\\d+)?)(?:\\?z=(\\d{1,2}))?$";
        final Pattern p = Pattern.compile(r);
        final Matcher m = p.matcher(s);

        if (m.find()) // The string matches the regex
        {
            String lat  = m.group(1),
                   lon  = m.group(2),
                   zoom = m.group(3);

            if (zoom == null) // With no zoom, the 3rd group of the regex captures nothing
            {
                zoom = "none";
            }

            System.out.format(
                "URI:       %s%nLatitude:  %s%nLongitude: %s%nZoom:      %s%n%n",
                s, lat, lon, zoom
            );
        }
        else
        {
            System.out.format("URI:       %s%n is not a geo URI%n%n", s);
        }
    }
}

Demo at ideone.com
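If you need the values as numbers rather than strings, here is one possible sketch that converts the captured groups with Double.parseDouble and Integer.valueOf. GeoUri and parse are hypothetical names, not an existing API. It uses Matcher.matches(), which is equivalent to find() here because the regex is anchored with ^ and $. Like the regex, it does not check that latitude and longitude fall within their valid ranges.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class GeoUri {
    private static final Pattern GEO = Pattern.compile(
        "^geo:([-+]?\\d+(?:\\.\\d+)?),([-+]?\\d+(?:\\.\\d+)?)(?:\\?z=(\\d{1,2}))?$");

    public final double lat;
    public final double lon;
    public final Integer zoom; // null when the URI has no ?z= part

    private GeoUri(double lat, double lon, Integer zoom) {
        this.lat = lat;
        this.lon = lon;
        this.zoom = zoom;
    }

    // Returns an empty Optional if the string is not a geo URI of the expected form
    public static Optional<GeoUri> parse(String s) {
        Matcher m = GEO.matcher(s);
        if (!m.matches()) {
            return Optional.empty();
        }
        double lat = Double.parseDouble(m.group(1)); // accepts an optional leading + or -
        double lon = Double.parseDouble(m.group(2));
        Integer zoom = m.group(3) == null ? null : Integer.valueOf(m.group(3));
        return Optional.of(new GeoUri(lat, lon, zoom));
    }

    public static void main(String[] args) {
        GeoUri g = parse("geo:42.374260,-71.120824?z=16").get();
        System.out.println(g.lat + " " + g.lon + " " + g.zoom);
        System.out.println(parse("geo:79.786971,-124.399677").get().zoom); // null
        System.out.println(parse("not a geo uri").isPresent());            // false
    }
}
```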