Multiple groups, all optional, differentiating each one in the result
Everything you mention in the question makes sense, and it is a good analysis of the problem. But there is an easier way to handle it. Instead of trying to capture both numbers in a single match, it is simpler to treat them as independent matches:
telefono\D*(\d+)|numero favorito\D*(\d+)
This way, each match finds one alternative or the other, and returns a value in group 1 or group 2 accordingly. We call Matcher#find() in a loop for as long as it keeps matching:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        // Backslashes must be doubled inside a Java string literal.
        final String regex = "telefono\\D*(\\d+)|numero favorito\\D*(\\d+)";
        final String texto = "hola mi telefono es 12345678 y mi numero favorito es el 13";
        final Matcher matcher = Pattern.compile(regex).matcher(texto);
        // Each find() yields one alternative; only the group for that alternative is set.
        while (matcher.find()) {
            if (matcher.group(1) != null) {
                System.out.println("Tel: " + matcher.group(1));
            } else {
                System.out.println("Num: " + matcher.group(2));
            }
        }
    }
}
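Because each alternative is matched independently, the same pattern should also work when the two items appear in the opposite order, or when only one of them is present. A minimal sketch to check this, where the class name AlternationDemo, the helper extract and the sample sentences are assumptions made for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationDemo {
    // Same alternation as above; backslashes are doubled for the Java string literal.
    private static final Pattern PATTERN =
            Pattern.compile("telefono\\D*(\\d+)|numero favorito\\D*(\\d+)");

    // Hypothetical helper: collects whatever was found, in order of appearance.
    static Map<String, String> extract(String texto) {
        Map<String, String> found = new LinkedHashMap<>();
        Matcher matcher = PATTERN.matcher(texto);
        while (matcher.find()) {
            if (matcher.group(1) != null) {
                found.put("Tel", matcher.group(1));
            } else {
                found.put("Num", matcher.group(2));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Reversed order, then a text that only contains the phone number.
        System.out.println(extract("mi numero favorito es el 13 y mi telefono es 12345678"));
        System.out.println(extract("mi telefono es 12345678"));
    }
}
```

With the reversed sentence the map holds Num first and Tel second; with the second sentence it only holds Tel.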
That said, I know your question is more about theory than practice. If the groups must appear in that order in the text, each of them being optional, the way to capture them is to include the intermediate text (.*) inside the optional part. That is:
^(?:.*telefono\D*(\d+))?(?:.*numero favorito\D*(\d+))?
Recall that the regex engine is greedy, so each quantifier always tries to match as much as possible. Here that means (?: ... )? tries 1 repetition before 0. That guarantees the engine scans the whole string looking for a match (for example with .*telefono\D*(\d+)), and only treats the group as absent if that part cannot match.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        // Backslashes must be doubled inside a Java string literal.
        final String regex = "^(?:.*telefono\\D*(\\d+))?(?:.*numero favorito\\D*(\\d+))?";
        final String texto = "hola mi telefono es 12345678 y mi numero favorito es el 13";
        final Matcher matcher = Pattern.compile(regex).matcher(texto);
        if (matcher.find()) { // redundant check: this pattern always matches
            if (matcher.group(1) != null) {
                System.out.println("Tel: " + matcher.group(1));
            }
            if (matcher.group(2) != null) {
                System.out.println("Num: " + matcher.group(2));
            }
        }
    }
}
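To see how the optional parts behave, it may help to run the same pattern against a text where one item is missing and against one where the order is reversed. This is only a sketch: the class name InOrderDemo, the helper extract and the sample sentences are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InOrderDemo {
    // Same in-order pattern as above; backslashes doubled for the Java string literal.
    private static final Pattern PATTERN = Pattern.compile(
            "^(?:.*telefono\\D*(\\d+))?(?:.*numero favorito\\D*(\\d+))?");

    // Hypothetical helper: returns {telefono, numero}; an entry is null when not captured.
    static String[] extract(String texto) {
        Matcher matcher = PATTERN.matcher(texto);
        matcher.find(); // succeeds even when nothing is captured: both parts are optional
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        // Only the favorite number is present.
        System.out.println(Arrays.toString(extract("mi numero favorito es el 13")));
        // Both are present, but in the opposite order.
        System.out.println(Arrays.toString(
                extract("mi numero favorito es el 13 y mi telefono es 12345678")));
    }
}
```

The first call captures only the favorite number. In the second, the first optional part consumes the text up to and including the phone number, so nothing is left for the second part and only group 1 is set: the pattern really does depend on the order.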
If they can appear in any order, it is enough to replace the optional non-capturing groups (?: ... )? with optional positive lookaheads (?= ... )?.
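The any-order variant could be sketched as follows. It assumes the engine accepts a ? quantifier directly on a lookahead, as java.util.regex appears to do; support for quantified lookaheads varies between regex engines. The class name AnyOrderDemo, the helper extract and the sample sentences are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnyOrderDemo {
    // Each optional lookahead scans from the start of the string without consuming it,
    // so the two searches do not depend on each other's position.
    private static final Pattern PATTERN = Pattern.compile(
            "^(?=.*telefono\\D*(\\d+))?(?=.*numero favorito\\D*(\\d+))?");

    // Hypothetical helper: returns {telefono, numero}; an entry is null when not captured.
    static String[] extract(String texto) {
        Matcher matcher = PATTERN.matcher(texto);
        matcher.find(); // succeeds even when nothing is captured: both lookaheads are optional
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        // Reversed order: both numbers should still be captured.
        System.out.println(Arrays.toString(
                extract("mi numero favorito es el 13 y mi telefono es 12345678")));
        // Only one of the two is present.
        System.out.println(Arrays.toString(extract("mi numero favorito es el 13")));
    }
}
```

Since lookaheads are zero-width, the overall match is empty; only the capture groups carry the result.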
Another way to have multiple optional groups in order is:
(?:telefono\D*(\d+).*?)?(?:numero favorito\D*(\d+).*?)?$
If one of the groups matches, .*? then consumes as little as possible (non-greedy, or lazy) up to the next group, while the final $ forces the match to run all the way to the end of the string.
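A minimal sketch of this last pattern in Java, under the same assumptions as the earlier snippets (the class name LazyToEndDemo, the helper extract and the sample sentences are made up for illustration):

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyToEndDemo {
    // No ^ anchor: find() advances until one of the groups can start,
    // and $ forces the match to reach the end of the string.
    private static final Pattern PATTERN = Pattern.compile(
            "(?:telefono\\D*(\\d+).*?)?(?:numero favorito\\D*(\\d+).*?)?$");

    // Hypothetical helper: returns {telefono, numero}; an entry is null when not captured.
    static String[] extract(String texto) {
        Matcher matcher = PATTERN.matcher(texto);
        matcher.find(); // at worst, matches the empty string at the very end
        return new String[] { matcher.group(1), matcher.group(2) };
    }

    public static void main(String[] args) {
        // Both items, in order.
        System.out.println(Arrays.toString(
                extract("hola mi telefono es 12345678 y mi numero favorito es el 13")));
        // Only the phone number.
        System.out.println(Arrays.toString(extract("mi telefono es 12345678")));
    }
}
```

As with the other in-order pattern, a text that mentions the favorite number before the phone number only yields the favorite number.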