> I captured a non-capturing group using `((?:_\d+)+)`, and on the regex101 site it works for every language it currently offers

And it will work for you in any regex flavor.
- To be exact, in all except POSIX BREs, POSIX EREs and Oracle, since they do not support non-capturing groups: `(?:...)`.
> I'm getting the behavior of `/([a-zA-Z]+)_([a-zA-Z]+_?[a-zA-Z]+?)((?:_\d+)+(?:_\d+)*)/` using `/([a-zA-Z]+)_([a-zA-Z]+_?[a-zA-Z]+?)((?:_\d+)+)/`

In fact, using the first form would be a mistake, since you are needlessly repeating `(?:_\d+)*` at the end. It can never match any characters, because the preceding construct `(?:_\d+)+` has already consumed everything there was, leaving nothing for the last one.
This can be confirmed with an example, by wrapping the last `(?:_\d+)*` in one more capturing group:
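```javascript
const texto = '_123_456_789_0',
      regex = /((?:_\d+)+((?:_\d+)*))/;

// Group 2 wraps the trailing (?:_\d+)*
const [match, grupo1, grupo2] = regex.exec(texto);

console.log(`Group 1: "${grupo1}"`);                      // Group 1: "_123_456_789_0"
console.log(`The last '(?:_\\d+)*' matched: "${grupo2}"`); // The last '(?:_\d+)*' matched: ""
```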
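To check more broadly that the trailing `(?:_\d+)*` adds nothing, here is a small sketch that runs both patterns from the question over a few sample strings and compares what `exec()` returns. The sample inputs are made up for illustration; comparing via `JSON.stringify` only looks at the array elements (the whole match and the groups), or at `null` when there is no match.

```javascript
// The two patterns from the question: with and without the trailing (?:_\d+)*
const withExtra = /([a-zA-Z]+)_([a-zA-Z]+_?[a-zA-Z]+?)((?:_\d+)+(?:_\d+)*)/;
const withoutExtra = /([a-zA-Z]+)_([a-zA-Z]+_?[a-zA-Z]+?)((?:_\d+)+)/;

// Hypothetical sample inputs, chosen only for illustration
const samples = ['ab_cd_1', 'user_name_12_34', 'Xy_ab_cd_1_2_3', 'no_digits_here'];

// exec() returns null or [whole match, group 1, group 2, group 3]
const sameResult = samples.every((s) => {
  const a = withExtra.exec(s);
  const b = withoutExtra.exec(s);
  return JSON.stringify(a) === JSON.stringify(b);
});

console.log('Identical results on every sample:', sameResult); // true
```

Since `(?:_\d+)*` can always match the empty string right after the greedy `(?:_\d+)+`, the engine never needs it to take anything, so both patterns find the same matches with the same groups.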
> I would like someone to explain to me why using a double capture worked

You are not using a double capture. In `((?:_\d+)+)`, only the outer group captures; `(?:...)` is a non-capturing group.
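A minimal JavaScript sketch makes this visible: the array returned by `exec()` has one slot per capturing group. With an inner non-capturing group there is a single group; making the inner group capturing (shown only for comparison) adds a second slot that holds just its last repetition.

```javascript
const sample = '_12_34_56';

// Outer capturing group, inner non-capturing group: only one group is captured
const nonCapturingInner = /((?:_\d+)+)/.exec(sample);

// For comparison: both groups capture; the inner one keeps only its last repetition
const capturingInner = /((_\d+)+)/.exec(sample);

console.log(nonCapturingInner.length); // 2 (whole match + group 1)
console.log(nonCapturingInner[1]);     // _12_34_56
console.log(capturingInner.length);    // 3 (whole match + groups 1 and 2)
console.log(capturingInner[2]);        // _56
```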
A structure such as `((?:_\d+)+)` is perfectly normal and is used frequently. Think of it this way: it is the same as `(\d+)`, except that what is repeated in `((?:_\d+)+)` is not a single digit but an underscore followed by digits.
Nesting groups (capturing or not) is as valid as nesting loops in your code, and works in much the same way. Simple as that.
> What implications (positive or negative) does capturing a group the way I did have?

None, neither positive nor negative. Without nesting a non-capturing group inside a capturing one like that, you would not have gotten the same result. Again, it is a completely normal structure.
In fact, as a general rule, you should always use non-capturing groups `(?:...)` when you do not need the text they matched. A non-capturing group does not use unnecessary memory (neither for storing the captured text nor for recording its start and end positions).
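The start and end positions can be observed with the `d` (`hasIndices`) flag, which assumes an engine that supports ES2022 (recent Node.js versions and browsers do). With it, the result of `exec()` also has an `indices` array holding one `[start, end]` pair for the whole match and one per capturing group; the non-capturing group gets no entry. The sample string is arbitrary.

```javascript
const input = 'abc_12_34';

// Only the outer group captures; (?:...) is used just to repeat "_digits"
const lean = /[a-z]+((?:_\d+)+)/d.exec(input);

// Same pattern, but the inner group captures too
const heavy = /[a-z]+((_\d+)+)/d.exec(input);

// indices[0] is the whole match; every other entry belongs to a capturing group
console.log(lean.indices.length);  // 2
console.log(heavy.indices.length); // 3
console.log(lean.indices[1]);      // [ 3, 9 ]
console.log(heavy.indices[2]);     // [ 6, 9 ] (only the last repetition, "_34")
```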
- If you want to go into much more detail: a non-capturing group is only slightly slower to compile, but more efficient to execute. However, this difference is negligible, and the usual reason to prefer it is saving memory (it is generally regarded as good practice).
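As a bonus, one more correction. Using a structure such as `([a-zA-Z]+_?[a-zA-Z]+?)` is a mistake: you are placing two constructs that match the same thing one right after the other. Since the `_` is optional, the group can reduce to `[a-zA-Z]+[a-zA-Z]+?`, and such a construction is the perfect recipe for catastrophic backtracking: a run of n letters can be shared between the two quantifiers in roughly n²/2 ways, and when the rest of the pattern fails, the engine tries every one of them.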
This will not cause problems with the inputs you are testing, but in a slightly more complicated regex, with longer texts and an input that does not match, it could make the browser freeze without returning a result.
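As a sketch of the rewrite, the middle group can be compared with `[a-zA-Z]+(?:_[a-zA-Z]+)?` (one run of letters, optionally followed by `_` and another run), each anchored with `^` and `$` so it must match the whole string. The sample words are hypothetical.

```javascript
// Middle group from the question: two adjacent letter runs with an optional "_" between
const ambiguous = /^[a-zA-Z]+_?[a-zA-Z]+?$/;

// Rewrite: one letter run, optionally followed by "_" and another letter run
const unambiguous = /^[a-zA-Z]+(?:_[a-zA-Z]+)?$/;

// Hypothetical sample strings
const words = ['ab', 'abc', 'ab_cd', 'a_b', 'ab_', '_ab', 'ab__cd', 'a'];

for (const w of words) {
  // Prints the word and whether each pattern accepts it
  console.log(w, ambiguous.test(w), unambiguous.test(w));
}
```

Both accept and reject the same strings, except for a single letter such as `'a'`, which only the rewrite accepts. The difference is in how they get there: in the rewrite there is only one way to assign each letter, so there is nothing for the engine to retry.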
Here is a test; not that drastic, but clear enough:
```javascript
const regex = /^([a-zA-Z]+)_([a-zA-Z]+_?[a-zA-Z]+?)((?:_\d+)+)$/,
      N = 1000,
      texto = 'X_'
        + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
        + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
        + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
        + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
        + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
        + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
        + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
        + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
        + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
        + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
        + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
        + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
        + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
        + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
        + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
        + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
        + 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'
        + '_1_2_ERROR';

// Your regex
let a, b, resultado;
a = performance.now();
for (let i = 0; i < N; i++) {
  resultado = regex.exec(texto);
}
b = performance.now();
console.log('"([a-zA-Z]+_?[a-zA-Z]+?)" took:', (b - a), 'ms to return:', resultado);

// With a nested non-capturing group
const regexConGrupo = /^([a-zA-Z]+)_([a-zA-Z]+(?:_[a-zA-Z]+)?)((?:_\d+)+)$/;
a = performance.now();
for (let i = 0; i < N; i++) {
  resultado = regexConGrupo.exec(texto);
}
b = performance.now();
console.log('"([a-zA-Z]+(?:_[a-zA-Z]+)?)" took:', (b - a), 'ms to return:', resultado);
```
If this were part of a more complicated regex, it could cause you serious problems.
Also, by using `([a-zA-Z]+_?[a-zA-Z]+?)` you require that group to have at least 2 letters, so the regex would not match something like `A_B_1`.
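A quick check of that last point, comparing the anchored pattern from the benchmark with the version that nests a non-capturing group:

```javascript
// Pattern from the question (anchored) and the version with a nested non-capturing group
const original = /^([a-zA-Z]+)_([a-zA-Z]+_?[a-zA-Z]+?)((?:_\d+)+)$/;
const nested = /^([a-zA-Z]+)_([a-zA-Z]+(?:_[a-zA-Z]+)?)((?:_\d+)+)$/;

console.log(original.exec('A_B_1')); // null: group 2 needs at least two letters
console.log(nested.exec('A_B_1'));   // a match, with groups "A", "B" and "_1"
```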