What regular expression would capture each section belonging to "Juan", given that the sections vary in length and the only things that identify them are their code and the "Header 1" headings?
To match the lines between the header and the name, consume every line that follows the header, as long as that line does not begin with "Header ":
^Header 1(?:\R(?!Header ).*+)*?
Then match the line with the name itself, followed by all the remaining lines that belong to the same section (regex101):
/^Header 1$(?:\R(?!Header ).*+)*?\RNombre : Juan$(?:\R(?!Header ).*+)*/mi
Logic
| Subpattern | Description |
| --- | --- |
| ^Header 1$ | A whole line that matches "Header 1" (the /m modifier makes ^ and $ match at the start/end of each line) |
| (?: )*? | A group that repeats the following subpattern zero or more times: |
| \R(?!Header ).*+ | A line break that is not followed by "Header ", then the rest of that line |
| \RNombre : Juan$ | A whole line that matches the name being searched for |
| (?:\R(?!Header ).*+)* | More lines that do not start with "Header " |
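As a rough illustration of the breakdown above, the same pattern can be written with the x (extended) modifier so that each part carries its own comment. This is only a sketch: the sample text is hypothetical, the ~ delimiter is my choice, and in x mode the literal spaces have to be escaped.

```php
<?php
// Hypothetical sample input; the real $texto comes from the question.
$texto = implode("\n", [
    'Header 1', 'Codigo : c001', 'Nombre : Juan', 'Total : 45,78',
    'Header 2', 'Detalle : x',
    'Header 1', 'Codigo : c002', 'Nombre : Pedro',
    'Header 1', 'Codigo : c003', 'Nombre : Juan',
]);

// Same pattern as above, split across lines thanks to the x modifier.
// Unescaped whitespace is ignored, so literal spaces are written as "\ ".
$regex = '~
    ^Header\ 1$                   # whole line "Header 1"
    (?: \R (?!Header\ ) .*+ )*?   # lazily skip lines that do not open a section
    \R Nombre\ :\ Juan $          # the line holding the name we want
    (?: \R (?!Header\ ) .*+ )*    # rest of the section, up to the next header
~xmi';

preg_match_all($regex, $texto, $resultado);
echo count($resultado[0]), " section(s) found\n";
```

With this sample, the section for "Pedro" and the "Header 2" block are skipped, and only the two sections containing "Nombre : Juan" are returned.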
The important point is that every line break \R is followed by a negative lookahead, which ensures that the next line does not start a new section:
\R(?!Header )
A lookahead tests whether a pattern matches and only reports success or failure, without advancing the current position. A negative lookahead (?! ... ) succeeds only when the current position is not followed by the pattern inside it. PHP's manual calls these constructs (wrongly!) "statements".
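A small sketch of that zero-width behavior, using a made-up three-line string: the match contains only the line break itself, because the lookahead inspects the following characters without consuming them.

```php
<?php
// Hypothetical text: the first line break is followed by "Header ",
// the second one is not.
$texto = "uno\nHeader 2\ndos";

// Find line breaks that are NOT followed by "Header ".
// PREG_OFFSET_CAPTURE also returns the offset of each match.
preg_match_all('/\R(?!Header )/', $texto, $m, PREG_OFFSET_CAPTURE);

foreach ($m[0] as [$coincidencia, $posicion]) {
    // strlen() is 1: only the line break was consumed, not the text after it.
    echo "line break at offset $posicion, length ", strlen($coincidencia), "\n";
}
```

Only the line break before "dos" is reported; the one before "Header 2" is rejected by the lookahead.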
Search for "code" instead of name
If you want to search by code instead of by name, just replace \RNombre : Juan$ with the pattern you need. For example:
- to match a line that is exactly Codigo : xxxx or just xxxx:
\R(?:Codigo : )?c001$
- to match the code at the end of any line, using \b to guarantee that it is a whole word:
\R.*\bc001$
- to match the code anywhere on the line, even when it is part of another code such as abc00123:
\R.*c001.*+
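To see how the three variants above differ, here is a minimal sketch that tries each one against a few hypothetical lines. The labels and sample lines are assumptions of mine; each line is placed after a "Header 1" line because every variant starts with a line break (\R).

```php
<?php
// The three variants from the list above, with c001 as the example code.
$variantes = [
    'exact'    => '/\R(?:Codigo : )?c001$/m',
    'line end' => '/\R.*\bc001$/m',
    'anywhere' => '/\R.*c001.*+/m',
];

// Hypothetical lines to test against each variant.
$lineas = ['Codigo : c001', 'c001', 'Ref interna c001', 'Codigo : abc00123'];

$tabla = [];
foreach ($lineas as $linea) {
    foreach ($variantes as $nombre => $patron) {
        // preg_match() returns 1 when the pattern matches.
        $tabla[$linea][$nombre] = preg_match($patron, "Header 1\n" . $linea) === 1;
    }
}

foreach ($tabla as $linea => $fila) {
    echo str_pad($linea, 20), json_encode($fila), "\n";
}
```

The stricter the variant, the fewer lines it accepts: "Ref interna c001" is rejected only by the exact form, while "Codigo : abc00123" is accepted only by the "anywhere" form.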
Example:
/^Header 1$(?:\R(?!Header ).*+)*?\R(?:Codigo : )?c001$(?:\R(?!Header ).*+)*/mi
Code:
To find all matches, use preg_match_all().
// $texto holds the text to search (the input from the question)
$regex = '/^Header 1$(?:\R(?!Header ).*+)*?\RNombre : Juan$(?:\R(?!Header ).*+)*/mi';
if (preg_match_all($regex, $texto, $resultado)) {
    // show each matched section; $resultado[0] holds the full matches
    $n = 0;
    foreach ($resultado[0] as $seccion) {
        echo "\n-----Seccion " . ++$n . "-----\n";
        echo $seccion;
    }
} else {
    echo "Name not found";
}
Result:
-----Seccion 1-----
Header 1
Codigo : c001
Nombre : Juan
Total : 45,78
... etc. (all 3 sections)
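If the name is not fixed, one possible approach is to build the pattern from a variable. The sketch below is not part of the original answer: the function name seccionesPorNombre and the sample text are hypothetical, and preg_quote() is used so that special characters in the name are treated literally.

```php
<?php
// Hypothetical helper: builds the pattern for any name and returns
// the matching sections as an array of strings.
function seccionesPorNombre(string $texto, string $nombre): array
{
    // preg_quote() escapes regex metacharacters and the "/" delimiter.
    $nombreSeguro = preg_quote($nombre, '/');
    $regex = '/^Header 1$(?:\R(?!Header ).*+)*?\RNombre : '
           . $nombreSeguro
           . '$(?:\R(?!Header ).*+)*/mi';

    preg_match_all($regex, $texto, $resultado);
    return $resultado[0];
}

// Hypothetical sample input with two sections.
$texto = "Header 1\nCodigo : c001\nNombre : Juan\nTotal : 45,78\n"
       . "Header 1\nCodigo : c002\nNombre : Ana\nTotal : 12,30";

foreach (seccionesPorNombre($texto, 'Juan') as $n => $seccion) {
    echo "\n-----Seccion " . ($n + 1) . "-----\n", $seccion;
}
```

Returning the sections instead of echoing them inside the function keeps the matching logic reusable; an empty array simply means the name was not found.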