You should not use regular expressions to process HTML. Even a small change in the markup can make your regex fail: an extra space, a change in the tag's attributes, a comment, or a more complex structure is enough to break even a gigantic regex.
It's much easier to process HTML with the DOM, which is the tool designed for exactly that job.
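To make the first point concrete, here is a minimal sketch with a hypothetical naive pattern (`$pattern`) written for the exact markup in this question. It matches the original string but misses trivially different variants of the same structure:

```php
<?php
// Hypothetical naive pattern, written for the exact markup '<p><figure><img ...></figure></p>'.
$pattern = '#<p><figure>(<img[^>]*>)</figure></p>#';

// The same logical structure, written in slightly different ways.
$variants = [
    'original'  => '<p><figure><img src="xxxx"></figure></p>',
    'space'     => '<p> <figure><img src="xxxx"></figure></p>',
    'attribute' => '<p class="x"><figure><img src="xxxx"></figure></p>',
    'comment'   => '<p><!-- note --><figure><img src="xxxx"></figure></p>',
];

// preg_match() returns 1 on a match and 0 otherwise.
$results = [];
foreach ($variants as $name => $html) {
    $results[$name] = preg_match($pattern, $html) === 1;
    echo str_pad($name, 10), $results[$name] ? 'match' : 'no match', PHP_EOL;
}
```

Only the first variant matches; each of the others would need another special case in the regex.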
Generating the DOM is simple:
```php
$html = '<p><figure><img src="xxxx"></figure></p>';
// Build the DOM
$dom = new DOMDocument;
// NODEFDTD: no default doctype; NONET: no network access;
// NOERROR: suppress parser errors (e.g. for HTML5 tags libxml does not know)
$libxml_opciones = LIBXML_COMPACT | LIBXML_HTML_NODEFDTD | LIBXML_NONET | LIBXML_NOERROR;
@$dom->loadHTML($html, $libxml_opciones);
// Create an XPath object for queries
$xpath = new DOMXPath($dom);
```
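One detail that matters later: `loadHTML()` does not keep the fragment as-is, it wraps it in implied `<html>` and `<body>` elements (the `LIBXML_HTML_NOIMPLIED` flag, not used here, would change that). A quick check with the same options:

```php
<?php
$html = '<p><figure><img src="xxxx"></figure></p>';

$dom = new DOMDocument;
$libxml_opciones = LIBXML_COMPACT | LIBXML_HTML_NODEFDTD | LIBXML_NONET | LIBXML_NOERROR;
@$dom->loadHTML($html, $libxml_opciones);

// Walk down from the root to see the implied wrappers around the fragment.
$root = $dom->documentElement;
$body = $root->lastChild;
echo $root->nodeName, PHP_EOL;             // html
echo $body->nodeName, PHP_EOL;             // body
echo $body->firstChild->nodeName, PHP_EOL; // p
```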
And you get all the `<p>` elements with:
```php
$p_nodelist = $dom->getElementsByTagName('p');
```
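Note that the `DOMNodeList` returned by `getElementsByTagName()` is live: it reflects later changes to the document instead of being a snapshot. A small sketch with a throwaway three-paragraph document:

```php
<?php
$dom = new DOMDocument;
@$dom->loadHTML('<p>a</p><p>b</p><p>c</p>', LIBXML_HTML_NODEFDTD | LIBXML_NOERROR);

$p_nodelist = $dom->getElementsByTagName('p');
$before = $p_nodelist->length;

// Detach the first <p>; the same list object now reports one element fewer.
$first = $p_nodelist->item(0);
$first->parentNode->removeChild($first);
$after = $p_nodelist->length;

echo "$before -> $after", PHP_EOL; // 3 -> 2
```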
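Then we go through them in a loop, in reverse order: `getElementsByTagName()` returns a live list, so replacing a `<p>` while walking forward would shift the remaining indexes and skip elements.
```php
for ($i = $p_nodelist->length; --$i >= 0; ) {
    $p = $p_nodelist->item($i);
    // ... checks and replacement, described next
}
```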
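To see why the loop runs backwards, here is a sketch with a hypothetical helper, `countLeftoverP()`, that removes every `<p>` from a four-paragraph document, once walking the live list forwards and once backwards:

```php
<?php
// Hypothetical helper: removes every <p> by index and returns how many survive.
function countLeftoverP(bool $reverse): int
{
    $dom = new DOMDocument;
    @$dom->loadHTML('<p>a</p><p>b</p><p>c</p><p>d</p>', LIBXML_HTML_NODEFDTD | LIBXML_NOERROR);
    $p_nodelist = $dom->getElementsByTagName('p');

    if ($reverse) {
        // Same loop shape as above: removing item $i never shifts items 0..$i-1.
        for ($i = $p_nodelist->length; --$i >= 0; ) {
            $p = $p_nodelist->item($i);
            $p->parentNode->removeChild($p);
        }
    } else {
        // Forward: after removing item $i, the next <p> slides into index $i and is skipped.
        for ($i = 0; $i < $p_nodelist->length; $i++) {
            $p = $p_nodelist->item($i);
            $p->parentNode->removeChild($p);
        }
    }
    return $dom->getElementsByTagName('p')->length;
}

echo countLeftoverP(false), PHP_EOL; // 2 (b and d are skipped)
echo countLeftoverP(true), PHP_EOL;  // 0
```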
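For each `<p>`, we check that it has a single child element and that this child is a `<figure>` (`./*` matches element nodes only, so text nodes are not counted):
```php
$p_hijos = $xpath->query('./*', $p);
if ($p_hijos->length == 1 && $p_hijos->item(0)->tagName == 'figure') {
    // ...
}
```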
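One caveat: because `./*` ignores text, a paragraph like `<p>Caption: <figure><img src="xxxx"></figure></p>` also passes the check, and its text would be lost when the `<p>` is replaced. If that matters, a stricter test can also reject non-whitespace text nodes. A sketch, where `hasSingleChildElement()` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: true if $node has exactly one element child named $tag
// and no non-whitespace text node beside it.
function hasSingleChildElement(DOMXPath $xpath, DOMNode $node, string $tag): bool
{
    $elements = $xpath->query('./*', $node);
    $text = $xpath->query('./text()[normalize-space()]', $node);
    return $elements->length === 1
        && $elements->item(0)->nodeName === $tag
        && $text->length === 0;
}

$dom = new DOMDocument;
@$dom->loadHTML(
    '<p>Caption: <figure><img src="xxxx"></figure></p><p><figure><img src="xxxx"></figure></p>',
    LIBXML_HTML_NODEFDTD | LIBXML_NOERROR
);
$xpath = new DOMXPath($dom);
$p_nodelist = $dom->getElementsByTagName('p');

// The './*'-only check accepts the paragraph with a caption...
$loose = $xpath->query('./*', $p_nodelist->item(0))->length === 1;
// ...while the stricter check rejects it and still accepts the plain one.
$strictCaption = hasSingleChildElement($xpath, $p_nodelist->item(0), 'figure');
$strictPlain   = hasSingleChildElement($xpath, $p_nodelist->item(1), 'figure');
```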
...and we do the same with the `<figure>`: check that it has a single child element and that this child is an `<img>`.
If both conditions hold, we replace the `<p>` with its child `<figure>`:
```php
$p->parentNode->replaceChild($figure, $p);
```
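`replaceChild()` moves the `<figure>` rather than copying it: the node is detached from the `<p>` and put where the `<p>` was, and the method returns the replaced `<p>`, now detached from the document. A short check:

```php
<?php
$dom = new DOMDocument;
@$dom->loadHTML('<p><figure><img src="xxxx"></figure></p>', LIBXML_HTML_NODEFDTD | LIBXML_NOERROR);

$p = $dom->getElementsByTagName('p')->item(0);
$figure = $p->firstChild;

// Put the <figure> where the <p> was; the return value is the replaced node.
$old = $p->parentNode->replaceChild($figure, $p);

echo $old->nodeName, PHP_EOL;                                    // p
echo $p->parentNode === null ? 'detached' : 'attached', PHP_EOL; // detached
echo $p->childNodes->length, PHP_EOL;                            // 0
echo $figure->parentNode->nodeName, PHP_EOL;                     // body
```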
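Finally, we convert the DOM back to a string. Since `loadHTML()` wraps the fragment in `<html><body>`, we serialize only the children of `<body>`, which here is `$dom->documentElement->lastChild`:
```php
$resultado = '';
foreach ($dom->documentElement->lastChild->childNodes as $elem) {
    $resultado .= $dom->saveHTML($elem);
}
```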
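If you prefer not to rely on `<body>` being the last child of `<html>`, a variation is to look the `<body>` element up by name. This is a sketch; `innerHtmlOfBody()` is a hypothetical helper name:

```php
<?php
// Hypothetical helper: serializes the children of <body> without the implied wrappers.
function innerHtmlOfBody(DOMDocument $dom): string
{
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $resultado = '';
    foreach ($body->childNodes as $elem) {
        $resultado .= $dom->saveHTML($elem);
    }
    return $resultado;
}

$dom = new DOMDocument;
@$dom->loadHTML('<p><figure><img src="xxxx"></figure></p>', LIBXML_HTML_NODEFDTD | LIBXML_NOERROR);
echo innerHtmlOfBody($dom), PHP_EOL; // <p><figure><img src="xxxx"></figure></p>
```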
Code:
```php
<?php
$html = '<p><figure><img src="xxxx"></figure></p>';

// Build the DOM
$dom = new DOMDocument;
$libxml_opciones = LIBXML_COMPACT | LIBXML_HTML_NODEFDTD | LIBXML_NONET | LIBXML_NOERROR;
@$dom->loadHTML($html, $libxml_opciones);

// Create an XPath object for queries
$xpath = new DOMXPath($dom);

// Get all the <p> elements
$p_nodelist = $dom->getElementsByTagName('p');

// Loop over each <p> (in reverse order, because the list is live)
for ($i = $p_nodelist->length; --$i >= 0; ) {
    $p = $p_nodelist->item($i);
    $p_hijos = $xpath->query('./*', $p);

    // If the <p> has a single child element, and that child is a <figure>
    if ($p_hijos->length == 1 && $p_hijos->item(0)->tagName == 'figure') {
        $figure = $p_hijos->item(0);

        // If the <figure> has a single child element, and that child is an <img>
        $figure_hijos = $xpath->query('./*', $figure);
        if ($figure_hijos->length == 1 && $figure_hijos->item(0)->tagName == 'img') {
            // REPLACE the whole <p> with just the <figure>
            $p->parentNode->replaceChild($figure, $p);
        }
    }
}

// Serialize the HTML (children of <body> only)
$resultado = '';
foreach ($dom->documentElement->lastChild->childNodes as $elem) {
    $resultado .= $dom->saveHTML($elem);
}

// Print the result
echo $resultado;
```
Result:
```html
<figure><img src="xxxx"></figure>
```
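To try the same logic on more inputs, the complete code can be wrapped in a function. `unwrapFigures()` is a hypothetical name; the checks inside are the ones above, unchanged:

```php
<?php
// Hypothetical wrapper around the code above: unwraps <p><figure><img></figure></p>.
function unwrapFigures(string $html): string
{
    $dom = new DOMDocument;
    $libxml_opciones = LIBXML_COMPACT | LIBXML_HTML_NODEFDTD | LIBXML_NONET | LIBXML_NOERROR;
    @$dom->loadHTML($html, $libxml_opciones);
    $xpath = new DOMXPath($dom);

    $p_nodelist = $dom->getElementsByTagName('p');
    for ($i = $p_nodelist->length; --$i >= 0; ) {
        $p = $p_nodelist->item($i);
        $p_hijos = $xpath->query('./*', $p);
        if ($p_hijos->length == 1 && $p_hijos->item(0)->tagName == 'figure') {
            $figure = $p_hijos->item(0);
            $figure_hijos = $xpath->query('./*', $figure);
            if ($figure_hijos->length == 1 && $figure_hijos->item(0)->tagName == 'img') {
                $p->parentNode->replaceChild($figure, $p);
            }
        }
    }

    // Serialize only the children of the implied <body>
    $resultado = '';
    foreach ($dom->documentElement->lastChild->childNodes as $elem) {
        $resultado .= $dom->saveHTML($elem);
    }
    return $resultado;
}

echo unwrapFigures('<p><figure><img src="xxxx"></figure></p>'), PHP_EOL;
// A <figure> with a <figcaption> has two element children, so its <p> is kept.
echo unwrapFigures('<p><figure><img src="a"><figcaption>Cap</figcaption></figure></p>'), PHP_EOL;
// Several paragraphs: only the matching one is unwrapped.
echo unwrapFigures('<p>Intro</p><p><figure><img src="b"></figure></p>'), PHP_EOL;
```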