You should not use regular expressions to process HTML. With the kind of expression you are proposing, even a small change in the HTML would make your regex fail. An extra space, a change in the tag's attributes, a comment, or a more complex structure would defeat even a gigantic regex. A very advanced expression could get close to fail-safe, but you could almost always find an odd case that breaks it. It would also take an expert every time you wanted to modify it.
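To illustrate that fragility, here is a minimal sketch. The pattern and both sample strings are hypothetical and written only for this example; they are not the expression from the question.

```php
<?php
// Hypothetical pattern: matches <a href="X"><img src="X" /></a>
// only when the markup has exactly this layout.
$pattern = '~<a href="([^"]+)"><img src="\1" /></a>~';

// The layout the pattern was written for.
$exact = '<a href="pic.png"><img src="pic.png" /></a>';

// Equivalent markup for a browser: an extra attribute, single quotes,
// line breaks, and no self-closing slash.
$variant = "<a class='x' href='pic.png'>\n<img alt='' src='pic.png'>\n</a>";

var_dump(preg_match($pattern, $exact));   // int(1)
var_dump(preg_match($pattern, $variant)); // int(0)
```

Each of those variations could be patched into the pattern, but every patch makes it longer and harder to maintain.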
It's very easy to process HTML with the DOM classes; they are the tools designed for that.
If we have HTML like the following:
//Sample HTML
$html = '
<a href="https://i.stack.imgur.com/mOJ0a.png">
<span>Enlace a la misma URL de la imagen</span>
<img src="https://i.stack.imgur.com/mOJ0a.png" />
</a>
<span>Imagen independiente precedida por un </span>
<a href="https://i.stack.imgur.com/mOJ0a.png">enlace</a>
<img src="https://i.stack.imgur.com/mOJ0a.png" />
<a href="./">
<span>Enlace a una URL diferente que la imagen</span>
<img src="https://i.stack.imgur.com/mOJ0a.png" />
</a>
';
The DOM is generated simply as follows:
//Wrap the fragment in body
$html = "<body>$html</body>";
//Build the DOM
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_COMPACT | LIBXML_HTML_NOIMPLIED | LIBXML_NONET | LIBXML_HTML_NODEFDTD);
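As a rough sketch of what two of those flags do, the snippet below loads the same small fragment with and without them. The fragment and the variable names are made up for this example, and the exact serialization may vary with the libxml2 version, so it only checks for the presence of the doctype and the `<html>` wrapper.

```php
<?php
// Hypothetical fragment, already wrapped in a single root element.
$fragment = '<body><p>Hi</p></body>';

// Default behaviour: libxml adds a doctype and an <html> wrapper.
$plain = new DOMDocument;
$plain->loadHTML($fragment);
$plainHtml = $plain->saveHTML();

// LIBXML_HTML_NOIMPLIED: do not add implied <html>/<body> elements.
// LIBXML_HTML_NODEFDTD: do not add a default doctype.
$bare = new DOMDocument;
$bare->loadHTML($fragment, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$bareHtml = $bare->saveHTML();

var_dump(strpos($plainHtml, '<!DOCTYPE') !== false); // doctype was added
var_dump(strpos($bareHtml, '<!DOCTYPE') === false);  // no doctype
var_dump(strpos($bareHtml, '<html') === false);      // no <html> wrapper
```

With LIBXML_HTML_NOIMPLIED the parser expects a single root element, which is why the fragment is wrapped in body first.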
And we can get all the links within the DOM with:
//Get all the links
$a_nodelist = $dom->getElementsByTagName('a');
Then we go through each one, checking whether it contains an image:
//Go through each link
foreach ($a_nodelist as $enlace) {
    //Get the first image inside the link
    $img = $enlace->getElementsByTagName('img')->item(0);
    if ($img) { //if it has an image
        //Compare the link URL with the image URL
        $urlEnlace = $enlace->getAttribute('href');
        $urlImagen = $img->getAttribute('src');
        if ($urlEnlace === $urlImagen) {
            //If they are the same, replace the link with the image
            $enlace->parentNode->replaceChild($img, $enlace);
        }
    }
}
Where $enlace->parentNode->replaceChild($img, $enlace);
is how we replace a link that wraps an image with the same URL with just the image.
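One caveat worth a sketch: getElementsByTagName returns a live list, so replacing a link while iterating over it can, in some cases, make the loop skip the next link (for example, when two matching links sit next to each other). A possible variant, under that assumption, is to copy the list into an array first. The sample HTML and the names $links and $remaining are hypothetical and exist only for this example.

```php
<?php
// Hypothetical input: two adjacent links, both wrapping an image with the same URL.
$html = '<body>'
      . '<a href="a.png"><img src="a.png"></a>'
      . '<a href="b.png"><img src="b.png"></a>'
      . '</body>';

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Snapshot the live node list into a plain array before touching the document.
$links = iterator_to_array($dom->getElementsByTagName('a'));

foreach ($links as $link) {
    $img = $link->getElementsByTagName('img')->item(0);
    if ($img && $link->getAttribute('href') === $img->getAttribute('src')) {
        // Replace the link with its image; the array is unaffected by this change.
        $link->parentNode->replaceChild($img, $link);
    }
}

// Count the links left in the document after the replacements.
$remaining = $dom->getElementsByTagName('a')->length;
var_dump($remaining); // int(0)
```

Iterating the list backwards with a for loop and item($i) would be another way to get the same effect.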
And, finally, we print the result:
//Print the result
echo $dom->saveHTML();
Result:
<body>
<img src="https://i.stack.imgur.com/mOJ0a.png">
<span>Imagen independiente precedida por un </span>
<a href="https://i.stack.imgur.com/mOJ0a.png">enlace</a>
<img src="https://i.stack.imgur.com/mOJ0a.png">
<a href="./">
<span>Enlace a una URL diferente que la imagen</span>
<img src="https://i.stack.imgur.com/mOJ0a.png">
</a>
</body>
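The output keeps the body wrapper that was added before parsing. If only the inner markup is wanted, one option is to serialize each child of body separately, since saveHTML accepts a node argument. This is a minimal sketch with a made-up fragment; $body and $fragment are names chosen for this example.

```php
<?php
// Hypothetical document, loaded the same way as above.
$dom = new DOMDocument;
$dom->loadHTML(
    '<body><img src="a.png"><span>text</span></body>',
    LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
);

// Serialize each child of <body> on its own, leaving the wrapper out.
$body = $dom->getElementsByTagName('body')->item(0);
$fragment = '';
foreach ($body->childNodes as $child) {
    $fragment .= $dom->saveHTML($child);
}

echo $fragment;
```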
Demo:
Watch the demo at 3v4l.org