While it would be easy to remove HTML comments with a regular expression like:
$cadenaHTML = preg_replace( '/<!--(?!\[)[^-]*(?:-(?!->)[^-]*)*-->/', '', $cadenaHTML);
If you are wondering, [^-]*(?:-(?!->)[^-]*)* is much more efficient than using .*? . See Unrolling the loop.
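As a rough illustration of the two patterns, the sketch below runs the unrolled regex and a lazy .*? equivalent over the same string. The sample input and the lazy pattern are assumptions added here for comparison and are not part of the original question. It also shows that the (?!\[) lookahead leaves IE conditional comments untouched.

```php
<?php
// Hypothetical sample: one ordinary comment containing a lone "-",
// plus an IE conditional comment that the (?!\[) lookahead skips.
$sample = 'a <!-- x - y --> b <!--[if IE]>c<![endif]--> d';

// Unrolled pattern from the answer.
$unrolled = preg_replace('/<!--(?!\[)[^-]*(?:-(?!->)[^-]*)*-->/', '', $sample);

// Lazy alternative (assumed for comparison); the s flag lets . match newlines.
$lazy = preg_replace('/<!--(?!\[).*?-->/s', '', $sample);

var_dump($unrolled === $lazy); // bool(true)
echo $unrolled, PHP_EOL;       // a  b <!--[if IE]>c<![endif]--> d
```

Both patterns remove the same text here; the difference lies only in how much backtracking the regex engine has to do on long comments.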
However, there are some scenarios where the regular expression would fail, for example with <!-- inside the value of an input.
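A minimal sketch of that failure, assuming a made-up input element whose value happens to contain comment delimiters: the regex cannot tell an attribute value from a real comment, so it silently empties the value.

```php
<?php
// Hypothetical input: the "comment" is really part of an attribute value.
$cadenaHTML = '<input type="text" value="<!-- not a comment -->">';

// Same regex as above.
$resultado = preg_replace('/<!--(?!\[)[^-]*(?:-(?!->)[^-]*)*-->/', '', $cadenaHTML);

echo $resultado, PHP_EOL; // <input type="text" value="">
```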
Processing HTML with the DOM is very easy, and it is the tool designed for that.
The DOM is simply generated as follows:
// Wrap it in <body> because your example does not have one,
// so that the parser handles it correctly
$html = '<body>' . $cadenaHTML . '</body>';
// Build the DOM
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_COMPACT | LIBXML_HTML_NOIMPLIED | LIBXML_NONET | LIBXML_HTML_NODEFDTD);
Then we look for the comments with XPath:
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//comment()') as $comment) {
// ...
}
Inside the loop, each comment is removed one by one:
$comment->parentNode->removeChild($comment);
And we save the result as string:
$resultado = $dom->saveHTML();
Code:
<?php
// Remove comments from an HTML string with PHP
// https://es.stackoverflow.com/q/114030/127
$cadenaHTML = "codigo <b>util</b> <!-- codigo basura --> mas codigo util <!-- mas codigo basura -->";
// Wrap it in <body> because your example does not have one,
// so that the parser handles it correctly
$html = '<body>' . $cadenaHTML . '</body>';
// Build the DOM
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_COMPACT | LIBXML_HTML_NOIMPLIED | LIBXML_NONET | LIBXML_HTML_NODEFDTD);
// Find every comment
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//comment()') as $comment) {
    // Remove it
    $comment->parentNode->removeChild($comment);
}
// Save the HTML
$resultado = $dom->saveHTML();
// Print the result (escaping special characters so it shows as text)
echo htmlentities($resultado);
Result:
<body>codigo <b>util</b> mas codigo util </body>
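If only the cleaned fragment is wanted, without the wrapping <body>, the same steps could be packaged in a helper along these lines. This is only a sketch: the function name removeHtmlComments is hypothetical, and the libxml_use_internal_errors() calls are an addition here (not part of the code above) to keep parser warnings about imperfect markup out of the output.

```php
<?php
// Hypothetical helper: strips comments from an HTML fragment and
// returns the fragment without the temporary <body> wrapper.
function removeHtmlComments(string $fragment): string
{
    $dom = new DOMDocument;

    // Assumption: collect libxml warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML(
        '<body>' . $fragment . '</body>',
        LIBXML_COMPACT | LIBXML_HTML_NOIMPLIED | LIBXML_NONET | LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Same XPath lookup and removal as in the answer.
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//comment()') as $comment) {
        $comment->parentNode->removeChild($comment);
    }

    // Serialize only the children of <body>, dropping the wrapper itself.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo removeHtmlComments('codigo <b>util</b> <!-- codigo basura --> mas codigo util');
```

Note that saveHTML() may encode non-ASCII characters as entities depending on how the input encoding was declared, so the output is not guaranteed to be byte-identical to the input outside the removed comments.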