Remove comments from an HTML string with PHP


I have an HTML string and I need to remove the commented-out code from it. That is, if I find:

"useful code <!-- garbage code --> more useful code <!-- more garbage code -->"

I want to remove everything that is garbage.

for ($cadenaHTML) {
   if (<!--) {
      delete up to (-->)
   }
}

That's basically the idea, or something similar.
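A minimal PHP sketch of that loop, assuming plain string search is enough. stripCommentsNaive is a hypothetical name. Like the regex in the answer below, it works on raw text, so it shares the same blind spots (for example, <!-- inside an attribute value).

```php
<?php
// Sketch of the loop described in the question: find "<!--", then delete
// everything up to and including the next "-->".
// stripCommentsNaive is a hypothetical helper, not part of any library.
function stripCommentsNaive(string $cadenaHTML): string
{
    $pos = 0;
    // Look for the next comment opener starting at $pos
    while (($start = strpos($cadenaHTML, '<!--', $pos)) !== false) {
        // Find the closer after the 4-character opener
        $end = strpos($cadenaHTML, '-->', $start + 4);
        if ($end === false) {
            // Unterminated comment: leave the rest of the string untouched
            break;
        }
        // Cut out "<!-- ... -->" (the closer is 3 characters long)
        $cadenaHTML = substr($cadenaHTML, 0, $start) . substr($cadenaHTML, $end + 3);
        // Keep searching from where the removed comment used to start
        $pos = $start;
    }
    return $cadenaHTML;
}

echo stripCommentsNaive("useful code <!-- garbage code --> more useful code <!-- more garbage code -->"), "\n";
```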

    
asked by Pedro 01.11.2017 at 20:45

1 answer


While it would be easy to remove HTML comments with a regular expression like:

$cadenaHTML = preg_replace( '/<!--(?!\[)[^-]*(?:-(?!->)[^-]*)*-->/', '', $cadenaHTML);

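    If you're wondering why not .*?: the pattern [^-]*(?:-(?!->)[^-]*)* is much more efficient than a lazy .*? because it consumes runs of non-hyphen characters in one step and only checks for the closing --> when it reaches a hyphen, instead of at every character. See "Unrolling the loop".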
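To see what the pattern keeps and removes, here is a small sketch that runs it next to the lazy .*? version on a sample string (the sample text and the conditional comment are assumptions for illustration). On this input both produce the same result; the snippet does not measure speed, it only shows that the unrolled form is a drop-in replacement here. The (?!\[) lookahead is what leaves the <!--[if IE]> conditional comment untouched.

```php
<?php
// Pattern from the answer: (?!\[) skips conditional comments like <!--[if IE]>,
// and [^-]*(?:-(?!->)[^-]*)* is the "unrolled" form of a lazy .*?
$unrolled = '/<!--(?!\[)[^-]*(?:-(?!->)[^-]*)*-->/';
// Lazy equivalent for comparison; the s flag lets . match newlines,
// which [^-] already does in the unrolled version
$lazy = '/<!--(?!\[).*?-->/s';

// Sample input (assumed): two ordinary comments, one spanning a newline,
// plus a conditional comment that should be kept
$cadenaHTML = "useful code <!-- garbage code --> more useful code <!-- more\ngarbage code -->"
    . ' <!--[if IE]><p>IE only</p><![endif]-->';

$a = preg_replace($unrolled, '', $cadenaHTML);
$b = preg_replace($lazy, '', $cadenaHTML);

var_dump($a === $b); // bool(true) for this input
echo $a, "\n";
```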

However, there are scenarios where it would fail, for example when <!-- appears inside the value of an input.
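A sketch of that failure, using a made-up input: the <!-- inside the value attribute is ordinary attribute text, but the regex cannot tell the difference, removes it anyway and silently empties the value.

```php
<?php
// Made-up input: the first "<!-- ... -->" is part of an attribute value, not a comment
$cadenaHTML = '<input value="<!-- not a comment -->"> text <!-- real comment -->';

// Same regex as in the answer
$resultado = preg_replace('/<!--(?!\[)[^-]*(?:-(?!->)[^-]*)*-->/', '', $cadenaHTML);

// The attribute value is lost along with the real comment
echo $resultado, "\n";
```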


Processing HTML with the DOM (DOMDocument in PHP) is very easy, and it is the tool designed for exactly this job.


Building the DOM is simple:

// Wrap the string in <body> because your example has none,
// so the parser processes it correctly
$html = '<body>' . $cadenaHTML . '</body>';

// Build the DOM
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_COMPACT | LIBXML_HTML_NOIMPLIED | LIBXML_NONET | LIBXML_HTML_NODEFDTD);

Then we find the comments with XPath:

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//comment()') as $comment) {
    // ...
}

and remove them one by one:

$comment->parentNode->removeChild($comment);

Finally, we save the result as a string:

$resultado = $dom->saveHTML();


Code:

<?php
// Remove comments from an HTML string with PHP
// https://es.stackoverflow.com/q/114030/127

$cadenaHTML = "codigo <b>util</b> <!-- codigo basura --> mas codigo util <!-- mas codigo basura -->";

// Wrap the string in <body> because your example has none,
// so the parser processes it correctly
$html = '<body>' . $cadenaHTML . '</body>';

// Build the DOM
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_COMPACT | LIBXML_HTML_NOIMPLIED | LIBXML_NONET | LIBXML_HTML_NODEFDTD);

// Find every comment
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//comment()') as $comment) {
    // Remove it
    $comment->parentNode->removeChild($comment);
}

// Save the HTML
$resultado = $dom->saveHTML();

// Print the result (escaping special characters so it displays as text)
echo htmlentities($resultado);

Result:

<body>codigo <b>util</b>  mas codigo util </body>
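The output above keeps the <body> wrapper that was added only for parsing. If that is unwanted, one option is to serialize only the children of <body> by passing each node to saveHTML(). This is a sketch, not part of the original answer, and removeHtmlComments is a hypothetical name. The sample input also puts <!-- inside an attribute value. The DOM parser treats it as attribute text rather than a comment, so it survives. Depending on the libxml2 version, the serializer may write it escaped as &lt;!--.

```php
<?php
// Hypothetical helper assembled from the steps in the answer. It returns only the
// children of the wrapper <body>, so the output gains no tag the input never had.
function removeHtmlComments(string $cadenaHTML): string
{
    $dom = new DOMDocument;
    // Same wrapping and libxml flags as the answer
    $dom->loadHTML('<body>' . $cadenaHTML . '</body>',
        LIBXML_COMPACT | LIBXML_HTML_NOIMPLIED | LIBXML_NONET | LIBXML_HTML_NODEFDTD);

    $xpath = new DOMXPath($dom);
    // The XPath result is a snapshot, not a live list, so removing while looping is safe
    foreach ($xpath->query('//comment()') as $comment) {
        $comment->parentNode->removeChild($comment);
    }

    // Serialize each child of <body> individually instead of the whole document
    $resultado = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $node) {
        $resultado .= $dom->saveHTML($node);
    }
    return $resultado;
}

echo removeHtmlComments('useful <b>code</b> <!-- garbage --> <input value="<!-- kept -->">'), "\n";
```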


    
answered by 01.11.2017 / 21:12