PHP regular expression to get the value between tags

2

I need to extract the number 2769 from inside these tags with a regular expression in PHP:

<td style="background-color:#b0c400;border-bottom:1px solid #ffffff;text-align:right;padding:0px 5px 0px 5px;color:#000000;">2769</td>

I tried it like this, but it did not work:

preg_match_all("/0px 5px 0px 5px;color:#000000;\"\>'(\d+)'\</", $web, $apies);
    
asked by Marcos Trentacoste 26.10.2016 at 16:03

2 answers

3

If the only way to locate that number is through the styles of the td, try this:

$texto = 'xxxxxx <td style="background-color:#b0c400;border-bottom:1px solid #ffffff;text-align:right;padding:0px 5px 0px 5px;color:#000000;">2769</td> xxxxxx';

$expr_reg = '%<td style="background-color:#b0c400;border-bottom:1px solid #ffffff;text-align:right;padding:0px 5px 0px 5px;color:#000000;">(\d+)</td>%i';
if (preg_match($expr_reg, $texto, $coincidencias)) {
    # Successful match: $coincidencias[1] holds the captured digits
    print_r($coincidencias);
} else {
    # Match attempt failed
    echo "No match";
}

Your number is in $coincidencias[1].
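
The attempt in the question most likely failed because its pattern contains literal single quotes around (\d+), and the HTML has no quotes around the number. A minimal sketch of that same pattern with the quotes removed follows; the sample value of $web and the name $pattern are assumptions made for illustration, not part of the original post.

```php
<?php
// Assumed sample input; in the question this text comes from $web.
$web = '<td style="background-color:#b0c400;border-bottom:1px solid #ffffff;text-align:right;padding:0px 5px 0px 5px;color:#000000;">2769</td>';

// Same anchor text as the question's pattern, without the literal quotes
// around (\d+). '~' is the delimiter, so the '/' in '</td>' needs no escaping.
$pattern = '~0px 5px 0px 5px;color:#000000;">(\d+)</td>~';

if (preg_match_all($pattern, $web, $apies)) {
    // $apies[1] lists every captured number, one entry per matching cell.
    print_r($apies[1]);
}
```

With preg_match_all the captures come back as a list in $apies[1], which is convenient if several cells share the same style.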

If you want to try it first: link

    
answered 26.10.2016 at 22:33
2

You should not use regular expressions to process HTML. A small change in the HTML is enough to make your regex fail: an extra space, a change in the tag's attributes, a comment, or a more complex structure would break even a huge regex.
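
To illustrate that fragility, here is a small sketch in which a single extra space in the style attribute is enough to stop a pattern from matching. The shortened pattern and the two sample strings are hypothetical, chosen only to keep the example brief.

```php
<?php
// Hypothetical shortened pattern that matches the style attribute literally.
$expr_reg = '%<td style="color:#000000;">(\d+)</td>%';

$original    = '<td style="color:#000000;">2769</td>';
$con_espacio = '<td style="color: #000000;">2769</td>'; // one extra space after "color:"

$coincide_original = preg_match($expr_reg, $original);
$coincide_espacio  = preg_match($expr_reg, $con_espacio);

var_dump($coincide_original); // int(1)
var_dump($coincide_espacio);  // int(0)
```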

Processing HTML with DOM is very easy; it is the tool designed for exactly that.


The DOM is built as follows:

$html = '<td style="background-color:#b0c400;border-bottom:1px solid #ffffff;text-align:right;padding:0px 5px 0px 5px;color:#000000;">2769</td>';

// Build the DOM
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_COMPACT | LIBXML_HTML_NOIMPLIED | LIBXML_NONET);

And we can get all the text without tags:

// Get all the text, with every tag stripped
$sin_tags = $dom->textContent;

echo "Text content of the whole page:\n" . $sin_tags;    // => 2769

Or, if the HTML has more tags and you are only interested in the first <td>:

// Get every TD
$td_nodelist = $dom->getElementsByTagName('td');
// Get the first TD
$td = $td_nodelist->item(0);

// Get the text content of the tag
$numero = $td->textContent;

echo "\n\nContent of the TD:\n" . $numero;    // => 2769


Result:

Text content of the whole page:
2769

Content of the TD:
2769
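
If the page has several cells and the one you want is not the first, one option is to select it with DOMXPath by a fragment of its style attribute. This is only a sketch under assumptions: the two-cell table below is invented sample data, and matching on the background colour is just one possible criterion.

```php
<?php
// Assumed sample: a row with two cells; only the second has the style we want.
$html = '<table><tr><td>Total</td>'
      . '<td style="background-color:#b0c400;color:#000000;">2769</td></tr></table>';

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_NONET);

// Select every TD whose style attribute contains the background colour.
$xpath = new DOMXPath($dom);
$nodos = $xpath->query('//td[contains(@style, "background-color:#b0c400")]');

foreach ($nodos as $td) {
    echo trim($td->textContent), "\n";
}
```

Because contains() matches only a fragment of the attribute, changes to the other style properties or to their spacing do not break the selection.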


Demo:

See the ideone.com demo

    
answered 30.10.2016 at 12:26