How to read a nested div that has no class defined


I'm having a problem with a scraper. I have the following code:

    $text = get_content_of_element($html, 'div', 'class', 'text');

and the site I need to read the data from has the following structure:

<div class="text">
            <div style="declared styles">
                <div style="image here"></div>

            <div><h3 style="font:18px arial, sans-serif; font-weight:bold; padding:10px 0;">Overview</h3></div>
            <div>this is the text I want to read</div><br>
            <div>text I would like to remove</b></a></div>
        <div></div></div>

Can you think of a way to read the information I need? Is it possible to pass a parameter so that only the 5th div is read? And in that case, can I also remove the 6th div from the text?

Thank you very much in advance

    
asked by Matt 07.01.2017 at 11:05

1 answer


PHP has built-in classes for reading HTML. Assuming your HTML is well-formed and has no errors:

<div class="text">
  <div style="declared styles">
    <div style="image here"></div>
    <div><h3 style="font:18px arial, sans-serif; font-weight:bold; padding:10px 0;">Overview</h3></div>
    <div>this is the text I want to read</div><br>
    <div>text I would like to remove</div>
    <div></div>
  </div>
</div>

We would execute the following:

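 $DOM = new DOMDocument;
 // Parse the HTML string into a DOM tree
 $DOM->loadHTML($string_con_el_html);
 // Every <div> in the document, in document order
 $divs = $DOM->getElementsByTagName('div');

 // The 5th <div> sits at index 4 because the list is zero-based
 echo $divs->item(4)->textContent;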

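Counting the nested divs in document order, the one containing the text you're looking for is the 5th, and since the list is indexed from 0, your element is at index 4. Note that this counts every div in the parsed string, so it only works as-is if the string contains just this fragment.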
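
One possible way to cover the second half of the question (dropping the 6th div), and to avoid counting every div on the page, is to scope the search with DOMXPath. This is only a sketch under assumptions: the sample markup, the variable names, and the idea that the container's class attribute is exactly "text" are illustrative and not part of the original answer.

```php
<?php
// Sample markup mirroring the structure in the question (placeholder text is illustrative).
$html = <<<HTML
<div class="text">
  <div style="declared styles">
    <div style="image here"></div>
    <div><h3>Overview</h3></div>
    <div>this is the text I want to read</div><br>
    <div>text I would like to remove</div>
    <div></div>
  </div>
</div>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Assumption: the container's class attribute is exactly "text".
$container = $xpath->query('//div[@class="text"]')->item(0);

// Descendant <div> elements of the container only, in document order.
$divs = $xpath->query('.//div', $container);

// The container itself is not in this list, so the wanted <div>
// moves from index 4 to index 3, and the unwanted one to index 4.
$wanted   = $divs->item(3);
$unwanted = $divs->item(4);

// Detach the unwanted <div> from the tree.
$unwanted->parentNode->removeChild($unwanted);

$text = trim($wanted->textContent);
echo $text, PHP_EOL;
```

If the real page has more markup around or inside the container, the indexes will differ, so it is worth printing each item of `$divs` once to confirm which position holds the text you need.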

    
answered by 08.01.2017 / 06:49