Extract href from a table using PHP


I want to extract the contents of a table on a web page, plus the link (href) that each entry has. I can already extract the table, but I don't know how to get to the href attribute. Can you help me?

This is the table:

The code used is as follows:

$tabla = $html->getElementsByTagName("table")->item(1);

foreach ($tabla->getElementsByTagName('tr') as $tr) {

    $tds = $tr->getElementsByTagName('td'); // get the columns in this row

    $href = $tr->getElementsByTagName('a'); //->item(0)->getAttribute('href');
    if (trim($tds->item(0)->nodeValue) <> '') {

        echo $tds->item(0)->nodeValue." , ".
             $tds->item(1)->nodeValue." , ".
             $tds->item(2)->nodeValue." , ".
             $tds->item(3)->nodeValue." , ".
             $tds->item(4)->nodeValue." , ".
             $tds->item(5)->nodeValue." , ".
             $tds->item(6)->nodeValue." , ".
             $tds->item(7)->nodeValue." , ".
             $tds->item(8)->nodeValue." , ";
             //$href;

        echo "<br />";
    }
}

Thanks in advance

Ulysses

Thanks to all of you for your answers. I implemented them, but I get an error. For clarity, here is the complete code; if you run it, the error shown below appears:

<?php
    error_reporting(0);
    $html = new DomDocument;
    $source = file_get_contents("http://seia.sea.gob.cl/busqueda/buscarProyectoAction.php?nombre=central&_paginador_refresh=0&_paginador_fila_actual=2");

    $html->loadHTML($source);

    // Each TR

    $tabla = $html->getElementsByTagName("table")->item(1);

    foreach($tabla->getElementsByTagName('tr') as $tr)
    {
        $tds = $tr->getElementsByTagName('td'); // get the columns in this row

        //$href = $tds->getAttribute('href');

        $href = $tds->getElementsByTagName('a')->item(0)->getAttribute('href');

        if (trim($tds->item(0)->nodeValue) <> '') {

            echo $tds->item(0)->nodeValue." , ".
                 $tds->item(1)->nodeValue." , ".
                 $tds->item(2)->nodeValue." , ".
                 $tds->item(3)->nodeValue." , ".
                 $tds->item(4)->nodeValue." , ".
                 $tds->item(5)->nodeValue." , ".
                 $tds->item(6)->nodeValue." , ".
                 $tds->item(7)->nodeValue." , ".
                 $tds->item(8)->nodeValue." , ";
                 //$href;

            echo "<br />";   
        } 

        //break; // don't check any further rows

    }
?>

    
asked by urivera_cl 05.01.2017 at 15:57

2 answers


I tried this, and it does work:

$html = <<<HTML
<html>
    <head>

    </head>
    <body>
        <table>
            <tr>
                <td><a href="http://localhost"/></td>
                <td></td>
                <td></td>
                <td></td>
            </tr>
        </table>
    </body>
</html>
HTML;

$var = new DOMDocument();
$var->loadHTML($html);
echo '<pre>';
print_r($var->getElementsByTagName('a')->item(0)->getAttribute('href'));
echo '</pre>';

The result is:

http://localhost

The error occurs because, when you get your <a> elements with

$a = $tr->getElementsByTagName('a');

you must check that the row contains at least one before using it:

if($a->length > 0){
    $href = $a->item(0)->getAttribute('href');
}

Alternatively, you can iterate over the list, and remove the break at the end of your loop. The first tr of that table is the header row and contains no <a>, so item(0) returns null, and getAttribute() cannot be called on null. That is what causes the error.
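Putting the length check into the full row loop, a minimal sketch might look like this. It assumes a hypothetical inline table (the project names, statuses and ficha.php?id=N links are made up for illustration) whose first row is a header with no <a>, and it uses libxml_use_internal_errors(true) to keep parser warnings quiet rather than hiding every error with error_reporting(0).

```php
<?php
// Hypothetical sample table (made-up names and links): the first row is a
// header with no <a>, like the first row of the table in the question.
$source = <<<HTML
<html><body>
<table>
  <tr><td>Name</td><td>Status</td></tr>
  <tr><td><a href="ficha.php?id=1">Project A</a></td><td>Approved</td></tr>
  <tr><td><a href="ficha.php?id=2">Project B</a></td><td>In review</td></tr>
</table>
</body></html>
HTML;

// Collect libxml parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$html = new DOMDocument();
$html->loadHTML($source);

$rows = [];
$table = $html->getElementsByTagName('table')->item(0);
foreach ($table->getElementsByTagName('tr') as $tr) {
    $tds = $tr->getElementsByTagName('td'); // the cells of this row
    if ($tds->length === 0) {
        continue; // e.g. a row made only of <th> cells
    }

    $cells = [];
    foreach ($tds as $td) {
        $cells[] = trim($td->nodeValue);
    }

    // item(0) returns null when the row has no <a>, so check length first.
    $links = $tr->getElementsByTagName('a');
    $href = $links->length > 0 ? $links->item(0)->getAttribute('href') : null;

    $rows[] = ['cells' => $cells, 'href' => $href];
}

foreach ($rows as $row) {
    echo implode(' , ', $row['cells']), ' , ', $row['href'] ?? '', "<br />\n";
}
```

With this shape the header row simply comes back with href set to null instead of stopping the script, so you can skip it or keep it as needed.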

Excellent! It worked perfectly. Thank you!

    
answered by 05.01.2017 at 16:24
$href = array();
foreach ($tds as $td) {
    $href[] = $td->getAttribute('href');
}

or

$href = $tds[0]->getAttribute('href');

Use the first example if you need to extract the href from more than one td: it loops over all the td elements and saves each href in an array, like this:

$href[0] = 'href1';
$href[1] = 'href2';
$href[2] = 'href3';
$href[3] = 'href4';

With the second, you directly access the href attribute of the first td.
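Note that a <td> element normally has no href attribute of its own; the link lives on the <a> element inside the cell, so asking the td for href gives an empty value rather than the link. A sketch of the same array approach that looks up the <a> inside each cell instead, assuming a hypothetical one-row table in which only some cells contain links:

```php
<?php
// Hypothetical row: only some of the cells contain a link.
$source = '<table><tr>'
    . '<td><a href="/a">A</a></td>'
    . '<td>plain text</td>'
    . '<td><a href="/c">C</a></td>'
    . '</tr></table>';

libxml_use_internal_errors(true); // collect parser warnings quietly
$doc = new DOMDocument();
$doc->loadHTML($source);

$tds = $doc->getElementsByTagName('td');

// Take the href from the first <a> inside each cell, or null if there is none.
$href = [];
foreach ($tds as $td) {
    $a = $td->getElementsByTagName('a')->item(0); // DOMElement or null
    $href[] = $a !== null ? $a->getAttribute('href') : null;
}

print_r($href);
```

Storing null for cells without a link keeps the array indexes aligned with the column positions, so $href[1] still refers to the second td.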

Thanks for your reply,

I implemented your answer, but it throws the following error:

Fatal error: Call to undefined method DOMNodeList::getAttribute()
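That message means getAttribute() was called on a DOMNodeList (the list that getElementsByTagName() returns) rather than on a single element; getAttribute() is a method of DOMElement. A minimal sketch of picking one element out of the list with item() first, using a hypothetical one-link document:

```php
<?php
libxml_use_internal_errors(true); // collect parser warnings quietly
$doc = new DOMDocument();
$doc->loadHTML('<table><tr><td><a href="http://localhost">link</a></td></tr></table>');

$links = $doc->getElementsByTagName('a');   // a DOMNodeList, not an element
var_dump($links instanceof DOMNodeList);     // bool(true)

// Calling $links->getAttribute('href') here would fail with the error above.
// item() returns a DOMElement, or null if the list is empty.
$first = $links->item(0);
$href = $first instanceof DOMElement ? $first->getAttribute('href') : null;
echo $href, "\n";                            // http://localhost
```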

    
answered by 05.01.2017 at 16:23