Capture multiple matches with PHP file_get_contents and preg_match


I have this code:

// Example by deerme.org
$data = file_get_contents("http://www.powerball.com/pb_home.asp");

if ( preg_match('|<font size="5" color="#000000"><strong>(.*)</strong></font>|', $data, $cap) )
{
    echo $cap[1];
}

It works well: it finds a string on the page and prints it. However, it only matches once, and I need it to search the entire page and print every match. If anyone is familiar with preg_match, or knows another way to do this, I would appreciate the help. Thanks!

    
asked by Jose Rodriguez on 11.03.2017 at 05:25

2 answers


Use preg_match_all() instead of preg_match(). Its third parameter ($cap in the following example) receives every occurrence it finds; with the default flags, $cap[1] is an array holding every value of the first capture group:

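$expr = '|<font size="5" color="#000000"><strong>(.*?)</strong></font>|'; // lazy .*? stops at the first </strong>
preg_match_all($expr, $data, $cap);
foreach ($cap[1] as $match) echo $match, "\n"; // $cap[1] holds every capture of group 1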

See the preg_match_all() manual page for details.
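A minimal, self-contained sketch of the same call, assuming a hard-coded sample string in place of the live page (the markup at that URL has very likely changed since 2017). It uses the lazy (.*?) variant of the pattern and shows that, with the default PREG_PATTERN_ORDER flag, $cap[0] holds the full matches and $cap[1] the values of the first capture group:

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$data = '<font size="5" color="#000000"><strong>12</strong></font>'
      . '<font size="5" color="#000000"><strong>25</strong></font>'
      . "\n"
      . '<font size="5" color="#000000"><strong>33</strong></font>';

// Lazy (.*?) so two matches on the same line are not merged into one.
$expr = '|<font size="5" color="#000000"><strong>(.*?)</strong></font>|';

// preg_match_all returns the number of full matches (or false on error).
$count = preg_match_all($expr, $data, $cap);

// $cap[0]: every full match; $cap[1]: every value of capture group 1.
foreach ($cap[1] as $value) {
    echo $value, "\n";
}
```

This prints 12, 25 and 33 on separate lines. Passing PREG_SET_ORDER as the fourth argument would instead group the results per match ($cap[0][1], $cap[1][1], ...).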

    
answered on 11.03.2017 at 05:41

You cannot reliably use preg_match to parse HTML, XML, XHTML or anything similar, and for the same reason it is a poor tool for web scraping.

At best you will match one occurrence, but what happens when there are two? How do you know that the closing tag your pattern matched belongs to the first occurrence and not to the second?
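To make that question concrete, here is a small sketch using a hypothetical sample string with two occurrences on the same line. The greedy (.*) from the original pattern runs from the first opening tag to the last closing tag, so the two occurrences collapse into one match; the lazy (.*?) fixes this particular case only because the markup is flat:

```php
<?php
// Two occurrences on one line (hypothetical sample).
$html = '<strong>12</strong> and <strong>25</strong>';

// Greedy: .* grabs as much as possible, so the match ends at the LAST </strong>.
preg_match_all('|<strong>(.*)</strong>|', $html, $greedy);

// Lazy: .*? stops at the FIRST </strong> after each opening tag.
preg_match_all('|<strong>(.*?)</strong>|', $html, $lazy);

print_r($greedy[1]); // one element: "12</strong> and <strong>25"
print_r($lazy[1]);   // two elements: "12" and "25"
```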

The root problem is that HTML has a context-free grammar, while a regular expression describes a regular grammar. The two sit at different levels of complexity.

Long answer: see the legendary Stack Overflow answer on this subject, which puts it this way:

  

I think the flaw is that HTML is a Chomsky Type 2 grammar (context-free grammar) and RegEx is a Chomsky Type 3 grammar (regular grammar). Since a Type 2 grammar is fundamentally more complex than a Type 3 grammar (see the Chomsky hierarchy), you cannot possibly make this work. But many will try, some will claim success and others will find the fault and totally mess you up.
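The nesting problem behind that quote can be seen in a short sketch (hypothetical markup). Once a tag contains another tag of the same name, even the lazy pattern stops at the inner closing tag, because a pattern like this one has no way to count how many <strong> tags are still open:

```php
<?php
// Nested <strong> elements (hypothetical sample).
$html = '<strong>outer <strong>inner</strong> tail</strong>';

// The lazy pattern stops at the first </strong>, which closes the INNER element.
preg_match_all('|<strong>(.*?)</strong>|', $html, $cap);

print_r($cap[1]); // one element: "outer <strong>inner", not the whole outer element
```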

Short answer: use an XML parser such as SimpleXML, or a scraping tool like Goutte or PHP Html Parser.
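As a sketch of the parser approach, the example below uses PHP's bundled DOM extension (DOMDocument and DOMXPath) rather than the libraries named above, simply because it needs no extra installation. The sample markup and the XPath query are assumptions modeled on the pattern from the question. Unlike SimpleXML's string loader, DOMDocument::loadHTML() tolerates HTML that is not well-formed XML:

```php
<?php
// Hypothetical sample markup; in practice this would be the downloaded page.
$data = '<html><body>'
      . '<font size="5" color="#000000"><strong>12</strong></font>'
      . '<p><font size="5" color="#000000"><strong>25</strong></font></p>'
      . '</body></html>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them (real pages are often sloppy).
libxml_use_internal_errors(true);
$doc->loadHTML($data);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Select <strong> elements that are direct children of the matching <font> tags.
$nodes = $xpath->query('//font[@size="5" and @color="#000000"]/strong');

$values = [];
foreach ($nodes as $node) {
    // textContent also includes the text of any nested elements.
    $values[] = trim($node->textContent);
}

echo implode("\n", $values), "\n";
```

Because the parser builds a tree, nesting and multiple occurrences are handled by the document structure rather than by the pattern.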

    
answered on 11.03.2017 at 16:24