Good morning, everyone. I want to know what I would need to create a website that stores the current and updated prices of different known web pages. I'm doing a project and I need help.
Good morning. To collect information from other pages (web scraping), I recommend using cURL. You can even get past CAPTCHAs (I do it with Google). Of course, you then have to process the information you collect and store it. Here is a small example.
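// $request_headers: array of header strings, e.g. array('Accept: text/html')
// $params: array of POST parameters (only if you are going to send any)
$ch1 = curl_init('string_url');
$options1 = array(CURLOPT_POST => true,
CURLOPT_HTTPHEADER => $request_headers, // CURLOPT_HEADER is only a boolean flag
CURLOPT_RETURNTRANSFER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_POSTFIELDS => http_build_query($params)
);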
curl_setopt_array($ch1, $options1);
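$page = curl_exec($ch1);
// Check the stored result; calling curl_exec() again would repeat the request
if ($page === false)
{
// Possible errors: report them and stop, since there is nothing to parse
\print_r(curl_error($ch1));
exit(1);
}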
curl_close($ch1);
$dom = new \DOMDocument();
$dom->validateOnParse = true;
$dom->preserveWhiteSpace = false;
$dom->substituteEntities = false;
//Load Document Object Model
$dom->loadHTML($page);
$domxPath = new \DOMXPath($dom);
$urls = $domxPath->query("/html/body//div[@class='precios']");
//This last line is where you have to look for the container that holds the
//information you are interested in.
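Once the query has run, you still have to walk the resulting nodes and turn their text into numbers. Here is a minimal sketch of that step. The sample HTML, the parse_price() helper, and the price format ("." for thousands, "," for decimals) are my own assumptions, not something from a real site, so adapt them to the pages you scrape.

```php
<?php
// Made-up stand-in for a downloaded page (assumption, not a real site).
$html = '<html><body>'
      . '<div class="precios">19,99 EUR</div>'
      . '<div class="precios">1.249,50 EUR</div>'
      . '</body></html>';

// Hypothetical helper: turns text such as "1.249,50 EUR" into a float,
// assuming "." separates thousands and "," marks the decimals.
function parse_price(string $text): ?float
{
    $digits = preg_replace('/[^0-9.,]/', '', $text); // drop currency and spaces
    if ($digits === '') {
        return null; // no number found in this node
    }
    $normalized = str_replace(',', '.', str_replace('.', '', $digits));
    return (float) $normalized;
}

libxml_use_internal_errors(true); // real pages are rarely valid HTML
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Collect one float per matching container.
$prices = array();
foreach ($xpath->query("/html/body//div[@class='precios']") as $node) {
    $price = parse_price($node->textContent);
    if ($price !== null) {
        $prices[] = $price;
    }
}
print_r($prices);
```

Sites that format prices differently (for example "1,249.50") would need a different normalization inside parse_price().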
You basically need knowledge of several technologies: HTML and CSS, JavaScript with AJAX, PHP, MySQL, and cron with bash.
Basically, you have to create an HTML structure with its CSS design and, through JavaScript and AJAX calls, make requests to a PHP script that returns the data to be shown, for example as JSON read from a MySQL database.
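To give an idea of the PHP side of that, here is a rough sketch of a script that reads rows and returns them as JSON. The `prices` table, its columns, the connection data, and the function names are all hypothetical; I only use PDO and json_encode(), which are standard PHP.

```php
<?php
// Hypothetical schema: table `prices` with columns site, product, price, checked_at.

// Reads the most recent rows through PDO (for MySQL, use a "mysql:" DSN).
function fetch_prices(PDO $pdo, int $limit = 50): array
{
    $stmt = $pdo->prepare(
        'SELECT site, product, price, checked_at FROM prices '
        . 'ORDER BY checked_at DESC LIMIT :lim'
    );
    $stmt->bindValue(':lim', $limit, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

// Turns the rows into the JSON body that the AJAX call receives.
function prices_to_json(array $rows): string
{
    return json_encode(array('count' => count($rows), 'prices' => $rows));
}

// In the real endpoint (say, a hypothetical prices.php) you would do:
//   $pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'password');
//   header('Content-Type: application/json');
//   echo prices_to_json(fetch_prices($pdo));

// Demo with a hard-coded row so the sketch runs without a database.
$sample = array(
    array('site' => 'shop-a', 'product' => 'widget', 'price' => 19.99,
          'checked_at' => '2024-01-01 08:00:00'),
);
echo prices_to_json($sample), "\n";
```

On the page, the JavaScript would request that script and build the table from the `prices` array in the response.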
Logically, you have to fill the database with data collected by bots that you launch from cron. You will probably have to program them with a mix of PHP and bash, and maybe other utilities.
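As a rough idea of what such a bot could look like on the storage side, here is a sketch. The crontab line, the file path, the `prices` table, and the three functions are assumptions of mine; the point is only that the bot stores a new row when a price is new or has changed, so you keep both the current and the previous prices.

```php
<?php
// Hypothetical crontab entry that would launch this bot every hour:
//   0 * * * * php /path/to/bot.php

// Decides whether a new row is worth storing: only when there is no
// previous price or the price differs by at least half a cent.
function price_changed(?float $lastPrice, float $newPrice): bool
{
    return $lastPrice === null || abs($lastPrice - $newPrice) >= 0.005;
}

// Returns the last stored price for a product, or null if there is none.
// Assumes a hypothetical table `prices` (site, product, price, checked_at).
function last_price(PDO $pdo, string $site, string $product): ?float
{
    $stmt = $pdo->prepare(
        'SELECT price FROM prices WHERE site = ? AND product = ? '
        . 'ORDER BY checked_at DESC LIMIT 1'
    );
    $stmt->execute(array($site, $product));
    $value = $stmt->fetchColumn();
    return $value === false ? null : (float) $value;
}

// Stores one observation with the current timestamp.
function store_price(PDO $pdo, string $site, string $product, float $price): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO prices (site, product, price, checked_at) VALUES (?, ?, ?, ?)'
    );
    $stmt->execute(array($site, $product, $price, date('Y-m-d H:i:s')));
}

// Inside the bot, after scraping a price, the flow would be:
//   if (price_changed(last_price($pdo, $site, $product), $price)) {
//       store_price($pdo, $site, $product, $price);
//   }
var_dump(price_changed(null, 19.99));
```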
As you can see, this is complex and you need to master many techniques, so I recommend that you look for ready-made projects that resemble what you want and adapt them.
For example, you can install Drupal and do everything through existing modules (search for "scraping" for the data collection; you have to do that part anyway).
Instead of writing a whole piece of software yourself, I recommend that you spend some time looking at tools that already do this. They are called crawlers.
There are many. If you know PHP, this one may work for you: link
But if you search for "crawler" on Google, you will see the huge number of tools that are available.