I need to take data from a page with Scrapy and export it to MySQL


Good morning.

I need to scrape the data from the table on the linked page, and I must do it with Scrapy. Searching online, I found that Scrapy might not be the best option for this, but since it was assigned to me at work I have no choice.

I have tried a lot, but I could not even get the spider to extract the data. I point it at the XPath //div[@class='fila'], but when the spider finishes running it never returns anything; it leaves me an empty file no matter what I change in the code. I can even leave the spider empty and the result is the same. I have no Scrapy experience, which is surely a big part of the problem.

To run it I am using a virtual machine on Windows 10 running Ubuntu, and I launch it with the command scrapy crawl miproyecto .
Thank you very much.

    
asked by Gonzalo Ortiz on 25.10.2018 at 20:30

1 answer


First we have to choose how to select the information; I use XPath. The expression that matches the currency-conversion table we need is:

divisas = response.xpath('//*[@id="acumulado"]/section[3]/section[1]/div/div/div')
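
To see how an XPath predicate like //div[@class='fila'] (the one from the question) matches rows, here is a self-contained sketch using only Python's standard library. Note this is an illustration, not Scrapy itself: scrapy's response.xpath() supports full XPath, while xml.etree.ElementTree supports only a small subset, which is still enough to show the idea. The HTML snippet below is invented for the example.

```python
# Illustration of matching //div[@class='fila'] rows using the stdlib's
# limited XPath support. In a real spider you would use response.xpath().
import xml.etree.ElementTree as ET

html = """
<html><body>
  <div class="fila"><label>Dolar</label><label><b>37,700</b></label></div>
  <div class="fila"><label>Euro</label><label><b>41,845</b></label></div>
  <div class="otra">ignored</div>
</body></html>
"""

root = ET.fromstring(html)
# ".//div[@class='fila']" = any descendant <div> whose class is exactly 'fila'
filas = root.findall(".//div[@class='fila']")
for fila in filas:
    nombre = fila.find("label").text    # first <label>: the currency name
    valor = fila.find("label/b").text   # <b> inside a <label>: the value
    print(nombre, valor)
```

The same predicate syntax carries over to Scrapy selectors, where you can also test expressions interactively with scrapy shell before writing them into the spider.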

To be able to store the results, you must configure the items.py file:

import scrapy


class DivisasItem(scrapy.Item):
    # Fields for each row of the currency table
    nombre = scrapy.Field()
    ultimo = scrapy.Field()
    anterior = scrapy.Field()
    variacion = scrapy.Field()
    fechahora = scrapy.Field(serializer=str)

Then, in the spider file you created:

# -*- coding: utf-8 -*-
import scrapy
from divisas.items import DivisasItem


class LanacionSpider(scrapy.Spider):
    name = 'lanacion'
    allowed_domains = ['lanacion.com.ar']
    start_urls = ['http://www.lanacion.com.ar/economia/divisas']

    def parse(self, response):
        # Each matched <div> is one row of the currency table
        divisas = response.xpath('//*[@id="acumulado"]/section[3]/section[1]/div/div/div')

        items = []
        for divisa in divisas:
            item = DivisasItem()
            item['nombre'] = divisa.xpath('label[1]/text()').extract()
            item['ultimo'] = divisa.xpath('label[2]/b/text()').extract()
            item['anterior'] = divisa.xpath('label[3]/text()').extract()
            item['variacion'] = divisa.xpath('label[4]/text()').extract()
            item['fechahora'] = divisa.xpath('label[5]/text()').extract()
            items.append(item)

        return items

I have uploaded all the files to my Git: Spider by Jhovanny Uribe

We run the spider

scrapy crawl lanacion -o file.csv -t csv

We check that the file is exported correctly

cat file.csv

anterior, nombre, variacion, ultimo, fechahora

Anterior, Descripción, Variación, [u'Fecha / Hora ']

"37,900", Dolar Minorista, "-0.53%", "37,700", [u'25.10 - 16:56 ']

"9.936", Real x Pesos, "+0.0%", "9.937", [u'25.10 - 21:16 ']

"0.2622", Real x Dolar, "+0.0%", "0.2636", [u'25.10 - 21:16 ']

"42.184", Euro x Pesos, "-0.8%", "41.845", [u'25.10 - 21:16 ']

"1.1130", Euro x Dolar, "-0.8%", "1.1099", [u'25.10 - 21:16 ']

"0,729", Pesos x Peso Mex., "0,0%", "0,729", [u'25.10 - 21:16 ']

"0.0192", Peso Mex. x Dolar, "0.0%", "0.0193", [u'25.10 - 21:16 ']

"0.054", Pesos x Peso Chile, "-0.2%", "0.054", [u'25.10 - 21:16 ']

"0.0014", Peso Chile x Dolar, "-0.2%", "0.0014", [u'25.10 - 21:16 ']

"0.326", Pesos x Yen, "+0.5%", "0.328", [u'25.10 - 21:16 ']

"0.0086", Yen x Dolar, "+0.5%", "0.0087", [u'25.10 - 21:16 ']

"47,723", Libra x Pesos, "-1.0%", "47,242", [u'25.10 - 21:16 ']

"1.2592", Libra x Dolar, "-1.0%", "1.2531", [u'25.10 - 21:16 ']

"2,861", PURU x Pesos, "0.0%", "2,861", [u'25.10 - 21:16 ']

"0.0755", PURU x Dolar, "0.0%", "0.0759", [u'25.10 - 21:16 ']

"37,900", Dolar Banco Nacion, "-0.5%", "37,700", [u'25.10 - 21:16 ']

The export to the MySQL database can be done at the end, from the CSV, or during the crawl by using an item pipeline.
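
A sketch of such a pipeline is below. Everything beyond the item fields is an assumption, not part of the original answer: it presumes the mysql-connector-python package, a divisas table with matching columns, and placeholder credentials (localhost, usuario, clave, scrapydb) that you must adapt. Enable it in settings.py with ITEM_PIPELINES = {'divisas.pipelines.MySQLPipeline': 300}.

```python
# Hypothetical pipeline inserting each scraped item into MySQL.
# Assumes mysql-connector-python and a pre-created `divisas` table;
# host/user/password/database below are placeholders.
class MySQLPipeline:
    SQL = ("INSERT INTO divisas (nombre, ultimo, anterior, variacion, fechahora) "
           "VALUES (%s, %s, %s, %s, %s)")

    @staticmethod
    def build_row(item):
        # extract() stores lists; take the first element of each, if any
        def first(value):
            if isinstance(value, list):
                return value[0].strip() if value else None
            return value
        return tuple(first(item.get(campo)) for campo in
                     ('nombre', 'ultimo', 'anterior', 'variacion', 'fechahora'))

    def open_spider(self, spider):
        import mysql.connector  # imported here so the sketch stays optional
        self.conn = mysql.connector.connect(
            host='localhost', user='usuario', password='clave',
            database='scrapydb')
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        self.cursor.execute(self.SQL, self.build_row(item))
        self.conn.commit()
        return item

    def close_spider(self, spider):
        self.cursor.close()
        self.conn.close()
```

Keeping the insert in process_item writes each row as it is scraped, so a crash partway through still leaves the rows collected so far in the database.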

    
answered on 26.10.2018 at 02:42