Scrapy Python: Do a recursive search on a page


I'm trying to do a recursive search on a web page with Scrapy. I have set DEPTH_LIMIT = 4 in settings.py, and my code is as follows:

import re

import scrapy
from scrapy.http import Request
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import Rule


class HreflocalizeSpider(scrapy.Spider):
    name = "hrefLocalize"
    allowed_domains = [URL]
    start_urls = (
        'URL_DE_BUSQUEDA',
    )
    rules = (
        Rule(LinkExtractor(allow=()), callback='parse', follow=True),
    )
    settings.overrides['DEPTH_LIMIT'] = 4  # I added this to force the change
    settings.overrides['DEPTH_PRIORITY'] = 4

    def parse(self, response):
        hxs = scrapy.Selector(response)
        lines = hxs.xpath("//@href").extract()
        # Match absolute ftp/http/https URLs only
        linkPattern = re.compile(r"^(?:ftp|http|https):\/\/(?:[\w\.\-\+]+:{0,1}[\w\.\-\+]*@)?(?:[a-z0-9\-\.]+)(?::[0-9]+)?(?:\/|\/(?:[\w#!:\.\?\+=&%@!\-\/\(\)]+)|\?(?:[\w#!:\.\?\+=&%@!\-\/\(\)]+))?$")
        for line in lines:
            print(line)
            if linkPattern.match(line):
                yield Request(line, self.parse)
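
For reference, the depth change in settings.py is just these lines (assuming the default project layout that scrapy startproject generates):

# settings.py -- project-wide configuration
DEPTH_LIMIT = 4      # maximum link depth to follow below the start URLs
DEPTH_PRIORITY = 4   # adjusts request priority by depth (positive values favor breadth-first order)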

But even with all this, the crawl stats always report:

'request_depth_max': 1

I have seen in the logs that the middleware that handles the depth search is loaded, but even so, it does not crawl any deeper.
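
For reference, here is a minimal sketch of what I understand the setup should look like in Scrapy 1.x, where rules are only honored by CrawlSpider, settings.overrides no longer exists, and per-spider overrides go in custom_settings (the spider name and callback below are made up for illustration); I am not sure whether this is the part I am missing:

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class DepthExampleSpider(CrawlSpider):
    # Hypothetical spider, for illustration only
    name = "depthExample"
    start_urls = ['URL_DE_BUSQUEDA']

    # Per-spider settings, applied before the crawl starts
    custom_settings = {
        'DEPTH_LIMIT': 4,
    }

    rules = (
        # follow=True makes CrawlSpider keep extracting links recursively;
        # the callback must not be named 'parse', because CrawlSpider
        # uses parse() internally to drive the rules
        Rule(LinkExtractor(allow=()), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        yield {'url': response.url}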

Could someone help me out and tell me what I'm doing wrong?

Thank you very much in advance!

    
asked by Jose Vila 30.04.2016 at 21:04

0 answers