If, as @fredyfx says, the page fetches its information with POST requests (executed from Javascript), that implies that in the HTML you download the elements you are looking for with your XPath expressions simply do not exist yet, because they will be created by Javascript when the responses to those requests arrive.
This can be easily checked. Download the page:
$ wget https://www.airbnb.com/s/Londres--Reino-Unido
We check whether it contains the string "/rooms", which is what your scraper uses to extract the links:
$ grep /rooms Londres--Reino-Unido
Nothing comes out: the string does not appear anywhere in the downloaded page. If instead you open the URL in a browser and use the "inspect page" tool, you will see HTML elements such as:
<meta itemprop="url" content="www.airbnb.es/rooms/17247557?location=Londres%2C%20Reino%20Unido">
These elements were not in the downloaded HTML; they appear as a result of javascript executed by the browser, following instructions contained in the downloaded document itself or in external scripts linked from it.
Therefore, in short, to scrape dynamically generated pages it is necessary to execute the corresponding javascript, which requires a real browser, because python cannot execute javascript by itself.
Although it is a major inconvenience, it is not impossible. Many modern browsers can be "controlled" from a script. So python could launch a browser, tell it to load the page (the browser would run the javascript and generate the dynamic content), and then python could retrieve the generated HTML, or even simulate user actions such as "click on this button" or "move the mouse over that image".
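For example, here is a minimal sketch of that idea using Selenium (a package not mentioned in this answer, used only for illustration), assuming Chrome and a matching chromedriver are installed:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.airbnb.com/s/Londres--Reino-Unido")
# The browser executes the page's javascript; the DOM now contains the
# dynamically generated elements, including the /rooms links.
# (a WebDriverWait may be needed to give the javascript time to finish)
links = [a.get_attribute("href")
         for a in driver.find_elements(By.CSS_SELECTOR, 'a[href*="/rooms"]')]
driver.quit()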
A python package that allows you to do these things is requests-html. Unfortunately I do not see how to integrate it with scrapy; it is intended rather for downloading a single page and scraping it, not for building spiders that keep following the links they find automatically.
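As a rough sketch of how it could be used on this page (the render() call downloads a headless Chromium the first time it runs):

from requests_html import HTMLSession

session = HTMLSession()
r = session.get("https://www.airbnb.com/s/Londres--Reino-Unido")
r.html.render()  # executes the page's javascript in a headless Chromium
# After rendering, the dynamically generated links are available:
room_links = [link for link in r.html.absolute_links if "/rooms" in link]
print(room_links)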
Another one, which works on a different principle, is scrapy-splash. It lets you integrate splash so that it can be used from scrapy. splash is a piece of software that runs as a server and acts as a proxy between scrapy and the real server: it downloads the page, executes the javascript, and serves the resulting HTML to scrapy.
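A minimal sketch of what a spider could look like with scrapy-splash, assuming a Splash server is already listening on localhost:8050 (the settings come from the scrapy-splash README; the spider name and selectors are just an example):

import scrapy
from scrapy_splash import SplashRequest


class AirbnbSpider(scrapy.Spider):
    name = "airbnb"
    custom_settings = {
        "SPLASH_URL": "http://localhost:8050",
        "DOWNLOADER_MIDDLEWARES": {
            "scrapy_splash.SplashCookiesMiddleware": 723,
            "scrapy_splash.SplashMiddleware": 725,
            "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
        },
        "SPIDER_MIDDLEWARES": {
            "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
        },
        "DUPEFILTER_CLASS": "scrapy_splash.SplashAwareDupeFilter",
    }

    def start_requests(self):
        # SplashRequest sends the URL through Splash, which executes the
        # javascript before returning the rendered HTML to scrapy.
        yield SplashRequest(
            "https://www.airbnb.com/s/Londres--Reino-Unido",
            self.parse,
            args={"wait": 2},  # give the javascript time to run
        )

    def parse(self, response):
        # The response already contains the dynamically generated HTML.
        for href in response.xpath('//a[contains(@href, "/rooms")]/@href').getall():
            yield {"room": response.urljoin(href)}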
I have not used it and cannot tell you how well it works, but a priori it looks harder to set up, because the method recommended in the manual is to have splash running in a docker container...
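For reference, according to the Splash documentation that usually amounts to something like:

$ docker pull scrapinghub/splash
$ docker run -p 8050:8050 scrapinghub/splash

which leaves Splash listening on port 8050, the address used in the sketch above.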