From the MechanicalSoup documentation (translated; the bold is mine):

MechanicalSoup automatically stores and sends cookies, follows redirects, and can follow links and submit forms. **It does not run JavaScript.**
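For example, a minimal MechanicalSoup sketch (the URL and form field names here are hypothetical, just for illustration):

```python
import mechanicalsoup

# StatefulBrowser keeps cookies between requests and follows redirects
browser = mechanicalsoup.StatefulBrowser()
browser.open("https://example.com/login")  # hypothetical URL

# Select the first form on the page and fill in its fields
# (the names "user" and "password" are made up for this example)
browser.select_form("form")
browser["user"] = "myuser"
browser["password"] = "mypassword"
response = browser.submit_selected()

print(response.status_code)
print(browser.page.title)  # browser.page is the BeautifulSoup of the current page
```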
However, an alternative has appeared recently: requests-html, which bills itself as "HTML for humans". Designed specifically for scraping tasks, it is even capable of executing JavaScript and giving you access to the resulting DOM.
How does it manage to execute JavaScript from Python, if that is a different language? Well, by "cheating". You cannot really run modern JavaScript outside of a real browser, because emulating a browser environment, with all its features, would be more complex than just launching one. What requests-html does is use the pyppeteer library to launch Chromium with the --headless option and "drive it by remote control".
Chromium is the open-source browser on which Chrome is based. The --headless option makes it run without opening any window, but otherwise it is fully functional (the page's output is rendered into an invisible "virtual" window, and you could even take a screenshot of the result).
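To get an idea of what happens under the hood, here is a minimal sketch using pyppeteer directly (not requests-html's internal code, just the same technique; the URL is an example):

```python
import asyncio
from pyppeteer import launch

async def main():
    # Launch Chromium without opening any window
    browser = await launch(headless=True)
    page = await browser.newPage()
    await page.goto("https://example.com")  # hypothetical URL

    # The page is rendered in an invisible "virtual" window;
    # we can even take a screenshot of the result
    await page.screenshot({"path": "example.png"})

    # Or get the HTML of the DOM after JavaScript has run
    html = await page.content()
    print(html[:200])

    await browser.close()

asyncio.get_event_loop().run_until_complete(main())
```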
This kind of technique is usually used for testing web applications (automating user actions and checking that the result is what you expect), but requests-html uses it instead to obtain the resulting DOM, which you can then analyze to extract links and perform other scraping tasks.
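For example, a minimal sketch with requests-html (the URL is just an example):

```python
from requests_html import HTMLSession

session = HTMLSession()
r = session.get("https://example.com")  # hypothetical URL

# render() launches headless Chromium via pyppeteer, executes the
# page's JavaScript, and replaces r.html with the resulting DOM
# (the first call downloads Chromium, which takes a while)
r.html.render()

# Now we can analyze the rendered DOM, e.g. to extract links
print(r.html.links)  # set of links found on the page
print(r.html.find("title", first=True).text)
```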