execute javascript in python

2

When accessing a web via python, is there a way to show the response code of a javascript? what the user sees I have this:

import mechanicalsoup

browser = mechanicalsoup.StatefulBrowser()

print(browser.open("https://www.web.com/"))

print(browser.get_current_page())

print(browser.get_url())

for example the answer of this .py is to see the source code of the page but the answer code of javascript is not seen. I'm looking for something like when I inspect the code, there is the visual code displayed

    
asked by Yoandy Isse Oña 31.01.2018 в 23:07
source

2 answers

0

From the documentation of MechanicalSoup (translated, and the bold is mine):

  

MechanicalSoup automatically stores and sends cookies , follows redirects and can follow links and submit forms. Does not run JavaScript .

However, recently it has shown an alternative: requests-html , which calls itself "html for humans". Specifically designed for Scraping tasks it is even capable of executing javascript and giving you access to the resulting DOM.

How does it manage to execute javascript from python if it is another language? Well, "cheating." You can not really run modern javascript if it is not in a real browser, because emulating the environment of a browser, with all its features, would be more complex than launching one. What requests-html does is use the pyppeteer library to launch Chromium with the option --headless and "handle it by remote control".

Chromium is an implementation of open code from Chrome. The --headless option makes it run without opening any window, but otherwise fully functional (the "exit" of the page in question is rendered in an invisible "virtual" window, but you could even get a screenshot of the result ).

This type of techniques is usually used for testing of web applications (to automate user actions and see that the result is what you want), but requests-html uses it more for get the resulting DOM that you can then analyze to extract links and other scraping tasks.

    
answered by 09.03.2018 / 12:39
source
1

I think you need a headless browser .

For this I advise you to use Selenium Web Driver and PhantomJS . In this way you can make Web scraping of any Web (even with javascript). Also inject javascript code, browse, capture images, etc.

To use it with python (in SO English):

link

Greetings.

    
answered by 09.03.2018 в 12:54