I'm trying to read data from a local HTML file. A script on the page causes the value of one of its tags to increase over time, and I want to obtain that value. I am parsing the file with the BeautifulSoup library, but it always reads the initial HTML, without the script's changes. I have been investigating a bit, and I have been told that this happens because the page is modified in memory, so the data I am looking for never reaches the file, and that I should use AJAX to obtain it. I have also tried other approaches, such as the Requests library, but nothing works. Could someone shed some light on where I should continue investigating? Thank you very much!
import urllib2
from bs4 import BeautifulSoup
import time

def getData():
    # specify the url
    web = 'file:///Users/apc/Desktop/Scoreboard/scoreboard.html'
    # query the website and return the html to the variable 'page'
    page = urllib2.urlopen(web)
    # parse the html using beautiful soup and store in variable 'soup'
    soup = BeautifulSoup(page, 'html.parser')
    # get the index
    single_num = soup.find('span', {'id': 'single'})
    print single_num.text

while True:
    getData()
    time.sleep(5)
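To illustrate why the parsed value never changes, here is a minimal sketch in Python 3. It assumes a hypothetical stand-in for scoreboard.html (the real file is not shown) and swaps BeautifulSoup for the standard-library `html.parser` so that it runs without extra packages. The point is only that an HTML parser reads the markup as written and never executes the script.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for scoreboard.html: in a browser the script would
# change the span, but a parser only sees the markup as written.
PAGE = """
<html><body>
<span id="single">Single</span>
<script>
  document.getElementById('single').textContent = '1';
</script>
</body></html>
"""


class SpanText(HTMLParser):
    """Collect the text inside the span with a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        # Start collecting when the element with the wanted id opens.
        if dict(attrs).get("id") == self.target_id:
            self.inside = True

    def handle_endtag(self, tag):
        # Stop collecting when the span closes.
        if tag == "span":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text += data


parser = SpanText("single")
parser.feed(PAGE)
static_text = parser.text.strip()
print(static_text)  # prints "Single": the script was never run
```

Getting the updated value therefore needs something that actually runs the page's JavaScript, such as a real browser.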
It's more or less working now! The Python script opens the page, which starts to run correctly, but when I parse it I still get the content of the original HTML; that is, I still get the word 'Single' and not the updated data. I used Selenium with Chrome. I attach the code. Any idea where the error is?
from bs4 import BeautifulSoup
import time
from selenium import webdriver

driver = webdriver.Chrome('/Users/apc/Library/Preferences/PyCharmCE2018.2/scratches/chromedriver')
driver.get('file:///Users/apc/Library/Preferences/PyCharmCE2018.2/scratches/scoreboard.html')
html = driver.page_source

def getData(html):
    soup = BeautifulSoup(html, features='html.parser')
    single_num = soup.find('span', {'id': 'single'})
    print single_num

while True:
    getData(html)
    time.sleep(5)
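One likely cause in the code above is that `driver.page_source` is read only once, before the loop, so every pass parses the same string. The sketch below shows the difference in Python 3 using a hypothetical `FakeDriver` class (not part of Selenium) whose page changes over time; it is an illustration of the snapshot problem, not a diagnosis confirmed against the real page.

```python
class FakeDriver:
    """Hypothetical stand-in for a browser whose page changes over time."""

    def __init__(self):
        self._score = 0

    def tick(self):
        # Simulates the page's script incrementing the value.
        self._score += 1

    @property
    def page_source(self):
        # Like a real driver, returns the markup as it is right now.
        return '<span id="single">%d</span>' % self._score


driver = FakeDriver()

# Captured once, before the loop: this string never changes afterwards.
snapshot = driver.page_source

stale, fresh = [], []
for _ in range(3):
    driver.tick()
    stale.append(snapshot)            # same HTML on every pass
    fresh.append(driver.page_source)  # re-read inside the loop

print(stale[-1])
print(fresh[-1])
```

With a real driver, the equivalent change would be to read `driver.page_source` inside `getData` instead of passing in a string captured at startup.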
Finally, I got it working in the following way. Here is the code:
import time
from selenium import webdriver

driver = webdriver.Chrome('C:/Users/Administrador.LINTERNA/.PyCharmCE2018.2/config/scratches/chromedriver')
driver.get('C:/Users/Administrador.LINTERNA/.PyCharmCE2018.2/config/scratches/scoreboard.html')

def getData():
    # ask the live browser for the element's current text on every call
    out = driver.find_element_by_id('single').text
    print(out)

while True:
    getData()
    time.sleep(5)
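This works because the element is looked up in the live browser on every call, rather than in a string saved earlier. As far as I know, recent Selenium 4 releases no longer provide `find_element_by_id`, and `find_element` with a locator strategy is used instead. The sketch below is a Python 3 variant under that assumption; `poll_text` and its parameters are hypothetical names introduced for illustration, and the loop is bounded so it can be tried without running forever.

```python
import time


def poll_text(driver, element_id, interval=5.0, reads=3, sleep=time.sleep):
    """Read an element's text `reads` times, pausing `interval` seconds between reads."""
    values = []
    for i in range(reads):
        # "id" is the locator strategy string that Selenium's By.ID stands for.
        values.append(driver.find_element("id", element_id).text)
        if i < reads - 1:
            sleep(interval)
    return values
```

With a driver that already has the page open, `poll_text(driver, 'single')` would return a list with the three most recent readings, taken five seconds apart.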
Thanks for your help!