How do I parse data from an HTML file in real time?

Question

I'm trying to read data from a local HTML file. A script on the page makes the value in one of its tags increase, and I want to read that value. I'm parsing the HTML file with the BeautifulSoup library, but it always reads the original HTML, without the changes. I've been looking into it, and I've been told that this happens because the file is modified in memory, which is why I can't get the data I'm looking for, and that I should use AJAX to get it. I have also tried other approaches, such as the Requests library, but nothing works. Could someone point me in a direction to keep investigating? Thank you very much!

import urllib2
from bs4 import BeautifulSoup
import time
def getData():
  # specify the url
  web = 'file:///Users/apc/Desktop/Scoreboard/scoreboard.html'
  #query the website and return the html to the variable 'page'
  page = urllib2.urlopen(web)
  # parse the html using beautiful soup and store in variable 'soup'
  soup = BeautifulSoup(page,'html.parser')
  # get the index
  single_num = soup.find('span', {'id': 'single'})
  print single_num.text
while True:
  getData()
  time.sleep(5)
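
Why the first script keeps printing the initial value: `urllib2.urlopen` (and likewise Requests) only returns the bytes stored in scoreboard.html. The page's JavaScript runs only inside a browser, which applies its updates to the in-memory DOM and never writes them back to the file, so any parser that reads the file sees the original markup. The sketch below illustrates this with Python 3's standard-library `html.parser` standing in for BeautifulSoup, so it needs no third-party packages; the `PAGE` markup and the `static_span_text` helper are hypothetical.

```python
from html.parser import HTMLParser


class SpanTextFinder(HTMLParser):
    """Collect the text of <span id=...> exactly as written in the markup."""

    def __init__(self, span_id):
        super().__init__()
        self.span_id = span_id
        self.inside = False
        self.text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get("id") == self.span_id:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.inside = False

    def handle_data(self, data):
        # Script bodies arrive here as plain text; they are never executed.
        if self.inside:
            self.text += data


def static_span_text(html, span_id):
    finder = SpanTextFinder(span_id)
    finder.feed(html)
    finder.close()
    return finder.text


# Hypothetical stand-in for scoreboard.html: a script that would update the span.
PAGE = """
<span id="single">Single</span>
<script>
  let n = 0;
  setInterval(() => {
    document.getElementById("single").textContent = ++n;
  }, 1000);
</script>
"""

print(static_span_text(PAGE, "single"))  # prints: Single
```

BeautifulSoup behaves the same way: it parses markup but does not execute `<script>` elements, so a browser (for example, one driven by Selenium) is needed to see the updated value.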

Right now it's more or less working! The Python script launches the page, which starts running correctly, but when I parse it I still get the information from the original HTML; that is, I still get the word 'Single' and not the updated data. I used Selenium with Chrome. Here is the code; any idea where the error is?

from bs4 import BeautifulSoup
import time
from selenium import webdriver
driver = webdriver.Chrome('/Users/apc/Library/Preferences/PyCharmCE2018.2/scratches/chromedriver')
driver.get('file:///Users/apc/Library/Preferences/PyCharmCE2018.2/scratches/scoreboard.html')
html = driver.page_source
def getData(html):
  soup = BeautifulSoup(html, features='html.parser')
  single_num = soup.find('span', {'id': 'single'})
  print single_num
while True:
  getData(html)
  time.sleep(5)
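
The error in this version is that `html = driver.page_source` runs only once, before the loop, so every call to `getData(html)` parses the same string; if that string is captured right after `driver.get`, it may still contain the original 'Single'. The page source has to be read again on each iteration. A minimal sketch of that pattern, using a hypothetical `poll` helper that accepts any zero-argument function returning the current value:

```python
import time
from typing import Callable, List


def poll(read_value: Callable[[], str], times: int, interval: float) -> List[str]:
    """Call read_value() anew on every iteration and collect the results.

    The state is fetched inside the loop; capturing driver.page_source once,
    outside the loop, would freeze a single snapshot instead.
    """
    values = []
    for i in range(times):
        values.append(read_value())  # fresh read on every pass
        if i < times - 1:
            time.sleep(interval)
    return values


# Stand-in for a page whose counter advances between reads.
counter = iter(["Single", "1", "2"])
print(poll(lambda: next(counter), times=3, interval=0))  # ['Single', '1', '2']
```

With Selenium and BeautifulSoup, `read_value` could be `lambda: BeautifulSoup(driver.page_source, 'html.parser').find('span', {'id': 'single'}).text`, so that `driver.page_source` is re-read on every call. The `times` parameter only bounds the loop for the demonstration; an endless `while True` loop works the same way as long as the read happens inside it.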

Finally, I got it working as follows. Here is the code:

import time
from selenium import webdriver
driver = webdriver.Chrome('C:/Users/Administrador.LINTERNA/.PyCharmCE2018.2/config/scratches/chromedriver')
driver.get('C:/Users/Administrador.LINTERNA/.PyCharmCE2018.2/config/scratches/scoreboard.html')
def getData():
  out = driver.find_element_by_id('single').text
  print(out)
while True:
  getData()
  time.sleep(5)
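
This version works because `find_element_by_id('single').text` asks the browser for the element's current text in the live DOM instead of parsing a saved copy of the page. It relies on Selenium 3-era calls, though: the `find_element_by_*` helpers were removed in Selenium 4.3, and recent Selenium 4 releases no longer accept the chromedriver path as a positional argument to `webdriver.Chrome`, expecting a `Service` object instead. The sketch below adapts the same approach to Selenium 4; `CHROMEDRIVER_PATH` and `PAGE_URL` are hypothetical placeholders to replace with your own paths.

```python
import time

# Hypothetical placeholders: point these at your own chromedriver and page.
CHROMEDRIVER_PATH = "/path/to/chromedriver"
PAGE_URL = "file:///path/to/scoreboard.html"


def read_single(driver):
    """Return the current text of <span id="single"> from the live DOM."""
    # "id" is the value of selenium.webdriver.common.by.By.ID, so this is
    # equivalent to driver.find_element(By.ID, "single").
    return driver.find_element("id", "single").text


def main():
    # Selenium is only needed once a real browser is started.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    driver = webdriver.Chrome(service=Service(CHROMEDRIVER_PATH))
    try:
        driver.get(PAGE_URL)
        while True:
            print(read_single(driver))
            time.sleep(5)
    finally:
        driver.quit()  # close the browser even if the loop is interrupted
```

Call `main()` to start the loop; stopping it with Ctrl+C still runs the `finally` clause, which closes the browser.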

Thanks for your help!

html5 python beautifulsoup

asked by Alejo 18.11.2018 at 19:19

0 answers