Error 'NoneType' object is not callable when trying to select item by id


I'm trying to get an element of the DOM that has a certain id. The problem is that I am using the .getElementById() method of AdvancedHTMLParser, which appears in its guide.

Once I extract the HTML from the URL, I search for the DOM element this way:

page_soup = soup(page_html, "html.parser")

bundle = page_soup.getElementById("pdpbundleparts")

But it returns an error:


TypeError: 'NoneType' object is not callable

The page itself downloads correctly.

asked by JetLagFox on 27.12.2017 at 19:47

1 answer


You are using BeautifulSoup with html.parser as the parser, given the arguments you pass to the constructor ( page_soup = soup(page_html, "html.parser") ). The problem is that you are calling a method ( getElementById ) that belongs to another library ( AdvancedHTMLParser ) instead of the BeautifulSoup methods for selecting elements ( find , find_all , ...). BeautifulSoup treats an unknown attribute as a search for a tag with that name, so page_soup.getElementById is equivalent to page_soup.find("getElementById"); since no such tag exists it evaluates to None, and calling that None is what raises the TypeError.
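
For illustration, here is a minimal sketch that reproduces the error without downloading anything (the HTML string is made up for the example):

from bs4 import BeautifulSoup

page_soup = BeautifulSoup("<div id='pdpbundleparts'></div>", "html.parser")

# Unknown attribute access is interpreted as a tag search; there is no
# <getelementbyid> tag in the document, so the lookup returns None.
print(page_soup.getElementById)               # None

# Calling None raises: TypeError: 'NoneType' object is not callable
page_soup.getElementById("pdpbundleparts")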

As far as I know, AdvancedHTMLParser cannot be used as an internal parser in BS (the available ones are lxml , html5lib and html.parser from the standard library), but you can use html.parser without problems in your case:

import requests
from bs4 import BeautifulSoup

url = "https://www.microsoft.com/en-us/store/p/batman-the-enemy-within-the-complete-season-episodes-1-5/c0dgrh3hk0vk"
# Fake User-Agent so the server does not reject the request
header = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9'}
request = requests.get(url, headers=header)
page_html = request.text

# Parse with html.parser and select the element by its id with find()
page_soup = BeautifulSoup(page_html, "html.parser")
bundle = page_soup.find(id="pdpbundleparts")
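
Keep in mind that find() returns None when nothing matches, so check the result before using it. A CSS selector with select_one is an equivalent way of selecting by id:

# Equivalent selection using a CSS id selector
bundle = page_soup.select_one("#pdpbundleparts")

if bundle is not None:
    # get_text() concatenates the text of the element and its descendants
    print(bundle.get_text(strip=True))
else:
    print("No element with id 'pdpbundleparts' was found")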

If you prefer, you can also use AdvancedHTMLParser directly instead of BeautifulSoup without any problem:

import requests
import AdvancedHTMLParser

url = "https://www.microsoft.com/en-us/store/p/batman-the-enemy-within-the-complete-season-episodes-1-5/c0dgrh3hk0vk"
# Fake User-Agent so the server does not reject the request
header = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9'}
request = requests.get(url, headers=header)
page_html = request.text

# AdvancedHTMLParser does provide getElementById, mirroring the JS DOM API
parser = AdvancedHTMLParser.AdvancedHTMLParser()
parser.parseStr(page_html)
bundle = parser.getElementById("pdpbundleparts")
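
Here too the lookup returns None if the id is not present, so it is worth checking before using the result; the textContent attribute below is assumed from AdvancedHTMLParser's JS-DOM-style API, so check its documentation if it differs:

# getElementById returns None when nothing matches
if bundle is not None:
    # textContent is assumed to mirror the JS DOM property of the same name
    print(bundle.textContent)
else:
    print("No element with id 'pdpbundleparts' was found")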

Note: In both cases I use Requests to download the HTML so that the example is reproducible. It is necessary to provide a fake User-Agent so that the server does not reject the request...

answered on 27.12.2017 at 21:14