If you create the BeautifulSoup object with `page_soup = soup(page_html, "html.parser")`, then you are using `html.parser` as the parser, given the argument you pass to the initializer. The problem is that you are calling a method of a different library, `getElementById` from `AdvancedHTMLParser`, instead of BeautifulSoup's own element-selection methods (`find`, `find_all`, ...).
As far as I know, `AdvancedHTMLParser` cannot be used as an internal parser in BeautifulSoup (the available ones are `lxml`, `html5lib`, and `html.parser` from the standard library), but `html.parser` works without problems in your case:
import requests
from bs4 import BeautifulSoup
url = "https://www.microsoft.com/en-us/store/p/batman-the-enemy-within-the-complete-season-episodes-1-5/c0dgrh3hk0vk"
header = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9'}
request = requests.get(url, headers=header)
page_html = request.text
page_soup = BeautifulSoup(page_html, "html.parser")
bundle = page_soup.find(id="pdpbundleparts")
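Once you have the element, you keep using the same BeautifulSoup methods to drill into it. Here is a self-contained sketch of `find` / `find_all` (the HTML snippet below is invented for illustration; the real Microsoft Store markup will differ):

```python
from bs4 import BeautifulSoup

html = """
<div id="pdpbundleparts">
  <span class="part">Episode 1</span>
  <span class="part">Episode 2</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find(id=...) plays the role of getElementById
bundle = soup.find(id="pdpbundleparts")

# find_all collects every matching descendant
parts = bundle.find_all("span", class_="part")
names = [p.get_text() for p in parts]  # ["Episode 1", "Episode 2"]
```

Note that `class_` (with a trailing underscore) is how BeautifulSoup accepts the `class` attribute, since `class` is a reserved word in Python.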
If you prefer, you can drop BeautifulSoup and use `AdvancedHTMLParser` directly without problems:
import requests
import AdvancedHTMLParser
url = "https://www.microsoft.com/en-us/store/p/batman-the-enemy-within-the-complete-season-episodes-1-5/c0dgrh3hk0vk"
header = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9'}
request = requests.get(url, headers=header)
page_html = request.text
parser = AdvancedHTMLParser.AdvancedHTMLParser()
parser.parseStr(page_html)
bundle = parser.getElementById("pdpbundleparts")
Note: in both cases I use Requests to download the HTML so that the example is reproducible. It is necessary to send a fake User-Agent header, otherwise the server refuses to serve us the page...