Search for all tags that begin with a given string


I'm scraping a page and looking for a particular tag. I'm doing it with BeautifulSoup. I want all the links whose href starts the same way but ends differently.


for url in soup.find_all('a', href=("/es/nds/*******")):

The asterisks stand for any string at the end. I don't know how to express that. I've tried:

 href=("/es/nds/"+""), href=("/es/nds/\*") and similar.

Can you help me out with it?

Thanks. Greetings.

asked by Shorosky 13.01.2018 at 12:58

1 answer


You can use regular expressions:

import re
from bs4 import BeautifulSoup

html = """
<a href="/es/nds/\*">foo</a>
<a href="/es/nds/aaa/bbb">foo</a>
<a href="/es/ccc">foo</a>
<a href="/gggg/ffff">foo</a>
<a href="/es/nds/_hhhh">foo</a>
"""
soup = BeautifulSoup(html, "lxml")
# Keep only the links whose href starts with /es/nds/
patt = re.compile('^/es/nds/')
for url in soup.find_all('a', href=patt):
    print(url)

The output is:

<a href="/es/nds/\*">foo</a>
<a href="/es/nds/aaa/bbb">foo</a>
<a href="/es/nds/_hhhh">foo</a>

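In this case, simply use the special character ^ (caret), which anchors the pattern to the beginning of the string, so only href values that start with /es/nds/ match. Whatever comes after that prefix does not matter, so no wildcard is needed for the ending.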
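
To see why the caret matters: BeautifulSoup's documentation says a compiled pattern is applied with its search() method, which can match anywhere in the attribute value. The following is only a small sketch using the standard re module on plain strings (no BeautifulSoup involved). The href list reuses values from the example above, plus one hypothetical value, "/foo/es/nds/x", added just to show the difference.

```python
import re

# href values from the example above, plus one hypothetical value
# ("/foo/es/nds/x") that contains the prefix in the middle of the string.
hrefs = ["/es/nds/aaa/bbb", "/es/ccc", "/gggg/ffff", "/es/nds/_hhhh", "/foo/es/nds/x"]

anchored = re.compile(r"^/es/nds/")    # the value must start with the prefix
unanchored = re.compile(r"/es/nds/")   # the prefix may appear anywhere

# search() scans the whole string, so only the caret restricts the match
# to the beginning.
starts_with = [h for h in hrefs if anchored.search(h)]
contains = [h for h in hrefs if unanchored.search(h)]

print(starts_with)
print(contains)
```

Without the caret, the hypothetical "/foo/es/nds/x" would also be selected, which is not what the question asks for.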

answered by 13.01.2018 at 17:20