Find all tags whose href begins with a given string

I'm scraping a page with BeautifulSoup and looking for a particular tag. I want all the links whose href starts with the same prefix but ends differently.

Example:

for url in soup.find_all('a', href=("/es/nds/*******")):

The asterisks stand for any ending of the string. I don't know how to express that. I've tried:

 href=("/es/nds/"+""), href=("/es/nds/\*") and similar variants.

Can you help me out with it?

Thanks. Greetings.

    
asked by Shorosky 13.01.2018 at 12:58
1 answer

You can use regular expressions:

import re
from bs4 import BeautifulSoup

html = """
<a href="/es/nds/\*">foo</a> 
<a href="/es/nds/aaa/bbb">foo</a> 
<a href="/es/ccc">foo</a> 
<a href="/gggg/ffff">foo</a> 
<a href="/es/nds/_hhhh">foo</a> 
"""
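soup = BeautifulSoup(html, "lxml")
# ^ anchors the pattern at the start of the href value
patt = re.compile(r'^/es/nds/')
# Passing a compiled pattern as href= keeps only the <a> tags whose href matches it
for url in soup.find_all('a', href=patt):
    print(url)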

The output is:

<a href="/es/nds/\*">foo</a>
<a href="/es/nds/aaa/bbb">foo</a>
<a href="/es/nds/_hhhh">foo</a>

The key here is the caret (^), a special character that anchors the pattern to the beginning of the string, so only href values that start with /es/nds/ match.
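
To see what the caret changes, here is a small sketch that uses only the re module. As far as I know, BeautifulSoup tests a compiled pattern against the attribute value with its search() method, which can match anywhere in the string, so without the caret a link that merely contains /es/nds/ would be returned as well. The href /old/es/nds/zzz is a made-up value added for illustration; it does not come from the page in the question.

```python
import re

# hrefs taken from the sample HTML above, plus one hypothetical value
# ("/old/es/nds/zzz") added only to show what the caret excludes.
hrefs = [
    "/es/nds/aaa/bbb",
    "/es/ccc",
    "/gggg/ffff",
    "/es/nds/_hhhh",
    "/old/es/nds/zzz",
]

anchored = re.compile(r"^/es/nds/")   # must start with /es/nds/
unanchored = re.compile(r"/es/nds/")  # may contain /es/nds/ anywhere

# search() scans the whole string, so only the caret pins the match to the start.
starts_with = [h for h in hrefs if anchored.search(h)]
contains = [h for h in hrefs if unanchored.search(h)]

print(starts_with)
print(contains)
```

The first list keeps only the two hrefs that begin with /es/nds/, while the second also picks up the made-up /old/es/nds/zzz.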

    
answered 13.01.2018 at 17:20