Problem with regexp when using .*"

I am using the re module to extract data from a web page. Specifically: link

But I have a problem. If I do:

re.findall('<a href="(.*)"', data)

It returns everything from the beginning of the href to the end of data. If I do:

re.findall('<a href="(.*)" class', a)

Then it does return the correct value. I want to select everything from the first quote of the href to the second, but if I put a quote after the .* it seems to be ignored. In contrast, ordinary text after the .* does work.
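A minimal reproduction of the behavior described above, using a made-up one-line sample rather than the real page (the `data` string and its attribute values are assumptions for illustration):

```python
import re

# Hypothetical sample; the real page content is not shown here.
data = '<a href="/torrent/1" class="detLink" title="Details">Foo</a>'

# .* is greedy: it consumes as much as it can, then backtracks only
# as far as the LAST quote in the string, not the first one.
greedy = re.findall('<a href="(.*)"', data)
print(greedy)

# With literal text after the quote, the only place the pattern can
# end is right after the href value, so the result looks correct.
anchored = re.findall('<a href="(.*)" class', data)
print(anchored)
```

The quote is not actually ignored: the match still ends on a quote, just on the last one available instead of the one that closes the href.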

Specifically, I want to get the name of each result and its magnet link. I tried this:

re.findall("""<div class="detName"><a href=".*" .*">(.*)</a>.*href="(magnet.*)" title="Download thi""", html.replace('\n', ''))

And this (only to see the names, as a test):

re.findall("""<div class="detName"><a href=".*" .*">(.*)</a>""", data.replace('\n', ''))

But it returns:

['<img src="//thepiratebay.org/static/img/rss_small.gif" alt="RSS" />']

How should I do it?

    
asked by Pablo 08.07.2017 at 08:31

1 answer

What you can do is, instead of looking for any character after href="
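One common way to do this, shown here as a sketch and not necessarily the exact code the answer had in mind, is to match "any character except a quote" with `[^"]*`, or to make the quantifier lazy with `.*?`. The sample `html` string below is made up for illustration; the real page markup may differ, so the patterns may need adjusting.

```python
import re

# Hypothetical sample with two results on one line (no newlines),
# loosely modeled on the markup quoted in the question.
html = (
    '<div class="detName"><a href="/torrent/1" class="detLink" '
    'title="Details for Foo">Foo</a></div>'
    '<a href="magnet:?xt=urn:btih:aaa" title="Download this torrent using magnet">m</a>'
    '<div class="detName"><a href="/torrent/2" class="detLink" '
    'title="Details for Bar">Bar</a></div>'
    '<a href="magnet:?xt=urn:btih:bbb" title="Download this torrent using magnet">m</a>'
)

# [^"]* cannot cross a quote, so each match stops at the closing quote.
hrefs = re.findall(r'<a href="([^"]*)"', html)

# .*? is lazy: it expands only as far as needed to reach the next quote.
hrefs_lazy = re.findall(r'<a href="(.*?)"', html)

# Name and magnet link together: [^>]* skips the remaining attributes,
# [^<]* captures the link text, and .*? bridges to the nearest magnet href.
pairs = re.findall(
    r'<div class="detName"><a href="[^"]*"[^>]*>([^<]*)</a>.*?href="(magnet[^"]*)"',
    html,
)

print(hrefs)
print(pairs)
```

The negated class is usually the safer of the two, because it can never run past the delimiter, while a lazy `.*?` can still stretch across several tags if the text that follows it fails to match at the nearest position.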

answered by 08.07.2017 / 09:55