Problem with regexp when using .*"


I am using the re module to extract data from a web page. Specifically: link

But I have a problem, if I do:

re.findall('<a href="(.*)"', data)

It returns everything from the beginning of the href to the end of data. If I do:

re.findall('<a href="(.*)" class', data)

Then it does return the correct value. I want to select everything from the first quote of the href to the second, but if I put a quote after the .* it seems to be ignored. In contrast, ordinary text after the .* does work.

Specifically, I want to get the name of each result and its magnet link. I tried this:

re.findall("""<div class="detName"><a href=".*" .*">(.*)</a>.*href="(magnet.*)" title="Download thi""", html.replace('\n', ''))

And this (only the names, just as a test):

re.findall("""<div class="detName"><a href=".*" .*">(.*)</a>""", data.replace('\n', ''))

But it returns:

['<img src="//" alt="RSS" />']

How should I do it?

asked by Pablo 08.07.2017 at 08:31

1 answer


What you can do is, instead of looking for any character after href="
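
A minimal sketch of one way to follow that advice, assuming it means replacing the greedy .* with a pattern that cannot run past the closing quote. The quote after .* is not ignored: .* is greedy, so it consumes as much as it can and the quote ends up matching the last quote in the text instead of the first. The sample markup and variable names below are hypothetical, loosely modelled on the snippets in the question, not taken from the real page.

```python
import re

# Hypothetical sample markup (an assumption, not the real page).
html = (
    '<div class="detName"><a href="/torrent/1/Example" class="detLink" '
    'title="Details for Example">Example name</a></div>'
    '<a href="magnet:?xt=urn:btih:abc123" title="Download this torrent using magnet">'
    '<img src="//" alt="RSS" /></a>'
)

# Greedy: .* runs to the LAST quote in the text, so one oversized match comes back.
greedy = re.findall(r'<a href="(.*)"', html)

# Negated character class: [^"]* cannot cross a quote, so it stops at the first one.
links = re.findall(r'<a href="([^"]*)"', html)

# Lazy quantifier: .*? matches as little as possible, which gives the same result here.
lazy = re.findall(r'<a href="(.*?)"', html)

# Name plus magnet link, using the same idea for each piece.
# re.DOTALL lets . match newlines, so data.replace('\n', '') is not needed.
pattern = re.compile(
    r'<div class="detName"><a href="[^"]*"[^>]*>([^<]*)</a>'
    r'.*?href="(magnet[^"]*)"',
    re.DOTALL,
)
results = pattern.findall(html)

print(links)
print(results)
```

The negated classes ([^"]*, [^>]*, [^<]*) are usually the safer choice than .*? because they state exactly which character ends the match.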

answered by 08.07.2017 / 09:55