I am using the re module to extract data from a web page. Specifically: link
But I have a problem. If I do:
re.findall('<a href="(.*)"', data)
it returns everything from the start of the href to the end of data. If I do:
re.findall('<a href="(.*)" class', data)
then it does return the correct value. I want to capture everything from the opening quote of the href to the closing one, but if I put a quote right after the .* it seems to be ignored. By contrast, literal text after the .* does work.
Specifically, I want to get the names of the results and the magnet links. I tried this:
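Here is a minimal reproduction of what I see. The one-line data string is a made-up snippet I am assuming is close to the real markup, not the actual page. With it, the first pattern captures up to the last quote on the line, while the second stops at the end of the href:

```python
import re

# Made-up one-line snippet standing in for the real page (an assumption).
data = '<a href="/torrent/1" class="detLink" title="Details">Some name</a>'

# Pattern ending in a bare quote: the capture runs past the href,
# up to the last quote on the line.
greedy = re.findall('<a href="(.*)"', data)
print(greedy)

# Pattern followed by the literal text ' class': the capture is just the href.
anchored = re.findall('<a href="(.*)" class', data)
print(anchored)
```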
re.findall("""<div class="detName"><a href=".*" .*">(.*)</a>.*href="(magnet.*)" title="Download thi""", html.replace('\n', ''))
And this (to see only the names; it was just a test):
re.findall("""<div class="detName"><a href=".*" .*">(.*)</a>""", data.replace('\n', ''))
But it returns:
['<img src="//thepiratebay.org/static/img/rss_small.gif" alt="RSS" />']
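A cut-down reproduction seems to show the same behaviour. The sample string below is hypothetical markup that I am assuming resembles the page (two result rows followed by a link wrapping an image). Only the image tag from the trailing link comes back, not the two names:

```python
import re

# Hypothetical markup standing in for the real page (an assumption, not the actual HTML).
sample = (
    '<div class="detName"><a href="/torrent/1" class="detLink">First name</a></div>\n'
    '<div class="detName"><a href="/torrent/2" class="detLink">Second name</a></div>\n'
    '<a href="/rss" title="RSS"><img src="rss.gif" alt="RSS" /></a>\n'
)

# Same pattern as in my test, applied after stripping the newlines.
names = re.findall(
    """<div class="detName"><a href=".*" .*">(.*)</a>""",
    sample.replace('\n', ''),
)
print(names)
```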
How should I do it?