Problem with regexp when using .*"

Question

I am using the re module to extract data from a web page. Specifically: link

But I have a problem. If I do:

re.findall('<a href="(.*)"', data)

It returns everything from the beginning of the href to the end of data. If I do:

re.findall('<a href="(.*)" class', a)

Then it does return the correct value. I want to select everything from the first quote of the href to the second, but if I put a quote after the .* it seems to be ignored. In contrast, normal text after the .* does work.

Specifically, I want to get the name of each result and its magnet link. I tried this:
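The behaviour described above can be reproduced on a small made-up sample. The `data` string below is an assumption for illustration only, not the real page:

```python
import re

# Hypothetical one-line sample; the real page is not shown in the question.
data = '<a href="/torrent/1" class="detLink" title="Details">Name</a>'

# Greedy .* first runs to the end of the line, then backtracks only as far
# as the LAST quote, so the capture spans several attributes.
greedy = re.findall('<a href="(.*)"', data)

# With ' class' after the quote, only one position in this sample can match,
# so the capture stops at the end of the href value.
anchored = re.findall('<a href="(.*)" class', data)

print(greedy)    # ['/torrent/1" class="detLink" title="Details']
print(anchored)  # ['/torrent/1']
```

So the quote after `.*` is not ignored: it matches the last quote on the line instead of the first one.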

re.findall("""<div class="detName"><a href=".*" .*">(.*)</a>.*href="(magnet.*)" title="Download thi""", html.replace('\n', ''))

And this (just a test, to see only the names):

re.findall("""<div class="detName"><a href=".*" .*">(.*)</a>""", data.replace('\n', ''))

But it returns:

['<img src="//thepiratebay.org/static/img/rss_small.gif" alt="RSS" />']

How should I do it?

html python regex

asked by Pablo 08.07.2017 в 06:31


1 answer


score 1 · Accepted Answer

What you can do is, instead of looking for any character after href="
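A minimal sketch of one way to do that, assuming the idea is to match only non-quote characters (`[^"]*`) in place of `.*`. The sample HTML and the names `href_re` and `pair_re` are made up for illustration and are not taken from the real page:

```python
import re

# Hypothetical sample mimicking the structure described in the question.
html = (
    '<div class="detName"><a href="/torrent/1" class="detLink" '
    'title="Details for Example">Example Name</a></div>'
    '<a href="magnet:?xt=urn:btih:abc123" title="Download this torrent">'
    '<img src="x.gif" alt="Magnet" /></a>'
)

# [^"]* cannot cross a quote, so the capture ends at the first closing quote.
href_re = re.compile(r'<a href="([^"]*)"')

# [^>]* skips the remaining attributes, [^<]* captures the link text,
# and the lazy .*? advances only as far as the next magnet href.
pair_re = re.compile(
    r'<div class="detName"><a href="[^"]*"[^>]*>([^<]*)</a>'
    r'.*?href="(magnet[^"]*)"'
)

hrefs = href_re.findall(html)
pairs = pair_re.findall(html)

print(hrefs)  # ['/torrent/1', 'magnet:?xt=urn:btih:abc123']
print(pairs)  # [('Example Name', 'magnet:?xt=urn:btih:abc123')]
```

A lazy quantifier (`.*?`) in place of `.*` is another option for the simple href case. Like the question's code, this sketch assumes newlines have already been removed, since `.` does not match `\n` by default.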


                                    
                                                                            answered by                                          08.07.2017 / 07:55