Maintain the session when downloading the captcha


I have this code that downloads the captcha, but I also have to submit the form data together with the captcha solution. My question is: how do I keep the session used to download the captcha and then send the form request along with the captcha solution (I'll take care of solving it)?

from bs4 import BeautifulSoup
import urllib.parse
import urllib.request

url = "https://demos.devexpress.com/aspxeditorsdemos/ASPxCaptcha/Features.aspx"
content = urllib.request.urlopen(url)
soup = BeautifulSoup(content, "html.parser")
img = soup.find('img', id='ContentHolder_Captcha_IMG')
print(img)
request = urllib.request.urlretrieve(urllib.parse.urljoin(url, img['src']), 'captcha.jpg')
print(request)
asked by lARAPRO 10.08.2018 в 18:48

1 answer


The key is to keep and reuse the session cookies between requests. A session means maintaining certain parameters across multiple requests, which includes sending the cookies generated by the server in every subsequent request made within it.

Using urllib from the Python 3 standard library, we have the http.cookiejar module, which together with urllib.request.OpenerDirector lets us manage cookies. Keep in mind that this will always be more cumbersome than using higher-level tools such as requests (see Advanced Usage - Session Objects) or scrapy.
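As a point of comparison, here is a sketch of the asker's flow using requests.Session, which stores and resends cookies automatically between requests. The form field name "captcha" and the POST target are assumptions; inspect the real form to get the actual names.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup
import requests

URL = "https://demos.devexpress.com/aspxeditorsdemos/ASPxCaptcha/Features.aspx"


def download_captcha(session, dest="captcha.jpg"):
    """Download the captcha through `session` so the same cookies
    can be reused later to POST the form."""
    soup = BeautifulSoup(session.get(URL).content, "html.parser")
    img = soup.find("img", id="ContentHolder_Captcha_IMG")
    with open(dest, "wb") as f:
        f.write(session.get(urljoin(URL, img["src"])).content)


def submit_solution(session, solution):
    """POST the form reusing the session's cookies.
    The field name "captcha" is only a placeholder."""
    return session.post(URL, data={"captcha": solution})


# Usage sketch (requires network access):
# with requests.Session() as s:
#     download_captcha(s)
#     solution = ...          # solve the captcha here
#     submit_solution(s, solution)
```

Everything done inside the same Session object shares the same cookie jar, which is exactly what keeping the session means here.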

Here is a small example that authenticates on StackOverflow while keeping the session between requests; at the end, a request is made to the Spanish site and the HTML is stored in a file that you can open in your browser to quickly check the result.

import http.cookiejar
import urllib.parse
import urllib.request


EMAIL = "email"
PASSWORD = "password"

BASE_URL = 'https://stackoverflow.com/'
LOGIN_URL = 'https://stackoverflow.com/users/login'
ES_BASE_URL = "https://es.stackoverflow.com/"

USER_AGENT = 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:8.0) Gecko/20100101 Firefox/8.0'
HEADERS = {'User-Agent': USER_AGENT}


# The opener shares this jar, so cookies set by one request
# are sent automatically with the following ones.
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
urllib.request.install_opener(opener)


values = {
    'email': EMAIL,
    'password': PASSWORD,
}

data = urllib.parse.urlencode(values).encode("utf-8")
req = urllib.request.Request(LOGIN_URL, data, HEADERS)
with urllib.request.urlopen(req) as response:
    html = response.read()

# Subsequent requests will keep the session
req = urllib.request.Request(ES_BASE_URL)
with urllib.request.urlopen(req) as response:
    html = response.read()
    with open("so_es.html", "wb") as f:
        f.write(html)

If you comment out the line urllib.request.install_opener(opener), you can see that the session is no longer maintained between requests. The example is very basic; if necessary, the cookies can be stored on disk and loaded later to be reused.
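For example, a minimal sketch of persisting cookies to disk with http.cookiejar.MozillaCookieJar (the file name here is just an example):

```python
import http.cookiejar
import os
import tempfile

# Illustrative path; use whatever location suits your application.
path = os.path.join(tempfile.gettempdir(), "session_cookies.txt")

# Use a MozillaCookieJar instead of a plain CookieJar when building
# the HTTPCookieProcessor; it can serialize itself to disk.
cj = http.cookiejar.MozillaCookieJar()
# ... make your requests through an opener built around cj ...
cj.save(path, ignore_discard=True, ignore_expires=True)  # write to disk

# In a later run, load the saved cookies before building the opener,
# and the new process will continue the previous session.
cj2 = http.cookiejar.MozillaCookieJar()
cj2.load(path, ignore_discard=True, ignore_expires=True)
print(len(cj2))  # number of cookies restored
```

Note that ignore_discard/ignore_expires are needed to keep session cookies, which servers typically mark as discardable.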

answered by 10.08.2018 / 22:06