Filtering CSV data from a URL in Python

2

I'm working with a report that I download from a URL. I want to filter it so that only the rows whose Fill Rate is below 0.05 are kept, together with their name, requests and impressions, and then save just those rows to a CSV file. I don't know how to do this kind of operation. I'm using Python 2.7.

import urllib, urllib2, cookielib

# Username and password
username = 'email'
password = 'password'


# Cookies
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Equivalent to a POST (passing data makes the request a POST)
login_data = urllib.urlencode({'username' : username, 'password' : password})
opener.open('https://ppp.com/login', login_data)

# Reports
resp = opener.open('https://ppp.com/reports/csv/11777')
content = resp.read()
print content

That is where the code ends. The output I get is the following:

"Inventory Name","Requests","Impressions","Fill Rate"
"aaass MWasdS","569737093","244066","0.04"
"bssss","331270265","381168","0.12"
"cumbia","152492369","190008","0.12"
"cuadrupedia","133983625","53184","0.04"
    
asked by Martin Bouhier 02.10.2017 at 20:51

2 answers

1

pandas.read_csv can read a CSV directly from an http/https address. The problem is that you have to authenticate and use cookies, so in this case you have to do something like what you already do: fetch the data with urllib, requests or any other specialized library, and load the DataFrame once you have it.

You can use StringIO to read and write string buffers, which lets you treat the content string as if it were a file and pass it to pandas.read_csv as you normally would:

import urllib
import urllib2
from StringIO import StringIO
import cookielib
import pandas as pd

# Username and password
username = 'email'
password = 'password'


# Cookies
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Equivalent to a POST (passing data makes the request a POST)
login_data = urllib.urlencode({'username' : username, 'password' : password})
opener.open('https://ppp.com/login', login_data)

# Reports
resp = opener.open('https://ppp.com/reports/csv/11777')
content = resp.read()
# Wrap the downloaded string in a file-like buffer so pandas can parse it
df = pd.read_csv(StringIO(content))

You can also save the CSV to a file on your hard drive and load it with pandas as usual:

with open("datos.csv", "wb") as f:
    f.write(content)

df = pd.read_csv("datos.csv")

Once you have the DataFrame loaded in either of the two ways, the rest is simple:

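# Keep only the rows whose fill rate is below 0.05
df = df[df['Fill Rate'] < 0.05]
# index=False keeps the DataFrame's row index out of the file
df.to_csv("salida.csv", sep=',', index=False)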
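
As a quick check, the filter can be tried on the sample rows shown in the question without logging in anywhere. This is only a sketch, written for Python 3 (`io.StringIO` takes the place of the Python 2 `StringIO` module); the `sample` string and the `low_fill` and `csv_text` names are made up for the example, and `sample` stands in for the downloaded `content`.

```python
import io

import pandas as pd

# Sample report copied from the question; in the real script this would be
# the string returned by resp.read().
sample = u'''"Inventory Name","Requests","Impressions","Fill Rate"
"aaass MWasdS","569737093","244066","0.04"
"bssss","331270265","381168","0.12"
"cumbia","152492369","190008","0.12"
"cuadrupedia","133983625","53184","0.04"
'''

# Parse the string as if it were a file.
df = pd.read_csv(io.StringIO(sample))

# Keep only the rows whose fill rate is strictly below 0.05.
low_fill = df[df['Fill Rate'] < 0.05]

# Without a path, to_csv returns the CSV as a string; index=False leaves
# the row index out of it.
csv_text = low_fill.to_csv(index=False)
print(csv_text)
```

Only the two rows with a fill rate of 0.04 should remain. All four columns are kept because the filter selects rows, not columns.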
    
answered by 02.10.2017 / 21:41
0

Martín, @FJSevilla's answer is the appropriate one in your case, since you use pandas. Purely for reference, a solution in plain Python can be written with list comprehensions:

import urllib, urllib2, cookielib

# Username and password
username = 'email'
password = 'password'

# Cookies
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Equivalent to a POST (passing data makes the request a POST)
login_data = urllib.urlencode({'username' : username, 'password' : password})
opener.open('https://ppp.com/login', login_data)

# Reports
resp = opener.open('https://ppp.com/reports/csv/11777')
content = resp.read()

# Build a list of rows: split on \n, then on ",", and strip the quotes from each field
datos = [[w.replace('"', '') for w in e] for e in [l.split(",") for l in content.split("\n")] if len(e) == 4]

# Filter the rows, skipping the header, keeping those with a fill rate below 0.05
datos_filtrados = [fila for fila in datos[1:] if float(fila[3]) < 0.05]

print(datos_filtrados)
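
Splitting on "," by hand breaks as soon as an inventory name contains a comma, and the snippet above stops at printing the rows. A possible alternative, still without pandas, is the standard `csv` module, which handles the quoting and can also write the result. This is a sketch written for Python 3 (on Python 2.7 you would use `StringIO.StringIO` and plain `str` instead); the `sample` string and the `filter_low_fill` helper are hypothetical names for the example, with `sample` standing in for the downloaded `content`.

```python
import csv
import io

# Sample report copied from the question; stands in for `content`.
sample = u'''"Inventory Name","Requests","Impressions","Fill Rate"
"aaass MWasdS","569737093","244066","0.04"
"bssss","331270265","381168","0.12"
"cumbia","152492369","190008","0.12"
"cuadrupedia","133983625","53184","0.04"
'''


def filter_low_fill(text, threshold=0.05):
    """Return the header and the rows whose 'Fill Rate' is below threshold."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Look the column up by name instead of hard-coding its position.
    rate_col = header.index('Fill Rate')
    rows = [row for row in reader
            if row and float(row[rate_col]) < threshold]
    return header, rows


header, rows = filter_low_fill(sample)

# Write to an in-memory buffer here; a real file works the same way.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(header)
writer.writerows(rows)
print(out.getvalue())
```

To save to disk in Python 3, replace the buffer with `open("salida.csv", "w", newline="")` and pass that file object to `csv.writer`.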
    
answered by 02.10.2017 at 22:26