I want to set an array from a postgresql database so that we have the eclipse_id
by the heading and the subscriber_id
by the side.
> data.head(6).ix[:,2:8]
1222 52582 45552 122 589 568
0 0 0 0 0 0 0
1 0 0 1 0 0 0
2 0 0 0 0 0 0
3 0 3 0 0 0 0
4 0 0 0 0 0 0
5 0 0 0 0 5 0
I would come from a database swipe
with attributes eclipse_id
, subscriber_id
and state
but I only want to count those that are worth 2,3,5,6,8,9.
Here is the SQL query, I do not know if there is another way to do it:
SELECT COUNT (swipe.state),swipe.eclipse_id, swipe.subscriber_id FROM swipe
WHERE swipe.state= 2 OR swipe.state = 3 or swipe.state=5 OR swipe.state =6 or swipe.state=8 or swipe.state=9
GROUP BY swipe.subscriber_id,swipe.eclipse_id
I do not know how to do it from Datagrip with a file that we will open like this:
data = pd.read_csv('data.csv')
Or directly from Jupyter notebook ...
With Datagrip
At the moment I have about datagrip:
1 6874 179
1 10205 13
1 9958 13
1 9639 161
1 11128 185
I used Dump Data on the right to get it in /home
but it does not work: I have that:
Cannot write file: /home/SELECT__COUNT__swipe_state__swipe_eclips.csv
With Jupyter
I try to do it from Jupyter notebook but I am not able to print the results
import traceback
import psycopg2
import numpy as np
import datetime
from datetime import datetime
from operator import *
import pandas as pd
now = datetime.now()
print now
params = {
'database': 'escondida...',
'user': '',
'password': '',
'host': '',
'port': ''
}
try:
conn = psycopg2.connect(**params)
cur = conn.cursor()
cur.execute("""
-- nombre de personnes ayant telecharge notre produit
SELECT COUNT (swipe.state),swipe.eclipse_id, swipe.subscriber_id FROM swipe
WHERE swipe.state= 2 OR swipe.state = 3 or swipe.state=5 OR swipe.state =6 or swipe.state=8 or swipe.state=9
GROUP BY swipe.subscriber_id,swipe.eclipse_id;
""")
n = cur.fetchall()
print "\nnombre de personnes ayant telecharge notre produit \n"
print n
except psycopg2.Error as e:
print "I am unable to connect to the database"
print e
print e.pgcode
print e.pgerror
print traceback.format_exc()
I have this data of n
:
[(1L, 6874, 179), (1L, 10205, 13), (1L, 9958, 13), (1L, 9639, 161), (1L, 11128, 185), (1L, 9856, 190), (1L, 10663, 296), (1L, 6694, 158), (1L, 10294, 13), (1L, 6845, 156), (1L, 10431, 13), (1L, 7022, 156), (1L, 6943, 172), (1L, 9862, 189), (1L, 7645, 189), (1L, 7025, 156), (1L, 9618, 190), (1L, 10056, 13),...
I can do:
df1 = pd.DataFrame(listado)
print df1
gives me:
count eclipse_id subscriber_id
0 1 6874.0 179
1 1 10205.0 13
2 1 9958.0 13
3 1 9639.0 161
But it is not a matrix.
I also try:
pd.DataFrame(data=listado[1:,0],
index=listado[2:,0]
columns=listado[3:,0])
But I answer:
File "<ipython-input-33-f87ac772eb69>", line 3
columns=listado[3:,0])
^
SyntaxError: invalid syntax
The project is to make a recommendation system based on the user following the article in the Salem Marafi blog or the one in analyticsvidhya.com .