How to transform a SQL query into a panda dataframe matrix?

2

I want to set an array from a postgresql database so that we have the eclipse_id by the heading and the subscriber_id by the side.

> data.head(6).ix[:,2:8]

   1222  52582       45552        122  589  568
0     0      0           0          0    0    0
1     0      0           1          0    0    0
2     0      0           0          0    0    0
3     0      3           0          0    0    0
4     0      0           0          0    0    0
5     0      0           0          0    5    0

I would come from a database swipe with attributes eclipse_id , subscriber_id and state but I only want to count those that are worth 2,3,5,6,8,9.

Here is the SQL query, I do not know if there is another way to do it:

SELECT  COUNT (swipe.state),swipe.eclipse_id, swipe.subscriber_id FROM swipe
  WHERE swipe.state= 2 OR swipe.state = 3 or swipe.state=5 OR swipe.state =6 or swipe.state=8 or swipe.state=9
    GROUP BY swipe.subscriber_id,swipe.eclipse_id

I do not know how to do it from Datagrip with a file that we will open like this:

data = pd.read_csv('data.csv')

Or directly from Jupyter notebook ...

With Datagrip

At the moment I have about datagrip:

1   6874    179
1   10205   13
1   9958    13
1   9639    161
1   11128   185

I used Dump Data on the right to get it in /home but it does not work: I have that:

Cannot write file: /home/SELECT__COUNT__swipe_state__swipe_eclips.csv

With Jupyter

I try to do it from Jupyter notebook but I am not able to print the results

import traceback
import psycopg2
import numpy as np
import datetime
from datetime import datetime
from operator import *
import pandas as pd

now = datetime.now()

print now


params = {
  'database': 'escondida...',
  'user': '',
  'password': '',
  'host': '',
  'port': ''
}

try:
    conn = psycopg2.connect(**params)
    cur = conn.cursor()

    cur.execute("""
    -- nombre de personnes ayant telecharge notre produit 
    SELECT  COUNT (swipe.state),swipe.eclipse_id, swipe.subscriber_id FROM swipe
  WHERE swipe.state= 2 OR swipe.state = 3 or swipe.state=5 OR swipe.state =6 or swipe.state=8 or swipe.state=9
    GROUP BY swipe.subscriber_id,swipe.eclipse_id;
    """)
    n = cur.fetchall()
    print "\nnombre de personnes ayant telecharge notre produit  \n"
    print n

except psycopg2.Error as e:
    print "I am unable to connect to the database"
    print e
    print e.pgcode
    print e.pgerror
    print traceback.format_exc()

I have this data of n :

[(1L, 6874, 179), (1L, 10205, 13), (1L, 9958, 13), (1L, 9639, 161), (1L, 11128, 185), (1L, 9856, 190), (1L, 10663, 296), (1L, 6694, 158), (1L, 10294, 13), (1L, 6845, 156), (1L, 10431, 13), (1L, 7022, 156), (1L, 6943, 172), (1L, 9862, 189), (1L, 7645, 189), (1L, 7025, 156), (1L, 9618, 190), (1L, 10056, 13),...

I can do:

df1 = pd.DataFrame(listado)
print df1

gives me:

count  eclipse_id  subscriber_id
0         1      6874.0            179
1         1     10205.0             13
2         1      9958.0             13
3         1      9639.0            161

But it is not a matrix.

I also try:

pd.DataFrame(data=listado[1:,0],
            index=listado[2:,0]
            columns=listado[3:,0])

But I answer:

  File "<ipython-input-33-f87ac772eb69>", line 3
    columns=listado[3:,0])
          ^
SyntaxError: invalid syntax

The project is to make a recommendation system based on the user following the article in the Salem Marafi blog or the one in analyticsvidhya.com .

    
asked by ThePassenger 01.06.2017 в 18:30
source

2 answers

1

Well, "n" at the time of printing is already an array, I do not know what you want.

listado = [{"count": elem[0], "eclipse_id": elem[1], "subscriber_id": elem[2]} for elem in n]
print(listado[0]['count'])  # Esto deberia imprimir 1L

That will give you a list of dictionaries.

    
answered by 01.06.2017 в 18:40
0

Use from_records to import lists to DataFrames

import pandas as pd
n = [('1L', 6874, 179), ('1L', 10205, 13)]
labels=['count','eclipse_id','subscriber_id']
df = pd.DataFrame.from_records(n, columns=labels)
df.head()

    
answered by 03.06.2017 в 14:32