How to store imported data to Python in a memory cache


Working with Python 3.5 and pandas.

I am developing an application that performs a series of calculations and processing steps on a pandas dataframe. I fill that dataframe with a SELECT run against my SQL Server database through the pyodbc library:

import pyodbc
import pandas.io.sql as pd

conn = pyodbc.connect('DRIVER={SQL Server};SERVER=GENIL\Luna;DATABASE=Central;UID=sa;PWD=1')

sql="Select IdActivo,NombreActivo,tickeractivo from Activos"
df = pd.read_sql(sql,conn)

# Preview the first rows of the result
print(df.head(10))

That SELECT returns more than 15 million records, so it takes almost 10 minutes to receive the data and store it in the pandas dataframe for further processing.

My question is whether it is possible to cache the result of this SELECT somewhere, so that loading the data into any dataframe I need throughout the process is much faster.

Any suggestions on how I can achieve this?

Thank you very much

Angel

    
asked by Angel Gálvez 19.01.2017 at 16:17

1 answer


In short, what you are asking for is a way to give the dataframe "persistence" so you can avoid re-reading it from the database every time.

pandas offers several solutions for storing data locally (see the IO tools section of the pandas documentation).
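As a quick illustration of that idea, one of the simplest options is to serialize the dataframe to a local pickle file with to_pickle and read it back with read_pickle (the file name activos.pkl below is just an example):

import pandas as pd

# After the first (slow) load from SQL Server, write the dataframe to disk...
df.to_pickle('activos.pkl')

# ...and on later runs read it back from the local file instead of re-querying.
df = pd.read_pickle('activos.pkl')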

My recommendation, though, is that you try HDF5 (PyTables); note that pd.HDFStore requires the tables package to be installed:

with pd.HDFStore('store.h5') as store:
    store['df'] = df

When you need it, get it back:

with pd.HDFStore('store.h5') as store:
    df = store['df']
  

Note: I assume pandas is imported as import pandas as pd, and not as import pandas.io.sql as pd as the question does.
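Putting it together with the code from the question, a minimal sketch of the "load once, reuse the cache afterwards" pattern could look like this (the file name activos.h5 and the os.path.exists check are my own choices for the example, not something from the question):

import os
import pandas as pd
import pyodbc

CACHE = 'activos.h5'  # example name for the local cache file

if os.path.exists(CACHE):
    # Fast path: the data is already cached, load it from the HDF5 store.
    with pd.HDFStore(CACHE) as store:
        df = store['df']
else:
    # Slow path: query SQL Server once and cache the result for later runs.
    conn = pyodbc.connect(r'DRIVER={SQL Server};SERVER=GENIL\Luna;'
                          r'DATABASE=Central;UID=sa;PWD=1')
    sql = "Select IdActivo,NombreActivo,tickeractivo from Activos"
    df = pd.read_sql(sql, conn)
    with pd.HDFStore(CACHE) as store:
        store['df'] = df

With 15 million rows the first write will still take a while, but subsequent loads from the local file should be much faster than re-running the query against SQL Server.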

    
answered on 19.01.2017 at 18:05