Python dynamic tables

0

I receive a report with this data every day:

ALL the values of the headers can change although the Dates field is always going to stay the same for each client, to which I refer:

What I need is that in my file called Data.xlsx I'm loading the data as in a dynamic table. Read my daily reports and fit them in my Data.xlsx file in the way I express below

Something like that would be:

    
asked by Martin Bouhier 21.12.2017 в 13:40
source

1 answer

1

If I have understood your problem, you can use pandas.DataFrame.merge and use the column Cliente as a unique key. Understand that each new xlsx does not contain dates that are already present in previous updates , that is, there is no overlap of columns between the new data and the previous ones.

Starting with two xlsx files such that:

file1.xlsx:

file2.xlsx:

We can do:

import pandas as pd
import pandas.io.formats.excel

# Leemos ambos archivos y los cargamos en DataFrames
df1 = pd.read_excel("archivo1.xlsx")
df2 = pd.read_excel("archivo2.xlsx")

# Pivotamos ambas tablas
df1 = df1.pivot(index = "Cliente", columns='Fecha', values='Revenue')
df2 = df2.pivot(index = "Cliente", columns='Fecha', values='Revenue')

# Unimos ambos dataframes tomando la columna "Cliente" como clave
merged = pd.merge(df1, df2, right_index =True, left_index = True, how='outer')
merged.sort_index(axis=1, inplace=True)

# Creamos el xlsx de salida
pandas.io.formats.excel.header_style = None

with pd.ExcelWriter("Data.xlsx",
                    engine='xlsxwriter',
                    date_format='dd/mm/yyyy',
                    datetime_format='dd/mm/yyyy') as writer:

    merged.to_excel(writer, sheet_name='Sheet1')

output.xlsx:

If part of a file Data.xlsx previous that you want to update with another with the format of Archivo1.xlsx and Archivo2.xlsx just pivot this last file and apply the merge on the Dataframe of the first:

import pandas as pd
import pandas.io.formats.excel

data = pd.read_excel("Data.xlsx",  index_col='Cliente')
new_data = pd.read_excel("archivo.xlsx")
new_data = new_data.pivot(index = "Cliente", columns='Fecha', values='Revenue')
merged = pd.merge(data, new_data, right_index =True, left_index = True, how='outer')
merged.sort_index(axis=1, inplace=True)

pandas.io.formats.excel.header_style = None
with pd.ExcelWriter("Data.xlsx",
                    engine='xlsxwriter',
                    date_format='dd/mm/yyyy',
                    datetime_format='dd/mm/yyyy') as writer:
    merged.to_excel(writer, sheet_name='Sheet1')

Data.xlsx (initial) :

.xlsx file (update):

Data.xlsx (final) :

If you are going to handle a considerable amount of data in the end, there are better ways to do this than using Excel files. A database would be a considerably more efficient option, generating an xlsx from it when required, but storing and making the updates in the database itself.

    
answered by 22.12.2017 / 00:11
source