If I have understood your problem, you can use pandas.DataFrame.merge
and use the column Cliente
as a unique key. Understand that each new xlsx does not contain dates that are already present in previous updates , that is, there is no overlap of columns between the new data and the previous ones.
Starting with two xlsx files such that:
file1.xlsx:
file2.xlsx:
We can do:
import pandas as pd
import pandas.io.formats.excel
# Leemos ambos archivos y los cargamos en DataFrames
df1 = pd.read_excel("archivo1.xlsx")
df2 = pd.read_excel("archivo2.xlsx")
# Pivotamos ambas tablas
df1 = df1.pivot(index = "Cliente", columns='Fecha', values='Revenue')
df2 = df2.pivot(index = "Cliente", columns='Fecha', values='Revenue')
# Unimos ambos dataframes tomando la columna "Cliente" como clave
merged = pd.merge(df1, df2, right_index =True, left_index = True, how='outer')
merged.sort_index(axis=1, inplace=True)
# Creamos el xlsx de salida
pandas.io.formats.excel.header_style = None
with pd.ExcelWriter("Data.xlsx",
engine='xlsxwriter',
date_format='dd/mm/yyyy',
datetime_format='dd/mm/yyyy') as writer:
merged.to_excel(writer, sheet_name='Sheet1')
output.xlsx:
If part of a file Data.xlsx
previous that you want to update with another with the format of Archivo1.xlsx
and Archivo2.xlsx
just pivot this last file and apply the merge
on the Dataframe of the first:
import pandas as pd
import pandas.io.formats.excel
data = pd.read_excel("Data.xlsx", index_col='Cliente')
new_data = pd.read_excel("archivo.xlsx")
new_data = new_data.pivot(index = "Cliente", columns='Fecha', values='Revenue')
merged = pd.merge(data, new_data, right_index =True, left_index = True, how='outer')
merged.sort_index(axis=1, inplace=True)
pandas.io.formats.excel.header_style = None
with pd.ExcelWriter("Data.xlsx",
engine='xlsxwriter',
date_format='dd/mm/yyyy',
datetime_format='dd/mm/yyyy') as writer:
merged.to_excel(writer, sheet_name='Sheet1')
Data.xlsx (initial) :
.xlsx file (update):
Data.xlsx (final) :
If you are going to handle a considerable amount of data in the end, there are better ways to do this than using Excel files. A database would be a considerably more efficient option, generating an xlsx from it when required, but storing and making the updates in the database itself.