Filling data gaps

1

I have created a code to fill in the gaps in my files that do not have data due to system failure, for this I have created a file of dates and this I am comparing it with the data file that I have. Supposedly my code would have to compare the first column of both files, and if it coincides with the one of dates, because the complete row is saved, if it should not put NAN.

The code I have made is this:

import numpy as np from openpyxl import load_workbook

#Cargamos el fichero excel que queramos procesar 
wb = load_workbook('5_ÁRBOL.xlsx',data_only=True) 
fechas= load_workbook('fechas 2011-2017.xlsx',data_only=True)

print ('Se han cargado los ficheros')

#Elegimos la página del excel que queramos procesar 
sheet1 = wb.get_sheet_by_name('arbol5mf1') 
sheet2 = fechas.get_sheet_by_name('Hoja1')

#Creamos varias matrices auxiliares que nos ayudarán en los cálculos 
matriz = np.zeros((736416,5)) 
fecha1 = np.zeros((sheet1.max_row,0)) 
fecha2 = np.zeros((sheet2.max_row,0)) 
copia_excel = np.zeros((sheet1.max_row,sheet1.max_column)) 
copia_fechas = np.zeros((sheet2.max_row,sheet2.max_column))

print ('Se han realizado las matrices auxiliares')

for i in range(3,sheet1.max_row):
    for j in range(1,sheet1.max_column):
        copia_excel[i,j]=sheet1.cell(row=i+1, column=j+1).value
         print ('Se ha rellenado la matriz auxiliar con los datos') for i in range(1,sheet2.max_row):
    for j in range(1,sheet2.max_column):
        copia_fechas[i,j]=sheet2.cell(row=i+1, column=j+1).value
         print ('Se ha rellenado la matriz auxiliar con las fechas')

         for i in range(0,0):
    for j in range(0,sheet1.max_column):
        fecha1[i,j]=sheet1.cell(row=i+1, column=j+1).datetime

print ('Se ha rellenado la matriz auxiliar horas del fichero con los datos')        

for i in range(0,0):
    for j in range(0,sheet2.max_column):
        fecha2[i,j]=sheet2.cell(row=i+1, column=j+1).datetime  

print ('Se ha rellenado la matriz auxiliar horas del fichero fechas')

datos01= fecha1[:,0] 
datos02= copia_excel[:,1] 
datos03= copia_excel[:,2] 
datos04= copia_excel[:,3] 
datos05= copia_excel[:,4] 
datos06= copia_excel[:,5]

fecha01= fecha2[:,0] 
fecha02= copia_fechas[:,1] 
fecha03= copia_fechas[:,2] 
fecha04= copia_fechas[:,3] 
fecha05= copia_fechas[:,4] 
fecha06= copia_fechas[:,5]

c_e = np.array([(datos01,datos02,datos03,datos04,datos05)]) 
c_f = np.array([(fecha01,fecha02,fecha03,fecha04,fecha05)])

print ('Se han creado dos matrices auxiliares')

for i in range(0,sheet2.max_row):
    for j in range (0,0):
        if c_e[i,j]==c_f[i,j]:
            matriz[i,j]=sheet1.cell(row=i+1, column=j+1).value
            print ('Se ha guadado el dato de:',[i,j])

print ('El proceso ha acabado')

I know that the code is very convoluted, but I'm not very focused on programming with python and it's the only way I could think of trying to do the code ...

When I put my code in motion, I get the following error:

ValueError: could not convert string to float: '# N / A'

Could someone help me solve this problem or could you tell me what I'm doing wrong? I leave the link of the files I use input (not the originals but if they are the style I need to process)

5_ÁRBOL.xlsx

Measurement Time    Port 1  Port 2  Port 3  Port 4  Port 5
26/05/2011 11:45      NaN   0,145   0,232   0,223   0,236
26/05/2011 11:50      NaN   0,145   0,232   0,223   0,236
26/05/2011 11:55      NaN   0,145   0,232   0,224   0,236
26/05/2011 12:00      NaN   0,145   0,232   0,224   0,236
26/05/2011 12:05      NaN   0,145   0,232   0,224   0,237
26/05/2011 12:10      NaN   0,145   0,232   0,224   0,237
26/05/2011 12:15      NaN   0,145   0,232   0,224   0,237
26/05/2011 12:20      NaN   0,145   0,232   0,224   0,237
26/05/2011 12:25      NaN   0,145   0,233   0,224   0,237
26/05/2011 12:30      NaN   0,145   0,233   0,225   0,238
26/05/2011 12:45      NaN   0,145   0,233   0,225   0,238
26/05/2011 12:50      NaN   0,145   0,233   0,225   0,238
26/05/2011 12:55      NaN   0,145   0,233   0,225   0,238
26/05/2011 13:00      NaN   0,145   0,233   0,225   0,239
26/05/2011 13:05      NaN   0,145   0,233   0,225   0,239
26/05/2011 13:10      NaN   0,145   0,233   0,225   0,239
26/05/2011 13:15      NaN   0,145   0,233   0,225   0,239
26/05/2011 13:20      NaN   0,145   0,233   0,225   0,239
26/05/2011 13:25      NaN   0,145   0,233   0,225   0,239
26/05/2011 13:30      NaN   0,145   0,233   0,225   0,239
26/05/2011 13:35      NaN   0,145   0,233   0,225   0,239
26/05/2011 13:40      NaN   0,145   0,233   0,225   0,239
26/05/2011 13:45      NaN   0,145   0,233   0,225   0,239
26/05/2011 13:50      NaN   0,145   0,233   0,225   0,24
26/05/2011 13:55      NaN   0,145   0,233   0,225   0,24
26/05/2011 14:00      NaN   0,145   0,233   0,225   0,24
26/05/2011 14:05      NaN   0,145   0,233   0,226   0,24
26/05/2011 14:10      NaN   0,145   0,233   0,226   0,24
26/05/2011 14:15      NaN   0,145   0,233   0,226   0,24
26/05/2011 14:20      NaN   0,145   0,233   0,226   0,24
26/05/2011 14:25      NaN   0,145   0,233   0,226   0,24
26/05/2011 14:30      NaN   0,145   0,233   0,226   0,24
26/05/2011 14:35      NaN   0,146   0,233   0,226   0,24
26/05/2011 14:40      NaN   0,146   0,233   0,226   0,24
26/05/2011 14:45      NaN   0,146   0,233   0,226   0,24
26/05/2011 14:50      NaN   0,146   0,233   0,226   0,24
26/05/2011 14:55      NaN   0,146   0,233   0,226   0,24
26/05/2011 15:00      NaN   0,146   0,233   0,226   0,24
26/05/2011 15:05      NaN   0,146   0,233   0,226   0,24
26/05/2011 15:10      NaN   0,146   0,233   0,226   0,24
26/05/2011 15:15      NaN   0,146   0,233   0,226   0,24
26/05/2011 15:20      NaN   0,146   0,233   0,226   0,241
26/05/2011 15:25      NaN   0,146   0,233   0,226   0,241
26/05/2011 15:30      NaN   0,146   0,233   0,226   0,241
26/05/2011 15:35      NaN   0,146   0,233   0,226   0,241
26/05/2011 15:40      NaN   0,146   0,233   0,226   0,241
26/05/2011 15:45      NaN   0,146   0,233   0,226   0,241
26/05/2011 15:50      NaN   0,146   0,233   0,226   0,241
26/05/2011 15:55      NaN   0,146   0,233   0,226   0,241
26/05/2011 16:00      NaN   0,146   0,233   0,226   0,241
26/05/2011 16:05      NaN   0,146   0,233   0,226   0,241
26/05/2011 16:10      NaN   0,146   0,233   0,226   0,241
26/05/2011 16:15      NaN   0,146   0,233   0,226   0,241
26/05/2011 16:20      NaN   0,146   0,233   0,226   0,241
26/05/2011 16:25      NaN   0,146   0,233   0,227   0,241
26/05/2011 16:30      NaN   0,146   0,233   0,227   0,241
26/05/2011 16:35      NaN   0,146   0,233   0,227   0,242
26/05/2011 16:40      NaN   0,146   0,233   0,227   0,242
26/05/2011 16:45      NaN   0,146   0,233   0,227   0,242
26/05/2011 16:50      NaN   0,146   0,233   0,227   0,242
26/05/2011 16:55      NaN   0,146   0,233   0,227   0,242
26/05/2011 17:00      NaN   0,146   0,233   0,227   0,242
26/05/2011 17:05      NaN   0,146   0,233   0,227   0,242
26/05/2011 17:10      NaN   0,146   0,233   0,227   0,242
26/05/2011 17:15      NaN   0,146   0,233   0,227   0,242
26/05/2011 17:20      NaN   0,146   0,233   0,227   0,242
26/05/2011 17:25      NaN   0,146   0,233   0,227   0,242
26/05/2011 17:30      NaN   0,146   0,233   0,227   0,242
26/05/2011 17:35      NaN   0,146   0,233   0,227   0,243
26/05/2011 17:40      NaN   0,146   0,233   0,227   0,243
26/05/2011 17:45      NaN   0,146   0,233   0,227   0,243
26/05/2011 17:50      NaN   0,147   0,233   0,227   0,243
26/05/2011 17:55      NaN   0,147   0,233   0,227   0,243
26/05/2011 18:00      NaN   0,147   0,233   0,227   0,243
26/05/2011 18:05      NaN   0,147   0,233   0,227   0,243
26/05/2011 18:10      NaN   0,147   0,233   0,227   0,243
26/05/2011 18:15      NaN   0,147   0,233   0,227   0,243
26/05/2011 18:20      NaN   0,147   0,233   0,227   0,243
26/05/2011 18:25      NaN   0,147   0,233   0,227   0,243
26/05/2011 18:30      NaN   0,147   0,233   0,227   0,243
26/05/2011 18:35      NaN   0,147   0,233   0,227   0,243
26/05/2011 18:40      NaN   0,147   0,233   0,227   0,243
26/05/2011 18:45      NaN   0,147   0,233   0,227   0,243
26/05/2011 18:50      NaN   0,147   0,233   0,227   0,243
26/05/2011 18:55      NaN   0,147   0,233   0,227   0,243
26/05/2011 19:00      NaN   0,147   0,233   0,227   0,243
26/05/2011 19:05      NaN   0,147   0,233   0,227   0,243
26/05/2011 19:10      NaN   0,147   0,233   0,227   0,243
26/05/2011 19:15      NaN   0,147   0,233   0,227   0,243
26/05/2011 19:20      NaN   0,147   0,233   0,227   0,243
26/05/2011 19:25      NaN   0,147   0,233   0,227   0,243
26/05/2011 19:30      NaN   0,147   0,233   0,227   0,243

dates 2011-2017.xlsx

26/05/2011 0:00 5   146 5   146
26/05/2011 0:05 5   146 5   146
26/05/2011 0:10 5   146 5   146
26/05/2011 0:15 5   146 5   146
26/05/2011 0:20 5   146 5   146
26/05/2011 0:25 5   146 5   146
26/05/2011 0:30 5   146 5   146
26/05/2011 0:35 5   146 5   146
26/05/2011 0:40 5   146 5   146
26/05/2011 0:45 5   146 5   146
26/05/2011 0:50 5   146 5   146
26/05/2011 0:55 5   146 5   146
26/05/2011 1:00 5   146 5   146
26/05/2011 1:05 5   146 5   146
26/05/2011 1:10 5   146 5   146
26/05/2011 1:15 5   146 5   146
26/05/2011 1:20 5   146 5   146
26/05/2011 1:25 5   146 5   146
26/05/2011 1:30 5   146 5   146
26/05/2011 1:35 5   146 5   146
26/05/2011 1:40 5   146 5   146
26/05/2011 1:45 5   146 5   146
26/05/2011 1:50 5   146 5   146
26/05/2011 1:55 5   146 5   146
26/05/2011 2:00 5   146 5   146
26/05/2011 2:05 5   146 5   146
26/05/2011 2:10 5   146 5   146
26/05/2011 2:15 5   146 5   146
26/05/2011 2:20 5   146 5   146
26/05/2011 2:25 5   146 5   146
26/05/2011 2:30 5   146 5   146
26/05/2011 2:35 5   146 5   146
26/05/2011 2:40 5   146 5   146
26/05/2011 2:45 5   146 5   146
26/05/2011 2:50 5   146 5   146
26/05/2011 2:55 5   146 5   146
26/05/2011 3:00 5   146 5   146
26/05/2011 3:05 5   146 5   146
26/05/2011 3:10 5   146 5   146
26/05/2011 3:15 5   146 5   146
26/05/2011 3:20 5   146 5   146
26/05/2011 3:25 5   146 5   146
26/05/2011 3:30 5   146 5   146
26/05/2011 3:35 5   146 5   146
26/05/2011 3:40 5   146 5   146
26/05/2011 3:45 5   146 5   146
26/05/2011 3:50 5   146 5   146
26/05/2011 3:55 5   146 5   146
26/05/2011 4:00 5   146 5   146
26/05/2011 4:05 5   146 5   146
26/05/2011 4:10 5   146 5   146
26/05/2011 4:15 5   146 5   146
26/05/2011 4:20 5   146 5   146
26/05/2011 4:25 5   146 5   146
26/05/2011 4:30 5   146 5   146
26/05/2011 4:35 5   146 5   146
26/05/2011 4:40 5   146 5   146
26/05/2011 4:45 5   146 5   146
26/05/2011 4:50 5   146 5   146
26/05/2011 4:55 5   146 5   146
26/05/2011 5:00 5   146 5   146
26/05/2011 5:05 5   146 5   146
26/05/2011 5:10 5   146 5   146
26/05/2011 5:15 5   146 5   146
26/05/2011 5:20 5   146 5   146
26/05/2011 5:25 5   146 5   146
26/05/2011 5:30 5   146 5   146
26/05/2011 5:35 5   146 5   146
26/05/2011 5:40 5   146 5   146
26/05/2011 5:45 5   146 5   146
26/05/2011 5:50 5   146 5   146
26/05/2011 5:55 5   146 5   146
26/05/2011 6:00 5   146 5   146
26/05/2011 6:05 5   146 5   146
26/05/2011 6:10 5   146 5   146
26/05/2011 6:15 5   146 5   146
26/05/2011 6:20 5   146 5   146
26/05/2011 6:25 5   146 5   146
26/05/2011 6:30 5   146 5   146
26/05/2011 6:35 5   146 5   146
26/05/2011 6:40 5   146 5   146
26/05/2011 6:45 5   146 5   146
26/05/2011 6:50 5   146 5   146
26/05/2011 6:55 5   146 5   146
26/05/2011 7:00 5   146 5   146
26/05/2011 7:05 5   146 5   146
26/05/2011 7:10 5   146 5   146
26/05/2011 7:15 5   146 5   146
26/05/2011 7:20 5   146 5   146
26/05/2011 7:25 5   146 5   146
26/05/2011 7:30 5   146 5   146
26/05/2011 7:35 5   146 5   146
26/05/2011 7:40 5   146 5   146
26/05/2011 7:45 5   146 5   146
26/05/2011 7:50 5   146 5   146
26/05/2011 7:55 5   146 5   146
26/05/2011 8:00 5   146 5   146
26/05/2011 8:05 5   146 5   146
26/05/2011 8:10 5   146 5   146
26/05/2011 8:15 5   146 5   146
26/05/2011 8:20 5   146 5   146
26/05/2011 8:25 5   146 5   146
26/05/2011 8:30 5   146 5   146
26/05/2011 8:35 5   146 5   146
26/05/2011 8:40 5   146 5   146
26/05/2011 8:45 5   146 5   146
26/05/2011 8:50 5   146 5   146
26/05/2011 8:55 5   146 5   146
26/05/2011 9:00 5   146 5   146
26/05/2011 9:05 5   146 5   146
26/05/2011 9:10 5   146 5   146
26/05/2011 9:15 5   146 5   146
26/05/2011 9:20 5   146 5   146
26/05/2011 9:25 5   146 5   146
26/05/2011 9:30 5   146 5   146
26/05/2011 9:35 5   146 5   146
26/05/2011 9:40 5   146 5   146
26/05/2011 9:45 5   146 5   146
26/05/2011 9:50 5   146 5   146
26/05/2011 9:55 5   146 5   146
26/05/2011 10:00    5   146 5   146
26/05/2011 10:05    5   146 5   146
26/05/2011 10:10    5   146 5   146
26/05/2011 10:15    5   146 5   146
26/05/2011 10:20    5   146 5   146
26/05/2011 10:25    5   146 5   146
26/05/2011 10:30    5   146 5   146
26/05/2011 10:35    5   146 5   146
26/05/2011 10:40    5   146 5   146
26/05/2011 10:45    5   146 5   146
26/05/2011 10:50    5   146 5   146
26/05/2011 10:55    5   146 5   146
26/05/2011 11:00    5   146 5   146
26/05/2011 11:05    5   146 5   146
26/05/2011 11:10    5   146 5   146
26/05/2011 11:15    5   146 5   146
26/05/2011 11:20    5   146 5   146
26/05/2011 11:25    5   146 5   146
26/05/2011 11:30    5   146 5   146
26/05/2011 11:35    5   146 5   146
26/05/2011 11:40    5   146 5   146
26/05/2011 11:45    5   146 5   146
26/05/2011 11:50    5   146 5   146
26/05/2011 11:55    5   146 5   146
26/05/2011 12:00    5   146 5   146
26/05/2011 12:05    5   146 5   146
26/05/2011 12:10    5   146 5   146
26/05/2011 12:15    5   146 5   146
26/05/2011 12:20    5   146 5   146
26/05/2011 12:25    5   146 5   146
26/05/2011 12:30    5   146 5   146
26/05/2011 12:35    5   146 5   146
26/05/2011 12:40    5   146 5   146
26/05/2011 12:45    5   146 5   146
26/05/2011 12:50    5   146 5   146
26/05/2011 12:55    5   146 5   146
26/05/2011 13:00    5   146 5   146
26/05/2011 13:05    5   146 5   146
26/05/2011 13:10    5   146 5   146
26/05/2011 13:15    5   146 5   146
26/05/2011 13:20    5   146 5   146
26/05/2011 13:25    5   146 5   146
26/05/2011 13:30    5   146 5   146
26/05/2011 13:35    5   146 5   146
26/05/2011 13:40    5   146 5   146
26/05/2011 13:45    5   146 5   146
26/05/2011 13:50    5   146 5   146
26/05/2011 13:55    5   146 5   146
26/05/2011 14:00    5   146 5   146
26/05/2011 14:05    5   146 5   146
26/05/2011 14:10    5   146 5   146
26/05/2011 14:15    5   146 5   146
26/05/2011 14:20    5   146 5   146
26/05/2011 14:25    5   146 5   146
26/05/2011 14:30    5   146 5   146
26/05/2011 14:35    5   146 5   146
26/05/2011 14:40    5   146 5   146
26/05/2011 14:45    5   146 5   146
26/05/2011 14:50    5   146 5   146
26/05/2011 14:55    5   146 5   146
26/05/2011 15:00    5   146 5   146
26/05/2011 15:05    5   146 5   146
26/05/2011 15:10    5   146 5   146
26/05/2011 15:15    5   146 5   146
26/05/2011 15:20    5   146 5   146
26/05/2011 15:25    5   146 5   146
26/05/2011 15:30    5   146 5   146
26/05/2011 15:35    5   146 5   146
26/05/2011 15:40    5   146 5   146
26/05/2011 15:45    5   146 5   146
26/05/2011 15:50    5   146 5   146
26/05/2011 15:55    5   146 5   146
26/05/2011 16:00    5   146 5   146
26/05/2011 16:05    5   146 5   146
26/05/2011 16:10    5   146 5   146
26/05/2011 16:15    5   146 5   146
26/05/2011 16:20    5   146 5   146
26/05/2011 16:25    5   146 5   146
26/05/2011 16:30    5   146 5   146
26/05/2011 16:35    5   146 5   146
26/05/2011 16:40    5   146 5   146
26/05/2011 16:45    5   146 5   146
26/05/2011 16:50    5   146 5   146
26/05/2011 16:55    5   146 5   146
26/05/2011 17:00    5   146 5   146
26/05/2011 17:05    5   146 5   146
26/05/2011 17:10    5   146 5   146
26/05/2011 17:15    5   146 5   146
26/05/2011 17:20    5   146 5   146
26/05/2011 17:25    5   146 5   146
26/05/2011 17:30    5   146 5   146
26/05/2011 17:35    5   146 5   146
26/05/2011 17:40    5   146 5   146
26/05/2011 17:45    5   146 5   146
26/05/2011 17:50    5   146 5   146
26/05/2011 17:55    5   146 5   146
26/05/2011 18:00    5   146 5   146
26/05/2011 18:05    5   146 5   146
26/05/2011 18:10    5   146 5   146
26/05/2011 18:15    5   146 5   146
26/05/2011 18:20    5   146 5   146
26/05/2011 18:25    5   146 5   146
26/05/2011 18:30    5   146 5   146
26/05/2011 18:35    5   146 5   146
26/05/2011 18:40    5   146 5   146
26/05/2011 18:45    5   146 5   146
26/05/2011 18:50    5   146 5   146
26/05/2011 18:55    5   146 5   146
26/05/2011 19:00    5   146 5   146
26/05/2011 19:05    5   146 5   146
26/05/2011 19:10    5   146 5   146
26/05/2011 19:15    5   146 5   146
26/05/2011 19:20    5   146 5   146
26/05/2011 19:25    5   146 5   146
26/05/2011 19:30    5   146 5   146
26/05/2011 19:35    5   146 5   146
26/05/2011 19:40    5   146 5   146
26/05/2011 19:45    5   146 5   146
26/05/2011 19:50    5   146 5   146
26/05/2011 19:55    5   146 5   146
26/05/2011 20:00    5   146 5   146
26/05/2011 20:05    5   146 5   146
26/05/2011 20:10    5   146 5   146
26/05/2011 20:15    5   146 5   146
26/05/2011 20:20    5   146 5   146
26/05/2011 20:25    5   146 5   146
26/05/2011 20:30    5   146 5   146
26/05/2011 20:35    5   146 5   146
26/05/2011 20:40    5   146 5   146
26/05/2011 20:45    5   146 5   146
26/05/2011 20:50    5   146 5   146
26/05/2011 20:55    5   146 5   146
26/05/2011 21:00    5   146 5   146
26/05/2011 21:05    5   146 5   146
26/05/2011 21:10    5   146 5   146
26/05/2011 21:15    5   146 5   146
26/05/2011 21:20    5   146 5   146
26/05/2011 21:25    5   146 5   146
26/05/2011 21:30    5   146 5   146
26/05/2011 21:35    5   146 5   146
26/05/2011 21:40    5   146 5   146
26/05/2011 21:45    5   146 5   146
26/05/2011 21:50    5   146 5   146
26/05/2011 21:55    5   146 5   146
26/05/2011 22:00    5   146 5   146
26/05/2011 22:05    5   146 5   146
26/05/2011 22:10    5   146 5   146
26/05/2011 22:15    5   146 5   146
26/05/2011 22:20    5   146 5   146
26/05/2011 22:25    5   146 5   146
26/05/2011 22:30    5   146 5   146
26/05/2011 22:35    5   146 5   146
26/05/2011 22:40    5   146 5   146
26/05/2011 22:45    5   146 5   146
26/05/2011 22:50    5   146 5   146
26/05/2011 22:55    5   146 5   146
26/05/2011 23:00    5   146 5   146
26/05/2011 23:05    5   146 5   146
26/05/2011 23:10    5   146 5   146
26/05/2011 23:15    5   146 5   146
26/05/2011 23:20    5   146 5   146
26/05/2011 23:25    5   146 5   146
26/05/2011 23:30    5   146 5   146
26/05/2011 23:35    5   146 5   146
26/05/2011 23:40    5   146 5   146
26/05/2011 23:45    5   146 5   146
26/05/2011 23:50    5   146 5   146
26/05/2011 23:55    5   146 5   146

What I am looking for is that when the data in the first column matches, I keep the complete row of the ARBOL file in the result file, if not I will put everything as NAN in order to fill in the gaps.

Thank you very much.

    
asked by Mario 08.10.2018 в 12:52
source

1 answer

0

As you say, your code is unnecessarily complex.

If I did not get it wrong, I think it's about completing the missing dates / times in the Arbol.xlsx file. For example, it is observed that there are samples every five minutes, but some of them are missing, like those that would correspond to 26/05/2011 12:35 and 26/05/2011 12:40 , among others.

Apparently the data of the file fechas.xlsx is irrelevant, and you only use it to obtain the list of date / time for which there should be something in the other, and thus be able to detect what dates are missing.

All this is infinitely easier using pandas (as long as you know the ins and outs of this library, which is the longest) , well pandas allows you:

  • Read / write excel directly, with a single line (below use openpyxl for it) giving you as a result a table
  • Operate directly with the table "as a whole" without having to write loops
  • Parse the strings that represent dates / times to convert them into datetime objects allowing sophisticated dates calculations (which you do not need now, but may come well at another time)
  • Generate date ranges. This is very important! You can give a start / end time and date and a frequency like 5 minutes and pandas generates the list of all timestamp intermediates
  • Use any column in the table as "index"
  • Mix tables based on their "index" (filling with NaN the ones that do not match)
  • Reindex a table based on another index (filling in with the missing NaN)
  • etc. etc.

Of all the above, I'm going to stay with the reading / writing of the excel, the parsing of the dates, the indexation of the table based on the dates column, the generation of date ranges and the reindexing of the table according to the new range.

In order to work with a sample table I have copied and pasted the one that you have provided as text (pandas can also read csv, plain text and other formats). In case anyone wants to reproduce my steps, I did the following:

# Copiar a una variable de texto la tabla dada por el usuario
# (no la escribo completa aqui para no ser pesado)
data = """
Measurement Time    Port 1  Port 2  Port 3  Port 4  Port 5
26/05/2011 11:45      NaN   0,145   0,232   0,223   0,236
26/05/2011 11:50      NaN   0,145   0,232   0,223   0,236
...
...
26/05/2011 19:25      NaN   0,147   0,233   0,227   0,243
26/05/2011 19:30      NaN   0,147   0,233   0,227   0,243
"""

# Usar pandas para leerla y tenerla como tabla (dataframe)
import pandas as pd
import io
tabla = pd.read_table(io.StringIO(data), sep=r"\s\s+")

In the previous code I have put as column separator the regular expression \s\s+ which means "two spaces or more". In your case, you are supposed to have this table in an excel sheet so you would read it like this:

tabla = pd.read_excel('5_ÁRBOL.xlsx', sheet_name='arbol5mf1')
# Quizás fueran necesarias otras opciones si la primera fila de la hoja
# no son las cabeceras de las columnas, u otras variaciones

In any case the table once loaded will look like this:

The date column is still no more than a series of chains, meaningless for pandas. The index is a mere sequence of integers 0, 1, 2, ... then we will change this.

As for the second spreadsheet, we do not really need to read it. We just need to know what the start / end date / time is. In the following I will limit myself to fix those variables with a pair of date / hour of example. In your case you could if you prefer to read it from the excel, also using pandas or as you did before.

For example (I choose to start at 11:45 so that the result is better seen):

timestamp_inicio = "26/05/2011 11:45"
timestamp_end = "26/05/2011 23:55"

Once we have the table loaded we take the following steps:

# 1, Indicarle a pandas que el índice de la tabla sea la columna Measurement Time
# y después eliminar esa columna (pues ya es el índice)
tabla.index = pd.to_datetime(tabla["Measurement Time"])
tabla = tabla.drop("Measurement Time", axis=1)

Now the index is the dates, which have been converted to the datetime type:

And finally filling in the missing dates:

# 2. Creo un rango de timestamps espaciados 5 minutos
rango_fechas = pd.date_range(start=timestamp_inicio, end=timestamp_end, freq="5min")
# 3. Reindexo la tabla con este nuevo rango
tabla = tabla.reindex(rango_fechas)
tabla.index.name = "Measurement Time"  # Hay que ponerle nombre que se perdió al reindexar

I show the first 13 rows of the resulting table so you can see how you have filled in what was missing with NaN:

It only remains to save the result as excel:

tabla.to_excel("resultado.xlsx")

When we opened that excel we found that the cells that were Nan in pandas are empty:

but if you prefer you can specify that the chain you want appears:

tabla.to_excel("resultado.xlsx", na_rep="#N/A")
    
answered by 22.10.2018 / 10:21
source