I'm doing a downtime analysis of a production line.
I have a DataFrame in CSV format with the following columns:
'index', 'id_planta', 'fecha', 'linea', 'turno', 'Supervisor', 'CategoriaTM', 'CausaTM', 'duracionTM'
Only the durationTM column has a numeric value.
Using pandas and I want to group the data by CausaTM
. One way to do it is:
total_tm_linea16_prod_arranque_t1 = int(df['duracionTM'][df["linea"] == "P3 PREMEZCLAS"][df["turno"] == 'T1'][df["CategoriaTM"] == 'PRODUCCION'][df["CausaTM"] == 'ARRANQUE'].sum())
However, I do not want to do it this way since I would have to make a line of code for each cause and I have many, in addition, the user can create new causes and this would affect my code. What I want is to obtain a DataFrame where I get the total dead time of each cause of the different dates that comprise my file (in this step, the file already has only the dates I want to work).
The csv file. is:
index,id_planta,fecha,linea,turno,Supervisor,CategoriaTM,CausaTM,duracionTM
1,P3,01/10/2017,P3 EMB. AUTOMATICO 1,T1,CARLOS ROSAS,BODEGA DE CONGELACION,FALTA DE INGREDIENTE,55
2,P3,01/10/2017,P3 EMB. AUTOMATICO 1,T1,CARLOS ROSAS,PRODUCCION,ABASTECIMIENTO DE PRODUCTO A LA LINEA,32
3,P3,01/10/2017,P3 EMB. AUTOMATICO 1,T1,CARLOS ROSAS,PRODUCCION,ARRANQUE,40
4,P3,01/10/2017,P3 EMB. AUTOMATICO 1,T1,CARLOS ROSAS,PRODUCCION,CAMBIO DE PRODUCTO,65
5,P3,02/10/2017,P3 EMB. AUTOMATICO 1,T1,ANTONIO SANTAROSA,BODEGA DE CONGELACION,FALTA DE INGREDIENTE,20
6,P3,02/10/2017,P3 EMB. AUTOMATICO 1,T1,ANTONIO SANTAROSA,PRODUCCION,AJUSTE DE MAQUINA,10
7,P3,02/10/2017,P3 EMB. AUTOMATICO 1,T1,ANTONIO SANTAROSA,PRODUCCION,ARRANQUE,20
8,P3,02/10/2017,P3 EMB. AUTOMATICO 1,T1,ANTONIO SANTAROSA,PRODUCCION,CAMBIO DE PRODUCTO,30
9,P3,02/10/2017,P3 EMB. AUTOMATICO 1,T2,CARLOS ROSAS,BODEGA DE CONGELACION,FALTA DE INGREDIENTE,45
10,P3,02/10/2017,P3 EMB. AUTOMATICO 1,T2,CARLOS ROSAS,MANTENIMIENTO,CAMBIO DE TEFLONES,4
11,P3,02/10/2017,P3 EMB. AUTOMATICO 1,T2,CARLOS ROSAS,PRODUCCION,AJUSTE DE MAQUINA,38
12,P3,02/10/2017,P3 EMB. AUTOMATICO 1,T2,CARLOS ROSAS,PRODUCCION,CAMBIO DE PRODUCTO,45
The expected output (obtained through an Excel pivot table is:
T1
BODEGA DE CONGELACION
FALTA DE INGREDIENTE 75
PRODUCCION
ABASTECIMIENTO DE PRODUCTO A LA LINEA 32
AJUSTE DE MAQUINA 10
ARRANQUE 60
CAMBIO DE PRODUCTO 95
T2
BODEGA DE CONGELACION
FALTA DE INGREDIENTE 45
MANTENIMIENTO
CAMBIO DE TEFLONES 4
PRODUCCION
AJUSTE DE MAQUINA 38
CAMBIO DE PRODUCTO 45
What I did was add the times of each cause of the two dates that match the criteria shift , categoryTM and causeTM .
When trying to group them I use the following code:
df2 = df[(df["linea"] == "P3 EMB. AUTOMATICO 1") & (df["turno"] == "T1")]
df2.groupby('CausaTM').duracionTM.sum()
However, the output before and after groupby
is the same, that is, I do not add the two dates.