Concatenate pandas DataFrames unifying by Dates

1

I have two pandas DataFrames, each with a date column (Fecha) and a data column (Datos_A, Datos_B):

df_A = pd.DataFrame({
    'Fecha': ['15/12/2018', '16/12/2018', '18/12/2018'],
    'Datos_A': [1, 5, 9]
})
df_B = pd.DataFrame({
    'Fecha': ['15/12/2018', '16/12/2018', '17/12/2018', '19/12/2018'],
    'Datos_B': [7, 5, 3, 4]
})

I convert the dates to datetime format.

# The dates are day-first, so the format is stated explicitly
df_A.Fecha = pd.to_datetime(df_A.Fecha, format='%d/%m/%Y')
df_B.Fecha = pd.to_datetime(df_B.Fecha, format='%d/%m/%Y')

df_A

       Fecha  Datos_A
0 2018-12-15        1
1 2018-12-16        5
2 2018-12-18        9

df_B

       Fecha  Datos_B
0 2018-12-15        7
1 2018-12-16        5
2 2018-12-17        3
3 2018-12-19        4

How can I concatenate both DataFrames in such a way that the Fecha column is unified to obtain the following result?

         Fecha Datos_A  Datos_B
0   2018-12-15       1        7
1   2018-12-16       5        5
2   2018-12-17     NaN        3
3   2018-12-18       9      NaN
4   2018-12-19     NaN        4
    
asked by Pedro Biel 21.12.2018 at 13:47

2 answers

0

As the author of the question has discovered, pd.merge() is the solution he was looking for, although it can only operate on two DataFrames at a time.

To combine several DataFrames, repeat the same operation: merge the result of the previous pd.merge() with the next DataFrame.
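Written out by hand for three frames, that repetition might look like the following minimal sketch. The variable names paso_1 and resultado are only illustrative, and the dates are left as strings to keep it short.

```python
import pandas as pd

# Sample frames sharing the key column 'Fecha'
df_A = pd.DataFrame({
    'Fecha': ['15/12/2018', '16/12/2018', '18/12/2018'],
    'Datos_A': [1, 5, 9]
})
df_B = pd.DataFrame({
    'Fecha': ['15/12/2018', '16/12/2018', '17/12/2018', '19/12/2018'],
    'Datos_B': [7, 5, 3, 4]
})
df_C = pd.DataFrame({
    'Fecha': ['15/12/2018', '17/12/2018', '18/12/2018'],
    'Datos_C': [6, 15, 8]
})

# Step 1: merge the first two frames
paso_1 = pd.merge(df_A, df_B, on='Fecha', how='outer')

# Step 2: merge the previous result with the next frame
resultado = pd.merge(paso_1, df_C, on='Fecha', how='outer')

print(resultado)
```

Each extra frame needs one more pd.merge() call, which is what becomes tedious as the number of frames grows.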

Using functools.reduce condenses the syntax, although in essence it does the same thing: each step merges two DataFrames, one of which is the result of the previous merge.

Example:

import pandas as pd
from functools import reduce

# Define a function that, given two dataframes, returns one
# with the desired combination
def juntar(left, right):
    return pd.merge(left, right, on='Fecha', how='outer')

# Define four sample dataframes
df_A = pd.DataFrame({
    'Fecha': ['15/12/2018', '16/12/2018', '18/12/2018'],
    'Datos_A': [1, 5, 9]
})
df_B = pd.DataFrame({
    'Fecha': ['15/12/2018', '16/12/2018', '17/12/2018', '19/12/2018'],
    'Datos_B': [7, 5, 3, 4]
})
df_C = pd.DataFrame({
    'Fecha': ['15/12/2018', '17/12/2018', '18/12/2018'],
    'Datos_C': [6, 15, 8]
})
df_D = pd.DataFrame({
    'Fecha': ['15/12/2018', '16/12/2018', '18/12/2018', '19/12/2018'],
    'Datos_D': [17, 15, 13, 14]
})

# Use reduce to join the four into a single dataframe, using
# the juntar() function defined above. Use "Fecha" as the
# index and sort the index
reduce(juntar, [df_A, df_B, df_C, df_D]).set_index("Fecha").sort_index()
            Datos_A  Datos_B  Datos_C  Datos_D
Fecha                                         
15/12/2018      1.0      7.0      6.0     17.0
16/12/2018      5.0      5.0      NaN     15.0
17/12/2018      NaN      3.0     15.0      NaN
18/12/2018      9.0      NaN      8.0     13.0
19/12/2018      NaN      4.0      NaN     14.0
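A different route, not the one used above, is to put Fecha in the index of every frame and let pd.concat align them column-wise in one call. This is only a sketch under one assumption: each Fecha value appears at most once per frame, because pd.concat cannot align frames whose index contains duplicates. The name combinado is illustrative.

```python
import pandas as pd

df_A = pd.DataFrame({
    'Fecha': ['15/12/2018', '16/12/2018', '18/12/2018'],
    'Datos_A': [1, 5, 9]
})
df_B = pd.DataFrame({
    'Fecha': ['15/12/2018', '16/12/2018', '17/12/2018', '19/12/2018'],
    'Datos_B': [7, 5, 3, 4]
})
df_C = pd.DataFrame({
    'Fecha': ['15/12/2018', '17/12/2018', '18/12/2018'],
    'Datos_C': [6, 15, 8]
})
dfs = [df_A, df_B, df_C]

# Parse the day-first dates so that sorting is chronological
for df in dfs:
    df['Fecha'] = pd.to_datetime(df['Fecha'], format='%d/%m/%Y')

# Align all frames on the Fecha index; join='outer' keeps every date
combinado = pd.concat(
    [df.set_index('Fecha') for df in dfs],
    axis=1,
    join='outer',
).sort_index()

print(combinado)
```

Dates missing from a frame show up as NaN in that frame's column, just as with the outer merge.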
    
answered 22.12.2018 at 19:59
1

The solution was not that complicated after all.

df = pd.merge(df_A, df_B, on='Fecha', how='outer')
df.sort_values(by='Fecha', inplace=True)
df[['Fecha', 'Datos_A', 'Datos_B']]

    Fecha       Datos_A Datos_B
0   2018-12-15      1.0     7.0
1   2018-12-16      5.0     5.0
3   2018-12-17      NaN     3.0
2   2018-12-18      9.0     NaN
4   2018-12-19      NaN     4.0

Edited on 22.12.2018

At the request of @shadow I will explain the solution.

The first line merges the two DataFrames df_A and df_B on the Fecha column. Passing how='outer' tells pandas to use the union of the keys from both frames, so a date present in only one frame still gets a row, with NaN in the other frame's column. For more information, see the pandas documentation for merge.
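To see what how='outer' changes, the sketch below compares it with how='inner' on the same two frames. The indicator=True argument is an optional extra that adds a _merge column saying where each row came from. The names interseccion and union are illustrative.

```python
import pandas as pd

df_A = pd.DataFrame({
    'Fecha': ['15/12/2018', '16/12/2018', '18/12/2018'],
    'Datos_A': [1, 5, 9]
})
df_B = pd.DataFrame({
    'Fecha': ['15/12/2018', '16/12/2018', '17/12/2018', '19/12/2018'],
    'Datos_B': [7, 5, 3, 4]
})
df_A['Fecha'] = pd.to_datetime(df_A['Fecha'], format='%d/%m/%Y')
df_B['Fecha'] = pd.to_datetime(df_B['Fecha'], format='%d/%m/%Y')

# inner: only the dates present in both frames
interseccion = pd.merge(df_A, df_B, on='Fecha', how='inner')

# outer: every date from either frame; _merge records the origin
union = pd.merge(df_A, df_B, on='Fecha', how='outer', indicator=True)

print(interseccion)
print(union.sort_values('Fecha'))
```

With this data the inner merge keeps only 15 and 16 December, while the outer merge keeps all five dates.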

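The second line sorts the rows by Fecha. With inplace=True, df itself is modified instead of a new DataFrame being returned. The rows keep their original index labels, which is why the index reads 0, 1, 3, 2, 4 in the output above. For more information, see the pandas documentation for sort_values.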

The third line displays df with its columns in the order Fecha, Datos_A, Datos_B.
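If a consecutive index 0 to 4 is wanted, as in the result shown in the question, a possible variation is to renumber the rows after sorting with reset_index(drop=True). This is a small sketch of that idea, not part of the original three lines.

```python
import pandas as pd

df_A = pd.DataFrame({
    'Fecha': ['15/12/2018', '16/12/2018', '18/12/2018'],
    'Datos_A': [1, 5, 9]
})
df_B = pd.DataFrame({
    'Fecha': ['15/12/2018', '16/12/2018', '17/12/2018', '19/12/2018'],
    'Datos_B': [7, 5, 3, 4]
})
df_A['Fecha'] = pd.to_datetime(df_A['Fecha'], format='%d/%m/%Y')
df_B['Fecha'] = pd.to_datetime(df_B['Fecha'], format='%d/%m/%Y')

df = pd.merge(df_A, df_B, on='Fecha', how='outer')

# Sort by date, then discard the old labels and number the rows 0..n-1
df = df.sort_values(by='Fecha').reset_index(drop=True)

print(df[['Fecha', 'Datos_A', 'Datos_B']])
```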

Afterwards I realized that this solution is only valid for two DataFrames. If anyone has a better solution for more than two DataFrames, do not hesitate to post it.

    
answered 21.12.2018 at 14:20