Indeed, your diagnosis is correct. The problem is that `merge()` does not accept a list of DataFrames: it takes exactly two objects, a left one and a right one. However, you can iterate over the list and merge the DataFrames one pair at a time. For example:
```python
import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.array([['a', 1, 2]]))
df2 = pd.DataFrame(np.array([['b', 3, 4]]))
df3 = pd.DataFrame(np.array([['c', 5, 6]]))
dfs = [df1, df2, df3]
```
We have created a list `dfs` that contains 3 DataFrames; now we can do the `merge`:
```python
dfs = iter(dfs)
df_final = next(dfs)
for df_ in dfs:
    df_final = df_final.merge(df_, left_index=True, right_index=True)
print(df_final)
```

```
  0_x 1_x 2_x 0_y 1_y 2_y  0  1  2
0   a   1   2   b   3   4  c  5  6
```
Detail:

- With `dfs = iter(dfs)` we convert the list into an iterator. That is how we want to process it: we need the first element on its own and then the rest, and an iterator gives us both without copying the list (as a slice such as `dfs[1:]` would).
- With `df_final = next(dfs)` we initialize the final DataFrame with the first object in the list.
- Then we simply iterate over the remaining elements and, with `df_final = df_final.merge(df_, left_index=True, right_index=True)`, merge each one into the result.
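The three steps above can be wrapped in a reusable function. This is only a sketch: `merge_all_on_index` and its `how` parameter are hypothetical names, not part of pandas. Because it calls `iter()` itself, it accepts any iterable (a list, a tuple or a generator), and it turns the `StopIteration` that `next()` raises on empty input into a clearer `ValueError`.

```python
from collections.abc import Iterable

import numpy as np
import pandas as pd


def merge_all_on_index(frames: Iterable[pd.DataFrame], how: str = "inner") -> pd.DataFrame:
    """Merge any number of DataFrames on their index, one pair at a time.

    Hypothetical helper that wraps the iter()/next() loop from the answer.
    """
    it = iter(frames)  # works for lists, tuples and generators alike; no copy is made
    try:
        result = next(it)  # the first frame seeds the result
    except StopIteration:
        raise ValueError("merge_all_on_index() needs at least one DataFrame") from None
    for df_ in it:  # the loop resumes at the second frame
        result = result.merge(df_, how=how, left_index=True, right_index=True)
    return result


df1 = pd.DataFrame(np.array([['a', 1, 2]]))
df2 = pd.DataFrame(np.array([['b', 3, 4]]))
df3 = pd.DataFrame(np.array([['c', 5, 6]]))

print(merge_all_on_index([df1, df2, df3]))
```

Passing `how="outer"` (or `"left"`) keeps rows whose index is missing from some of the frames; the default `"inner"` matches the behaviour of the loop above.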
You can get an identical result with fewer lines of code by using the `reduce()` function from `functools`:
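```python
from functools import reduce

# dfs was turned into an iterator and consumed by the loop above, so rebuild the list
dfs = [df1, df2, df3]
df_final = reduce(lambda left, right: pd.merge(left, right, left_index=True, right_index=True), dfs)
print(df_final)
```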
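To check that `reduce()` really does the same pairwise work as the loop, the sketch below builds the result three ways and compares them with `DataFrame.equals`. The `functools.partial` variant is an alternative to the lambda, not something from the original answer: it pre-binds `left_index=True, right_index=True` so that `reduce()` only has to supply the two frames. `merge_on_index` is a hypothetical name. Note that `reduce()` raises `TypeError` on an empty list when no initial value is given.

```python
from functools import partial, reduce

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([['a', 1, 2]]))
df2 = pd.DataFrame(np.array([['b', 3, 4]]))
df3 = pd.DataFrame(np.array([['c', 5, 6]]))
dfs = [df1, df2, df3]

# The explicit iter()/next() loop from the answer
it = iter(dfs)
loop_result = next(it)
for df_ in it:
    loop_result = loop_result.merge(df_, left_index=True, right_index=True)

# reduce() with a lambda, as in the answer
lambda_result = reduce(lambda left, right: pd.merge(left, right, left_index=True, right_index=True), dfs)

# reduce() with functools.partial: the keyword arguments of pd.merge are fixed up front
merge_on_index = partial(pd.merge, left_index=True, right_index=True)
partial_result = reduce(merge_on_index, dfs)

print(lambda_result.equals(loop_result), partial_result.equals(loop_result))  # True True
```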