Concatenate Pandas DataFrames with a loop


I am concatenating Pandas DataFrames in the following way:

import os
import pandas as pd

# Collect every CSV file in the current directory
files = [file for file in os.listdir() if file.endswith('.csv')]
dict = {}
for file in files:
    # Map each file name to its DataFrame
    dict[file] = pd.read_csv(file)

df_A = pd.concat([dict['A_0.csv'], dict['A_1.csv']], axis=1)

If, instead of two files (A_0.csv and A_1.csv), I had many more, what loop would I need to concatenate all the DataFrames?

Thank you very much.

    
asked by Rg111 13.11.2017 at 15:48

1 answer


You can use the glob module to filter the files. It is also advisable to use iterators instead of building intermediate lists, dictionaries or DataFrames that you will not need again.

One option would be:

import glob
import os
import pandas as pd

ruta = ''  # Path to the directory that contains the csv files
archivos_csv = glob.iglob(os.path.join(ruta, "*.csv"))
dataframes = (pd.read_csv(csv) for csv in archivos_csv)
df = pd.concat(dataframes, axis=1)
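
To try the same idea without touching real data, here is a minimal sketch. It assumes a throwaway directory created with tempfile and three made-up files named A_0.csv to A_2.csv with invented columns col_0 to col_2; none of this comes from the question. The pattern "A_*.csv" limits the concatenation to one group of files, and sorted is applied so the column order is reproducible.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical test data: three small CSV files in a temporary directory.
ruta = tempfile.mkdtemp()
for i in range(3):
    pd.DataFrame({f"col_{i}": [i, i + 1]}).to_csv(
        os.path.join(ruta, f"A_{i}.csv"), index=False
    )

# Match only the A_*.csv files; sorted() makes the order deterministic.
archivos_csv = sorted(glob.glob(os.path.join(ruta, "A_*.csv")))

# Generator expression: each file is read only when concat consumes it.
dataframes = (pd.read_csv(csv) for csv in archivos_csv)
df_A = pd.concat(dataframes, axis=1)

print(df_A)
```

With axis=1 the frames are placed side by side and aligned on their index, so files with different numbers of rows will produce NaN values in the shorter columns.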
  

Note: You should not use dict as a variable name. Doing so shadows the built-in dict class and can lead to unexpected results. If you want a similar name, use dict_ instead.
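
A small sketch of what goes wrong when the name is reused. The function names shadowed and not_shadowed are purely illustrative; the shadowing is kept inside a function so that it does not affect the rest of the script.

```python
def shadowed():
    dict = {}  # local name hides the built-in dict inside this function
    try:
        dict(a=1)  # now calls the empty dict instance, not the class
    except TypeError as exc:
        return str(exc)


def not_shadowed():
    dict_ = {}  # trailing underscore leaves the built-in untouched
    dict_.update(dict(a=1))
    return dict_


message = shadowed()
print(message)
print(not_shadowed())
```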

Edit:

If you want to print the names of the files, you can replace glob.iglob (which returns an iterator) with glob.glob (which returns a list), so the paths can be printed and still reused afterwards:

ruta = '' 
archivos_csv = glob.glob(os.path.join(ruta, "*.csv")) 
print("csv files found:")
print(*(os.path.basename(path) for path in archivos_csv), sep="\n")

# In Python 2.x replace with:
#print "csv files found:"
#for nombre in (os.path.basename(path) for path in archivos_csv):
#    print nombre

Output:

  

csv files found:
  2.csv
  1.csv
  3.csv

Or print a list with the names of the files directly:

ruta = '' 
archivos_csv = glob.glob(os.path.join(ruta, "*.csv")) 
print([os.path.basename(path) for path in archivos_csv])

Output:

  

['2.csv', '1.csv', '3.csv']

  

Note: glob does not return the files in any guaranteed order. If the files must be opened in a particular order, apply sorted (or list.sort) to the output of glob.glob: archivos_csv = sorted(glob.glob(os.path.join(ruta, "*.csv")))
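
One caveat with plain sorted is that it compares file names as text, so A_10.csv comes before A_2.csv. The following sketch shows one possible way to order by the number in the name instead. The helper numeric_key and the sample paths are hypothetical and only serve as illustration.

```python
import os
import re


def numeric_key(path):
    """Sort key: first integer found in the file name, then the name itself."""
    name = os.path.basename(path)
    match = re.search(r"\d+", name)
    return (int(match.group()) if match else -1, name)


# Hypothetical paths, as glob.glob might return them in arbitrary order.
paths = ["data/A_10.csv", "data/A_2.csv", "data/A_1.csv"]

plain = sorted(paths)                      # text order
numeric = sorted(paths, key=numeric_key)   # order by the embedded number

print(plain)
print(numeric)
```

Passing the same key to sorted(glob.glob(...)) would then give the numeric order before the files are read.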

    
answered by 13.11.2017 / 17:37