Error "IOError: [Errno 22] invalid mode ('w') or filename" when creating csv file

1

I need to convert this Python code that I use in MAC to some that I can use in Windows 10 with the same version of Python (2.7).

Basically I open a csv and I load it in a DataFrame, I group by one of the columns and create a new csv for each group, taking the name of each output csv from the column on which I apply groupby .

import pandas as pd

df = pd.read_csv("All_Data_Tags.csv",  header=0,  sep = ",")
for group in df.groupby(df["trafficChannel"]):
    group[1].to_csv("{}.csv".format(group[0]), sep=',', index=False)

I run it on Mac without problems but in Windows I get this error:

  

IOError: [Errno 22] invalid mode ('w') or filename: '152 Media DK $ 2 | 09-11-17.csv

    
asked by Martin Bouhier 08.12.2017 в 00:07
source

1 answer

1

The problem is that the characters | , / , \ , : , ? , * , < , > and " are reserved characters not allowed in the name of files / files in Windows.

Since you get the names of a column in your csv, the solution you have left is to replace the non-allowed character, in this case "|" for another that does not present that problem.

For this you can use the pandas.Series.str.replace method which applies the vectorized replacement on the column:

import pandas as pd

dat = {"trafficChannel": ['152 Media DK $ 2 | 09-11-17',
                          '153 Media DK $ 2 | 09-11-17']
      }

df = pd.DataFrame(dat)
df["trafficChannel"] = df["trafficChannel"].str.replace("|", "%")

Use the character you want to replace, or an empty string to remove them completely. In this case, change by % :

>>> df

                 trafficChannel
0   152 Media DK $ 2 % 09-11-17
1   153 Media DK $ 2 % 09-11-17

In your case you can simply do:

import pandas as pd

df = pd.read_csv("All_Data_Tags.csv",  header=0,  sep = ",")
df["trafficChannel"] = df["trafficChannel"].str.replace("|", "%")
for group in df.groupby(df["trafficChannel"]):
    group[1].to_csv("{}.csv".format(group[0]), sep=',', index=False)

You could substitute different characters at once if you think there might be other invalid characters in the column.

Another option is to apply str.replace in each iteration of for :

for group in df.groupby(df["trafficChannel"]):
    group[1].to_csv("{}.csv".format(group[0].replace("|", "%")),
                    sep=',', index=False)

in this case for multiple replacements you can use str.translate .

    
answered by 08.12.2017 / 00:41
source