Separate CSV lines by "/" python

1

I need to make some changes to my csv. I seek to separate the rows that I have in country by the "/" and duplicate them. Basically I explain a little because I need to do this: When I have a "/" in the row of Country it means that they are both countries and for technical reasons I need to have only one country per row.

example:

*,CA - Canada/UK - United Kingdom,0.45
output:
*,CA - Canada,0.45
*,UK - United Kingdom,0.45

Data:

Space,Country,Price
*,AR - Argentina,0.3
*,CA - Canada/UK - United Kingdom,0.45
*,CL - Chile/PE - Peru,0.25
*,CO - Colombia/EC - Ecuador,0.15
*,CR - Costa Rica/ES - Spain,0.2
*,DE - Germany,0.5
*,MX - Mexico,0.4
*,US - United States,0.8

What I'm looking for:

Space,Country,Price
*,AR - Argentina,0.3
*,CA - Canada,0.45
*,UK - United Kingdom,0.45
*,CL - Chile,0.25
*,PE - Peru,0.25
*,CO - Colombia,0.15
*,EC - Ecuador,0.15
*,CR - Costa Rica,0.2
*,ES - Spain,0.2
*,DE - Germany,0.5
*,MX - Mexico,0.4
*,US - United States,0.8

Code:

import csv
with open("pre.csv", 'rb') as f:
    reader = csv.reader(f)
    your_list = list(reader)
    print your_list

divididos = []
for item2 in your_list:
    if "/" in item2[1]:
        a = item2[1].split('/')
        print item2[0] + ',' + a[0] + ',' + item2[2] + ',' + item2[3]
        print item2[0] + ',' + a[1] + ',' + item2[2] + ',' + item2[3]
    else:
        print item2
    
asked by Martin Bouhier 20.03.2018 в 21:54
source

3 answers

1

Here is the answer!

import csv
import itertools

with open("pre.csv", 'rb') as f:
    reader = csv.reader(f)
    your_list = list(reader)

divididos = []
primeros = []
for item2 in your_list:
    if "/" in item2[1]:
        a = item2[1].split('/')
        p1 = item2[0] + ',' + a[0] + ',' + item2[2] + ',' + item2[3]
        p2 = item2[0] + ',' + a[1] + ',' + item2[2] + ',' + item2[3]
        p1 = p1.split(',')
        p2 = p2.split(',')
        divididos.append(p1)
        divididos.append(p2)
    else:
        primeros.append(item2)

separados_ok = divididos + primeros

with open("output.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(separados_ok)
    
answered by 21.03.2018 в 00:01
1

Following the same logic as in your autoresponder (iterate over rows and use str.split ) we can do the same thing but avoid creating intermediate lists using iterators and modifying logic a bit. This is especially important if your csv is relatively large to preserve memory and reduce the processing time of the csv. In addition, the code is greatly simplified by eliminating many intermediate steps that are inefficient such as chain concatenation:

import csv

with open("pre.csv", 'rb') as in_f, open("output.csv", "wb") as out_f:
    reader = csv.reader(in_f)
    writer = csv.writer(out_f)
    writer.writerow(next(reader)) # Escribimos la cabecera
    writer.writerows(((space, div, price) for space, country, price in reader
                                              for div in country.split("/")))

In principle this must be more efficient both in memory usage and in execution time than any alternative in Pandas or similar to avoid the parsing of the data and the construction integrates the DF in memory and possible complete or partial copies thereof. during processing. Another thing is that your goal is not simply to create another CSV, but to work with the data in Pandas, in which case you should assess which option is better.

    
answered by 21.03.2018 в 00:14
0

Now that a response has already appeared (and the author of the question!) with a solution that we could call "classical", through a loop for iterating through the lines, I'll put another solution much more "esoteric" that makes use of the functions of pandas, adapted from the responses to this similar question in Stack Overflow .

This solution is much more illegible, and therefore less advisable, except for one important reason. When using primitive pandas instead of python code, it is much faster, although this will only be appreciated if the csv to be processed is very large.

With the following dataframe as input:

>>> df = pandas.read_csv("pre.csv")
>>> df
  Space                          Country  Price
0     *                   AR - Argentina   0.30
1     *  CA - Canada/UK - United Kingdom   0.45
2     *             CL - Chile/PE - Peru   0.25
3     *       CO - Colombia/EC - Ecuador   0.15
4     *       CR - Costa Rica/ES - Spain   0.20
5     *                     DE - Germany   0.50
6     *                      MX - Mexico   0.40
7     *               US - United States   0.80

The following spell:

aux=df.Country.str.split("/").apply(pd.Series, 1).stack()
aux.index = aux.index.droplevel(-1)
aux.name = "Country"
del df["Country"]
result = df.join(aux)[["Space", "Country", "Price"]]

produces the desired result:

>>> result
  Space              Country  Price
0     *       AR - Argentina   0.30
1     *          CA - Canada   0.45
1     *  UK - United Kingdom   0.45
2     *           CL - Chile   0.25
2     *            PE - Peru   0.25
3     *        CO - Colombia   0.15
3     *         EC - Ecuador   0.15
4     *      CR - Costa Rica   0.20
4     *           ES - Spain   0.20
5     *         DE - Germany   0.50
6     *          MX - Mexico   0.40
7     *   US - United States   0.80

So it's enough now to do result.to_csv("output.csv")

    
answered by 21.03.2018 в 00:11