Save data from a CSV file to a list in Python

1

I have a .csv file and I need to work with its last two columns (they are numbers) without using pandas. I have tried several approaches, but none of them has worked.

This is the data in my .csv file:

id     id2     id3    id4
1        0    -180    -90
2        0    -180  -89.5
3        0    -180    -89

This is what I have come up with so far:

import pprint

with open("Test.csv", "r") as f:
    list_lines = f.readlines()

    for line in list_lines:
        pprint.pprint(line.rstrip('\n').split(','))
    
asked by David M FL 26.09.2018 at 03:16

2 answers

1

If you want to get a list of the rows with only the last two columns you have several options:

  • Using the csv module:

    Here a single slice of each row is enough:

    import csv
    
    
    with open("test.csv", "r", newline='') as f: 
        reader = csv.reader(f, delimiter=",")
        next(reader)  # Skip the header row
        res = [[float(n) for n in line[-2:]] for line in reader] 
    
    print(res)
    

    Python allows negative indexes. Here line[-2:] slices the list from the second-to-last element through the last one, that is, it returns the last two elements of the list.

  • Without csv module:

    The idea is more or less the same, but here we use str.rsplit, which splits starting from the right-hand end of the string:

    with open("test.csv", "r", newline='') as f: 
        next(f)  # Skip the header row
        res = [[float(n) for n in line.rstrip().rsplit(",", maxsplit=2)[-2:]] for line in f]
    
    print(res)
    

In either case the result is:

[[-180.0, -90.0],
 [-180.0, -89.5],
 [-180.0, -89.0]]
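
To see what the slice and the rsplit call are doing, here is a small sketch applied to a single row. The sample line is copied from the question's data and assumes the file really is comma-separated; the variable names are only illustrative.

```python
# One data row as it would be read from the file (trailing newline included).
line = "1,0,-180,-90\n"

# A plain split produces every field; the slice keeps the last two.
fields = line.rstrip().split(",")
print(fields)       # ['1', '0', '-180', '-90']
print(fields[-2:])  # ['-180', '-90']

# rsplit works from the right and stops after maxsplit cuts, so the
# leading columns are left joined together in the first element.
tail = line.rstrip().rsplit(",", maxsplit=2)
print(tail)         # ['1,0', '-180', '-90']

last_two = [float(n) for n in tail[-2:]]
print(last_two)     # [-180.0, -90.0]
```

With maxsplit=2 the leading columns are never split apart, which saves a little work on rows with many columns.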

Avoid readlines unless you really need a complete list with all the lines of the CSV, for example when you need to sort the rows. It loads the entire file into memory, which is inefficient and in many cases completely unnecessary.
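
As a rough illustration of the difference, the sketch below streams the rows through a generator and only builds a full list when sorting requires it. The helper name last_two_columns is hypothetical, and io.StringIO stands in for the real file so the example runs on its own; with a real file you would pass the object returned by open("test.csv", newline="") instead.

```python
import csv
import io

# Stand-in for the file; contents copied from the question, assumed comma-separated.
SAMPLE = "id,id2,id3,id4\n1,0,-180,-90\n2,0,-180,-89.5\n3,0,-180,-89\n"


def last_two_columns(f):
    """Yield the last two columns of each data row as floats, one row at a time."""
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        yield [float(n) for n in row[-2:]]


# Streaming: each row is processed and discarded, nothing accumulates.
total = 0.0
for a, b in last_two_columns(io.StringIO(SAMPLE)):
    total += b
print(total)  # -268.5

# Sorting needs every row at once, so building a full list is justified here.
rows = sorted(last_two_columns(io.StringIO(SAMPLE)), key=lambda r: r[1], reverse=True)
print(rows[0])  # [-180.0, -89.0]
```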

Edit

To get at each value in each row, the most efficient approach is to iterate over the list of lists with a for ... in loop:

import csv


with open("test.csv", "r", newline='') as f: 
    reader = csv.reader(f, delimiter=",")
    next(reader)  # Skip the header row
    res = [[float(n) for n in line[-2:]] for line in reader] 

for a, b in res:
    print(a, b)

Indexing also works, but it is less efficient and it is not the "Pythonic" way of doing it:

for i in range(len(res)):
    a = res[i][0]
    b = res[i][1]
    # Or directly: a, b = res[i]
    print(a, b)
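
If the row number is actually needed inside the loop, the built-in enumerate gives it without falling back to manual indexing. A minimal sketch, with res written out by hand to match the result shown earlier and start=1 chosen only so the numbers line up with the id column:

```python
# Same structure as the result built above: one [a, b] pair per row.
res = [[-180.0, -90.0], [-180.0, -89.5], [-180.0, -89.0]]

# enumerate yields (index, item) pairs; the inner tuple unpacks each row.
labelled = []
for i, (a, b) in enumerate(res, start=1):
    labelled.append((i, a, b))
    print(i, a, b)
```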

If you want the values separately inside your own for loop, just do:

import csv


with open("test.csv", "r", newline='') as f: 
    reader = csv.reader(f, delimiter=",")
    next(reader)  # Skip the header row
    for line in reader: 
        DatoLoP, DatoLap = float(line[-2]), float(line[-1])
        print(DatoLoP, DatoLap)

If you want each column in its own sequence you can use zip (note that it produces tuples, not lists):

with open("test.csv", "r", newline='') as f:
    next(f)  # Skip the header row
    # zip(*rows) transposes the rows into columns
    a, b = zip(*((float(n) for n in line.rstrip().rsplit(",", maxsplit=2)[-2:])
                    for line in f))

print(a)
print(b)
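
The zip(*...) idiom is easier to follow on a small in-memory example. The sketch below uses a hand-written res equal to the result shown earlier; a_list and b_list are illustrative names for the case where real lists are wanted instead of tuples.

```python
# Rows as produced earlier: one [a, b] pair per line of the file.
res = [[-180.0, -90.0], [-180.0, -89.5], [-180.0, -89.0]]

# The * unpacks the rows as separate arguments, so zip pairs up the first
# items of every row, then the second items: rows become columns.
a, b = zip(*res)
print(a)  # (-180.0, -180.0, -180.0)
print(b)  # (-90.0, -89.5, -89.0)

# zip yields tuples; convert explicitly if lists are required.
a_list, b_list = (list(col) for col in zip(*res))
print(b_list)  # [-90.0, -89.5, -89.0]
```

One caveat: if the file has no data rows, zip(*res) yields nothing and the unpacking into a and b raises ValueError, so an empty input needs its own check.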
    
answered by 26.09.2018 at 03:51
0

If you would rather not import any module at all, you can do something like this (Python 2):

# Load the file (the with block closes it afterwards):
with open('Test.csv') as f:
    list_lines = f.read().split('\n')

# Apply the transformations:
list_lines = map(lambda x: x.split(','), list_lines)    # split each line into fields
list_lines = map(lambda x: x[2:4], list_lines)          # keep the 3rd and 4th columns
list_lines = filter(lambda x: len(x) > 0, list_lines)[1:]   # drop blank lines, then the header
list_lines = map(lambda x: [float(x[0]), float(x[1])], list_lines)

If you are using Python 3.x, where map and filter return iterators instead of lists, the last two lines should be:

list_lines = list(filter(lambda x: len(x) > 0, list_lines))[1:]
list_lines = list(map(lambda x: [float(x[0]), float(x[1])], list_lines))
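
For comparison, the same steps can be written as a single list comprehension, which behaves identically in Python 2 and 3. This is only a sketch: the text variable stands in for the file contents (in practice it would come from f.read() inside a with open("Test.csv") block), and comma-separated data with exactly four columns is assumed.

```python
# Stand-in for the file contents, copied from the question's data.
text = "id,id2,id3,id4\n1,0,-180,-90\n2,0,-180,-89.5\n3,0,-180,-89\n"

lines = text.splitlines()[1:]  # drop the header row

rows = [
    [float(n) for n in line.split(",")[2:4]]  # keep the 3rd and 4th columns
    for line in lines
    if line.strip()                           # ignore blank lines
]
print(rows)  # [[-180.0, -90.0], [-180.0, -89.5], [-180.0, -89.0]]
```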

I hope you find it useful.

    
answered by 27.09.2018 at 05:45