Python copying a column from a .txt to a .csv, problems with decimals

1

I'm creating a numpy array called tabla. In its fifth column I want to store a list of numbers that I have in a .txt file.

I do it with the following code:

tabla[:, 4] = np.fromfile('/home/lucia/Documentos/Base de datos de imagenes/TID2013/mos.txt', sep='\n')

The file mos.txt contains a list of 3000 numbers, one per line, so it is laid out as a single column. The numbers range from 0 to 9 and have 5 decimal digits. The list looks something like this:

5.51429
5.56757
4.94444
4.37838
3.86486
.
.
.

Once I have my complete table (the other columns are not relevant here), I want to save it in a .csv file (comma-separated values). I do it with the following code:

a=tabla.tolist()

with open('tabla.csv', 'w') as csvfile:  
    writer = csv.writer(csvfile)
    writer.writerow(headers)
    writer.writerows(a)

The problem is that numbers with a nonzero integer part whose last two decimal digits are 0 and whose third decimal digit is not 0 are saved without the point. For example, 3.92500 is saved as 3925 and 6.12500 as 6125, as if the point were being treated as a thousands separator rather than a decimal point.

Does anyone have any idea why this happens and how can I solve it?

    
asked by Lucy_in_the_sky_with_diamonds 09.06.2017 at 22:00

3 answers

1

If all the numbers have 5 decimals, you could format it as follows:

a = tuple([format(x, '.5f') for x in y] for y in tabla.tolist())

with open('tabla.csv', 'w') as csvfile:  
    writer = csv.writer(csvfile)
    writer.writerow(headers)
    writer.writerows(a)

This forces every value to five decimal places ('.5f'). The outer generator expression builds a tuple of rows, and each row is a list of formatted strings.
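As a minimal sketch of what the '.5f' format does to the values from the question (the list name `values` is only illustrative):

```python
# Sample values taken from the question; 3.925 and 6.125 are the ones
# that lose their trailing zeros when converted to text by default.
values = [3.925, 6.125, 5.51429]

# format(x, '.5f') always emits exactly five digits after the point.
formatted = [format(x, '.5f') for x in values]

print(formatted)  # ['3.92500', '6.12500', '5.51429']
```

Note that the formatted values are strings, so this is best done only at the moment of writing the file.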

    
answered 09.06.2017 at 22:23
1

The generated CSV is correct and the code works as it should. In fact, you do not even need to convert the data to a list, since the writer accepts the array as is (writer.writerows(tabla)). The problem is not Python but the program or method you use to open the generated CSV, which is interpreting the point as a thousands separator. That is why only values such as 3.925 are affected: a float carries no trailing zeros, so exactly three digits follow the point, which looks like a thousands group. Open the file with a plain text editor and you will see this.

The solution depends on the program or method you use to open the CSV. In LibreOffice, it is enough to select "English (USA)" as the import language for the decimals to be interpreted correctly.
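If changing the import settings is not an option, a different approach (not part of this answer) is to write the file in the convention the spreadsheet expects. The sketch below assumes a locale that uses a comma as decimal mark and a semicolon as field separator; the helper name `write_comma_decimal_csv` is hypothetical.

```python
import csv
import io


def write_comma_decimal_csv(rows, headers, out):
    """Write rows with ';' as delimiter and ',' as decimal mark.

    Assumption: the program opening the file expects this convention.
    """
    writer = csv.writer(out, delimiter=';')
    writer.writerow(headers)
    for row in rows:
        # Fix the number of decimals, then swap the decimal point for a comma.
        writer.writerow([format(x, '.5f').replace('.', ',') for x in row])


# An in-memory buffer stands in for open('tabla.csv', 'w', newline='').
buffer = io.StringIO()
write_comma_decimal_csv([[0.0, 3.925], [0.0, 6.125]], ['Col1', 'Col2'], buffer)
lines = buffer.getvalue().splitlines()
print(lines)
```

A file written this way will be misread by tools that expect a decimal point, so it only makes sense when the consumer is known.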

Simulating your code:

import csv
import numpy as np

# Simulate an array that matches your example
tabla = np.zeros((7, 5))
tabla[:, 4] = [5.51429, 5.56757, 4.94444, 4.37838, 3.86486, 3.92500, 6.12500]


headers = ('Col1','Col2','Col3','Col4','Col5')

with open('tabla.csv', 'w') as csvfile:  
    writer = csv.writer(csvfile)
    writer.writerow(headers)
    writer.writerows(tabla)

The real output is this:

As you can see, the file itself is fine. The problem lies in the program you use to open the CSV and how it interprets the contents.
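One way to confirm this without a spreadsheet is to look at the raw text that csv.writer produces. This is a minimal sketch that writes to an in-memory buffer instead of tabla.csv and uses plain Python floats, as tabla.tolist() would give:

```python
import csv
import io

headers = ('Col1', 'Col2', 'Col3', 'Col4', 'Col5')
# Two rows with the problematic values from the question.
rows = [[0.0, 0.0, 0.0, 0.0, 3.925],
        [0.0, 0.0, 0.0, 0.0, 6.125]]

# io.StringIO stands in for the file so the raw text can be inspected.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(headers)
writer.writerows(rows)

raw_lines = buffer.getvalue().splitlines()
for line in raw_lines:
    print(line)
```

The decimal point is present in the text, but the trailing zeros are not: 3.92500 is written as 3.925, which is the form a spreadsheet with a different locale can misread as 3925.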

Since you use NumPy you can still use it to save the csv using numpy.savetxt if you wish:

import numpy as np

# Simulate an array that matches your example
tabla = np.zeros((7, 5))
tabla[:, 4] = [5.51429, 5.56757, 4.94444, 4.37838, 3.86486, 3.92500, 6.12500]


headers = 'Col1,Col2,Col3,Col4,Col5'
formato = ['%.1f', '%.1f', '%.1f', '%.1f', '%.5f']
with open("tabla.csv", 'wb') as csvfile:
    np.savetxt(csvfile, tabla, delimiter=",", fmt=formato, newline='\r\n', header=headers, comments='')

Output:

You can adjust the formats to get the output you want. In this case every column is written with one decimal, except the fifth (index 4), which is written with 5 decimals.
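To see the effect of the per-column formats as text, the same call can be pointed at an in-memory buffer. This is a small sketch with a two-row array and the default newline, not the exact file from the answer:

```python
import io

import numpy as np

# Two-row stand-in for the table, with the values that caused trouble.
tabla = np.zeros((2, 5))
tabla[:, 4] = [3.925, 6.125]

# One format per column: one decimal for the first four, five for the last.
formato = ['%.1f', '%.1f', '%.1f', '%.1f', '%.5f']

buffer = io.StringIO()
np.savetxt(buffer, tabla, delimiter=',', fmt=formato,
           header='Col1,Col2,Col3,Col4,Col5', comments='')

lines = buffer.getvalue().splitlines()
for line in lines:
    print(line)
```

Because the last column is written with '%.5f', the trailing zeros are kept (3.92500), so the value no longer has exactly three digits after the point.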

    
answered 09.06.2017 at 23:11
0

The main problem I see is that np.fromfile() is not meant for reading a CSV file but for reading a file previously saved with data.tofile(). If you are going to use numpy for reading, I recommend something like this:

from numpy import genfromtxt
my_data = genfromtxt('my_file.csv', dtype=float, delimiter=',')
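Applied to the one-column file from the question, this could look like the sketch below. The sample text and the three-row table are assumptions used to keep the example self-contained; with the real data you would pass the path to mos.txt instead of the buffer.

```python
import io

import numpy as np

# Stand-in for mos.txt: one number per line.
text = "5.51429\n5.56757\n4.94444\n"

# With a single value per line no delimiter is needed;
# the result is a one-dimensional float array.
column = np.genfromtxt(io.StringIO(text), dtype=float)

# Copy the values into the fifth column (index 4) of the table.
tabla = np.zeros((len(column), 5))
tabla[:, 4] = column

print(tabla[:, 4])
```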
    
answered 09.06.2017 at 23:02