Replace commas with points in a dataframe (comma as decimal separator)


I'm new to pandas and I have a question about replacing commas with points (as the decimal separator) in Python 2.

The numbers in the csv are presented this way:

As you can see in the image, the floats have different numbers of decimal places.

First I tried to change them the classic way in Calc and Excel with "search and replace", but this happened:

For the numbers with three decimal places it did not replace the comma with a period; it just deleted the comma and left them as integers.

Then I tried Python with the pandas library, with these commands:

comas_por_puntos = [float(x.replace(',','.')) for x in bd_df['col_1']]
col_1.convert_objects(convert_numeric = True)

When I printed col_1 (converted to a DataFrame) it looked good. I exported the DataFrame to csv, but when I opened it I found the same situation as in the previous image, that is, there was no point in the numbers with three decimal places.

I also tried removing a decimal from the ones that had four, but the result was the same.

When I added a decimal by hand to the ones with three, those did change correctly, but it turns out that there are many rows.

What could be causing this?

It also occurs to me to go through the column of the DataFrame and add a decimal to the values that have three, but I do not know how to do that...
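Something like this loop is what I have in mind, though I am not sure it is right (a rough sketch only; col_1 is the column from the code above):

# Rough sketch of the idea: pad every value of col_1 to four decimals
# and swap the comma for a point at the same time.
valores = []
for valor in bd_df['col_1']:
    entero, _, decimales = str(valor).partition(',')
    valores.append(float(entero + '.' + decimales.ljust(4, '0')))
bd_df['col_1'] = valores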

    
asked by Juan Salvador on 22.07.2018 at 06:07

2 answers


If the data is read from a csv with pandas.read_csv and the comma is used consistently as the decimal separator throughout the file, what you can do is simply use the decimal parameter, which lets you specify which character is used as the decimal point:

import io
import pandas as pd

csv = io.StringIO(
u'''col_1;col_2
7,477;a
7,4848;b
7,4;c
7,5126;d
7,5029;e
5;f''')

bd_df = pd.read_csv(csv, sep=";", decimal=",")

The use of io.StringIO is just to simulate a .csv file and make the example reproducible. In your case, instead of csv you would pass the path to the file:

bd_df = pd.read_csv("ruta/al/fichero.csv", sep=";", decimal=",")

With this we obtain what we were looking for:

>>> bd_df
    col_1  col_2
0  7.4770      a
1  7.4848      b
2  7.4000      c
3  7.5126      d
4  7.5029      e
5  5.0000      f


>>> bd_df.dtypes
col_1    float64
col_2     object
dtype: object

If you want to save the csv again using the dot as the decimal separator, that works as-is; if in addition you want your columns to have a fixed number of decimals, you can use float_format:

bd_df.to_csv('out.csv', index=False, header=True,
              decimal='.', sep=',', float_format='%.4f')

so we get the following csv file:

  

col_1,col_2
7.4770,a
7.4848,b
7.4000,c
7.5126,d
7.5029,e
5.0000,f

If the above is not possible (for example, the comma is used as the decimal separator in every column except one), you can use pandas.Series.str.replace:

bd_df = pd.read_csv(csv, sep=";")
bd_df['col_1'] = bd_df['col_1'].str.replace(",", ".").astype(float)
  

Note: pandas.DataFrame.convert_objects is marked as "deprecated" from version 0.18, in favor of the type-specific converters pandas.to_* (e.g. pandas.to_numeric).
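For example, the conversion the question attempts could be written like this with pandas.to_numeric (a sketch, assuming the commas have already been turned into points with str.replace as above):

import pandas as pd

# Sketch: replace the commas and convert the column with pandas.to_numeric
# instead of the deprecated convert_objects.
bd_df['col_1'] = pd.to_numeric(bd_df['col_1'].str.replace(',', '.'),
                               errors='coerce')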

    
answered on 22.07.2018 at 10:25

FJSevilla

It has been very helpful for me to learn how to add the parameters when exporting, as you indicated; with that, plus this command:

comas_por_puntos = [float(x.replace(',','.')) for x in bd_df['col_1']]

the problem was solved.
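Putting it together, what worked for me looks roughly like this (just a sketch; the file names here are placeholders for my real files):

import pandas as pd

# Sketch of the combination: read with ';' as separator, replace the
# commas by hand, then export with an explicit float format.
# 'datos.csv' and 'salida.csv' are placeholder names.
bd_df = pd.read_csv('datos.csv', sep=';')
comas_por_puntos = [float(x.replace(',', '.')) for x in bd_df['col_1']]
bd_df['col_1'] = comas_por_puntos
bd_df.to_csv('salida.csv', index=False, header=True,
             decimal='.', sep=',', float_format='%.4f')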

I also tested the first alternative, but I think the "u" before '''F;S was missing, because it threw an error. I managed to correct it by looking up the error, which suggested writing it like this: u'''F;S, ... Even so, I could not solve my problem with that solution: I was able to export it the way I wanted, but only for the rows indicated in the code you wrote:

csv = io.StringIO(
'''F;S
7,477;a
7,4848;b
7,4;c
7,5126;d
7,5029;e
5;f''')  

Because in my csv I have about 1900 rows and it seemed strange to have to add a letter to each of them. But in the end the problem was solved thanks to you. Greetings.
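P.S. For reference, the error with the first alternative comes from io.StringIO only accepting unicode text in Python 2, so the literal needs the "u" prefix (same sample data as above):

import io

# Python 2: io.StringIO needs a unicode string, hence the u prefix.
csv = io.StringIO(u'''F;S
7,477;a
7,4848;b
7,4;c
7,5126;d
7,5029;e
5;f''')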

    
answered on 23.07.2018 at 04:12