How do I convert data read from a CSV file to numbers in Python 3?

I am working with Python 3 and reading a .csv file with several columns, the last three of which contain numbers. When I read the file, every column comes in as a string, and I want those three to be int or float.

How can I convert columns 5, 6 and 7 to integers or floating-point numbers?

Thanks

My code is as follows:

dato_produc = 'produccion_diaria2.csv'  # name of the file to read
with open(dato_produc, 'r') as produccion:
    produccion = produccion.read().splitlines()  # split the records into lines

datos_produccion = []
for l in produccion:
    line = l.split(',')  # split each record on commas
    datos_produccion.append([line[1], line[2], line[3], line[4], line[5], line[6], line[7], line[8]])  # build a list with the data read from the CSV file
# remove the first row, which holds the column titles, so arithmetic can be done on the rest
datos_produccion.remove(['indice_planta', 'fecha', 'linea', 'turno', 'supervisor', 'lbs_totales', 'IngUtil', 'merma'])
print(datos_produccion)  # print the resulting list
    
asked by Alejandro Gomez 21.10.2017 at 01:51

2 answers

The conversion is done as Patricio Moracho shows in his answer. However, if you are free to use any library and you are going to operate on the data, I recommend considering Pandas. You can install it with pip, and besides making things much easier it can make a big difference in efficiency and in what you are able to do.

An example of produccion_diaria2.csv:

indice_planta,fecha,linea,turno,supervisor,lbs_totales,IngUtil,merma
2,2017/04/01,1,4,A,1524,45,14
1,2017/05/01,1,5,B,147,75,12
1,2017/05/21,1,4,C,1478,41,14
2,2017/06/14,1,4,A,1457,41,5
2,2017/06/04,2,4,D,1475,14,2

We load the csv into a DataFrame:

import pandas as pd

df = pd.read_csv("produccion_diaria2.csv",  parse_dates=['fecha'])

Each column is automatically converted to the most appropriate type its values allow, although we can also specify the types explicitly. Here we explicitly ask for the fecha column to be parsed as datetime.
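
As a minimal sketch of specifying types explicitly, the dtype argument of read_csv accepts a mapping from column name to type. The inline text below stands in for the real file, and the choice of float64 for lbs_totales is only an assumption for illustration:

```python
import io

import pandas as pd

# Inline copy of two sample rows so the sketch runs without the real file.
csv_text = """indice_planta,fecha,linea,turno,supervisor,lbs_totales,IngUtil,merma
2,2017/04/01,1,4,A,1524,45,14
1,2017/05/01,1,5,B,147,75,12
"""

df_typed = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["fecha"],
    # Assumed choice of types, for illustration only.
    dtype={"lbs_totales": "float64", "IngUtil": "int64", "merma": "int64"},
)

# Show the type pandas ended up with for every column.
print(df_typed.dtypes)
```

Printing df_typed.dtypes is a quick way to check that the last three columns are numeric rather than strings.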

We can see our table:

>>> df
   indice_planta      fecha  linea  turno supervisor  lbs_totales  IngUtil  merma
0              2 2017-04-01      1      4          A         1524       45     14
1              1 2017-05-01      1      5          B          147       75     12
2              1 2017-05-21      1      4          C         1478       41     14
3              2 2017-06-14      1      4          A         1457       41      5
4              2 2017-06-04      2      4          D         1475       14      2

Now we can use all the tools that Pandas puts at our disposal to operate, filter and group data. Some very basic examples:

We only select the rows that are from May 2017:

>>> df1 = df[(df["fecha"].dt.year == 2017) & (df["fecha"].dt.month == 5)]
>>> df1

   indice_planta      fecha  linea  turno supervisor  lbs_totales  IngUtil  merma
1              1 2017-05-01      1      5          B          147       75     12 
2              1 2017-05-21      1      4          C         1478       41     14

We select the rows that have indice_planta with value 2 and supervisor "A":

>>> df2 = df[(df["indice_planta"] == 2) & (df["supervisor"] == "A")]
>>> df2

   indice_planta      fecha  linea  turno supervisor  lbs_totales  IngUtil  merma
0              2 2017-04-01      1      4          A         1524       45     14
3              2 2017-06-14      1      4          A         1457       41      5

Get the sum of the merma column for plant 1:

>>> df[df["indice_planta"] == 1]["merma"].sum()
26

Create a new column by subtracting merma from lbs_totales:

>>> df["lbs_reales"] = df["lbs_totales"] - df["merma"]
>>> df

   indice_planta      fecha  linea  turno supervisor  lbs_totales  IngUtil  merma  lbs_reales
0              2 2017-04-01      1      4          A         1524       45     14        1510  
1              1 2017-05-01      1      5          B          147       75     12         135 
2              1 2017-05-21      1      4          C         1478       41     14        1464
3              2 2017-06-14      1      4          A         1457       41      5        1452 
4              2 2017-06-04      2      4          D         1475       14      2        1473
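
Grouping is mentioned earlier but not illustrated. A minimal sketch with groupby, using a hand-typed frame (df_demo is a hypothetical name) that holds the same values as the table above so it runs on its own:

```python
import pandas as pd

# Same sample values as the table above, typed by hand so the sketch is self-contained.
df_demo = pd.DataFrame({
    "indice_planta": [2, 1, 1, 2, 2],
    "supervisor": ["A", "B", "C", "A", "D"],
    "lbs_totales": [1524, 147, 1478, 1457, 1475],
    "merma": [14, 12, 14, 5, 2],
})

# Total pounds and total shrinkage for each plant.
totals = df_demo.groupby("indice_planta")[["lbs_totales", "merma"]].sum()
print(totals)
```

The result is indexed by indice_planta, so totals.loc[1, "merma"] gives the same 26 obtained earlier with the boolean filter.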
    
answered by 21.10.2017 / 12:21

These values can be converted with:

  • int() to convert a string to an integer
  • float() to convert a string to a floating-point number

For example:

datos_produccion.append([line[1], line[2], line[3], line[4], line[5], int(line[6]), float(line[7]), int(line[8])]) 

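Make sure each string is valid, that is, that it can be converted to the chosen data type; otherwise Python raises a ValueError exception. In particular, the header row has to be skipped before converting, because int('lbs_totales') would raise that error.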
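
One possible way to put this together is sketched below with the standard csv module. The inline sample stands in for produccion_diaria2.csv; its leading column n is an assumption made because the question's code starts at line[1], and to_number and rechazadas are hypothetical names introduced only for this example:

```python
import csv
import io

# Inline sample standing in for produccion_diaria2.csv. The first column "n"
# is an assumed row number, since the question's code starts at line[1].
sample = """n,indice_planta,fecha,linea,turno,supervisor,lbs_totales,IngUtil,merma
0,2,2017/04/01,1,4,A,1524,45,14
1,1,2017/05/01,1,5,B,147,abc,12
"""


def to_number(text):
    """Return an int if possible, else a float; let ValueError propagate otherwise."""
    try:
        return int(text)
    except ValueError:
        return float(text)


reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row before converting anything

datos_produccion = []
rechazadas = []  # rows whose numeric columns could not be converted
for row in reader:
    try:
        numbers = [to_number(value) for value in row[6:9]]
    except ValueError:
        rechazadas.append(row)
        continue
    datos_produccion.append(row[1:6] + numbers)

print(datos_produccion)
print(rechazadas)
```

Here the second sample row is deliberately invalid (abc in IngUtil), so it ends up in rechazadas instead of stopping the program. Whether to skip, log or abort on a bad row depends on the data.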

    
answered by 21.10.2017 at 03:48