Error with pd.DataFrame.replace: cannot convert unicode string to float

I am trying to multiply the values in a DataFrame by factors that I look up in a dictionary. When I fill the dictionary with values read from Excel rows, I cannot convert the values to float because they are unicode strings. I am using Python 2.7.10. Can someone suggest an efficient way to convert from unicode to float?

import pandas as pd
# convert the Excel file into a DataFrame
df = pd.read_excel("C:/Users/Pedro/Desktop/dataframe.xls")

# fill the dictionary with the keys and their values; this will become the DataFrame df2
d = {"M1-4":0.60,"M1-5/R10":0.85,"C5-3":0.85,"M1-5/R7-3":0.85,"M1-4/R7A":0.85,"R7A":0.85,"M1-4/R6A":0.85,"M1-4/R6B":0.85,"R6A":0.85,"PARK":0.20,"M1-6/R10":0.85,"R6B":0.85,"R9":0.85,"M1-5/R9":0.85}

# convert the dictionary into an Excel file
df5 = pd.DataFrame.from_dict(d, orient='index')
df5.to_excel('bob_dict.xlsx')

# fill the dictionary from an Excel file
df2 = pd.read_excel("C:/Users/Pedro/Desktop/bob_dict.xlsx")
# convert the DataFrame to a dictionary
dictionary = df2.to_dict(orient='dict')
# multiply using the dictionary as a lookup

b = df.filter(like ='Value').values
c = df.filter(like ='ZONE').replace(dictionary).astype(float).values

df['pro_cum'] = ((c * b).sum(axis =1))

When I run the code, it gives me this error:

  

ValueError: RAB string could not be converted to a float.

This is the DataFrame I am working with, df:

HP    ZONE           Value  ZONE1       Value1
3     R7A           0.7009  M1-4/R6B    0.00128
2     R6A           0.5842  M1-4/R7A    0.00009
7     M1-6/R10      0.1909  M1-4/R6A    0.73576
9     R6B           0.6919  PARK        0.03459
6     PARK          1.0400  M1-4/R6A    0.33002
9.3   M1-4/R6A      0.7878  PARK        0.59700
10.6  M1-4/R6B      0.0291  R6A         0.29621
11.9  R9            0.0084  M1-4        0.00058
13.2  M1-5/R10      0.0049  M1-4        0.65568
14.5  M1-4/R7A      0.0050  C5-3        0.00096
15.8  M1-5/R7-3     0.0189  C5-3        1.59327
17.1  M1-5/R9       0.3296  M1-4/R6B    0.43918
18.4  C5-3          0.5126  R6B         0.20835
19.7  M1-4          0.5126  PARK        0.22404
    
asked by Jose Vasquez 01.07.2018 at 00:50

1 answer

When you create df5 from the dictionary and pass 'index' to the orient argument, the DataFrame is built with the keys as the index and the values as a column (or several columns if each value is an iterable, such as a list). If you do not name the columns with the columns argument (available since pandas 0.23.0), they are named automatically with integers starting at 0:

              0
M1-4       0.60
M1-5/R10   0.85
C5-3       0.85
M1-5/R7-3  0.85
M1-4/R7A   0.85
R7A        0.85
M1-4/R6A   0.85
M1-4/R6B   0.85
R6A        0.85
PARK       0.20
M1-6/R10   0.85
R6B        0.85
R9         0.85
M1-5/R9    0.85
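
As a minimal sketch of that default naming, using a shortened dictionary (the column name "factor" is purely illustrative and does not appear in the original code):

```python
import pandas as pd

# Shortened version of the dictionary from the question.
d = {"M1-4": 0.60, "PARK": 0.20, "R7A": 0.85}

# Keys become the index; the single value column is named 0 by default.
df5 = pd.DataFrame.from_dict(d, orient="index")
print(df5.columns.tolist())

# With pandas >= 0.23.0 the column can be named explicitly.
# "factor" is a hypothetical name chosen for this sketch.
df5_named = pd.DataFrame.from_dict(d, orient="index", columns=["factor"])
print(df5_named.columns.tolist())
```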

When you read the DataFrame back and convert it to a dictionary with dictionary = df2.to_dict(orient='dict'), you get a dictionary of dictionaries, where each key is a column name and each value is a dictionary of index: value pairs for that column:

{0: {'M1-4': 0.6, 'M1-5/R10': 0.85, 'C5-3': 0.85,
     'M1-5/R7-3': 0.85, 'M1-4/R7A': 0.85, 'R7A': 0.85,
     'M1-4/R6A': 0.85, 'M1-4/R6B': 0.85, 'R6A': 0.85,
     'PARK': 0.2, 'M1-6/R10': 0.85, 'R6B': 0.85,
     'R9': 0.85, 'M1-5/R9': 0.85
     }
    }
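
The shape of that result can be checked with a short sketch. It assumes a shortened dictionary and builds df2 in memory instead of reading it back from Excel, only to keep the example self-contained:

```python
import pandas as pd

d = {"M1-4": 0.60, "PARK": 0.20, "R7A": 0.85}

# Stand-in for df2: keys as the index, values in a column named 0.
df2 = pd.DataFrame.from_dict(d, orient="index")

# orient="dict" yields {column name: {index label: value}}.
nested = df2.to_dict(orient="dict")
print(list(nested.keys()))   # the only outer key is the column name
print(nested[0]["PARK"])     # the factors live one level down
```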

When pandas.DataFrame.replace receives a nested dictionary like this, it treats each outer key as a column of the DataFrame (here column 0 of df), then checks whether each value in that column is a key of the inner dictionary and, if so, replaces it with the associated value.

Since the only outer key is 0 and df has no such column, replace changes nothing. The ZONEx columns are left as they were, holding strings ("R7A", "M1-4/R6A", etc.) that obviously cannot be converted to float.
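
This silent no-op can be reproduced with a small sketch. The two-row frame below is a made-up stand-in for df, not the asker's data:

```python
import pandas as pd

# Hypothetical miniature of the ZONE columns.
df = pd.DataFrame({"ZONE": ["R7A", "PARK"], "ZONE1": ["PARK", "R7A"]})

# Nested dictionary keyed by a column (0) that df does not have.
nested = {0: {"R7A": 0.85, "PARK": 0.20}}

# No column matches the outer key, so nothing is replaced.
replaced = df.replace(nested)
print(replaced.equals(df))

# The strings are still there, so the float conversion fails.
try:
    replaced.astype(float)
    conversion_failed = False
except ValueError as exc:
    conversion_failed = True
    print("ValueError:", exc)
```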

Therefore, you can simply select key 0 of dictionary:

dictionary = df2.to_dict()
c = df.filter(like='ZONE').replace(dictionary[0]).astype(float).values

Or use pandas.Series.to_dict on the column:

dictionary = df2[0].to_dict()
c = df.filter(like='ZONE').replace(dictionary).astype(float).values
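
Put together, a self-contained sketch of the corrected calculation might look like this. The two-row frame and the name factors are illustrative assumptions, not the asker's data:

```python
import pandas as pd

# Hypothetical miniature of df with the same column layout.
df = pd.DataFrame({
    "ZONE": ["R7A", "PARK"],
    "Value": [0.5, 1.0],
    "ZONE1": ["PARK", "R7A"],
    "Value1": [0.25, 0.5],
})

# Flat dictionary, equivalent to dictionary[0] or df2[0].to_dict().
factors = {"R7A": 0.85, "PARK": 0.20}

b = df.filter(like="Value").values
# A flat dictionary is applied to every selected column.
c = df.filter(like="ZONE").replace(factors).astype(float).values

# Row-wise sum of factor * value over each ZONE/Value pair.
df["pro_cum"] = (c * b).sum(axis=1)
print(df)
```

For the first row this works out to 0.85 * 0.5 + 0.20 * 0.25 = 0.475.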

Keep in mind that if any value in the ZONEx columns is not found in the dictionary, it is left as a string and you will get the same error.
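
One way to catch that case before calling astype(float) is to compare the zone codes against the dictionary keys. This is only a sketch: "C1-1" is a made-up code that is deliberately absent from the dictionary, and the names zones and factors are illustrative:

```python
import pandas as pd

# Hypothetical ZONE columns containing one code with no factor.
zones = pd.DataFrame({"ZONE": ["R7A", "PARK"], "ZONE1": ["PARK", "C1-1"]})
factors = {"R7A": 0.85, "PARK": 0.20}

# Every distinct code in the ZONE columns that has no dictionary entry.
missing = sorted(set(zones.values.ravel()) - set(factors))

if missing:
    print("Zones without a factor:", missing)
```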

    
answered by 01.07.2018 / 01:58