Replace values from a list in an array in Python

My problem is this: I want to place a list of questions in the first row of a square zero matrix, leaving the first element as zero. That is, if the matrix is indexed as [m, n] (row m, column n), I would like to insert the list of questions starting at [m, n] = [0, 1], with each question occupying one position in that row. Likewise, I would like to place the same questions in the first column, again leaving the first value as zero, starting at [m, n] = [1, 0]. The code I am using at the moment is the following:

import numpy as np

# df1_number_rf is an existing DataFrame; its column names are the questions
numero_preguntas = len(df1_number_rf.columns)
preguntas = np.array(df1_number_rf.columns.values)
matriz = np.zeros((numero_preguntas + 1, numero_preguntas + 1))

In other words, I'd like this matrix:

[[0,0,0,0]
 [0,0,0,0]
 [0,0,0,0]
 [0,0,0,0]]

combined with this list of questions:

preguntas = [pregunta1,pregunta2,pregunta3]

to become this:

[[0,pregunta1,pregunta2,pregunta3]
 [pregunta1,0,0,0]
 [pregunta2,0,0,0]
 [pregunta3,0,0,0]]

Later, I have values in another matrix that I want to insert at specific positions of the previous one. For example:

[[pregunta1,0.8]
 [pregunta3,0.2]]

I would like to insert them in the column for question 2, in the rows that correspond to each question, so that it looks like this:

[[0,pregunta1,pregunta2,pregunta3]
 [pregunta1,0,0.8,0]
 [pregunta2,0,0,0]
 [pregunta3,0,0.2,0]]

I need this last step because I am trying to build a matrix of the importance values that a random forest assigns to each of the questions. That is, question 2 would be the variable to be classified in a decision tree, and questions 1 and 3 would be the classifiers.

    
asked by Agu 1997 19.05.2018 at 06:15

1 answer

You do not say what data type preguntas holds; I assume they are strings. A NumPy array cannot simply mix data types: to do that you need a structured array (see numpy.recarray).
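For reference, here is a minimal sketch of what mixing types in plain NumPy looks like. Note that it uses dtype=object rather than the structured array mentioned above; that is a different technique and an assumption on my part about what would be acceptable here. With dtype=object the cells are generic Python objects, so numeric operations on them are not vectorized the usual way.

```python
import numpy as np

preguntas = ["pregunta1", "pregunta2", "pregunta3"]
n = len(preguntas)

# Without an explicit dtype, NumPy coerces mixed input to a single type
# (here everything becomes a string).
mixto = np.array([0, "pregunta1"])

# dtype=object lets one array hold both strings and numbers.
matriz = np.zeros((n + 1, n + 1), dtype=object)
matriz[0, 1:] = preguntas  # first row, leaving [0, 0] as zero
matriz[1:, 0] = preguntas  # first column, leaving [0, 0] as zero

print(mixto.dtype.kind)
print(matriz)
```

This reproduces the layout asked for in the question, but the labels live inside the data, which is why a labelled structure tends to be more comfortable.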

However, it is much simpler to use Pandas for this, and since you use the pandas tag I will answer along those lines:

import pandas as pd

numero_preguntas = 3
preguntas = ["pregunta1", "pregunta2", "pregunta3"]

matriz = pd.DataFrame(0.0, index=preguntas, columns=preguntas)

With this we obtain our DataFrame, whose index and columns both take the values of preguntas, filled with zeros:

>>> matriz

           pregunta1  pregunta2  pregunta3
pregunta1        0.0        0.0        0.0
pregunta2        0.0        0.0        0.0
pregunta3        0.0        0.0        0.0

To update a particular column, you can use the update method of the Series:

>>> values = [["pregunta1", 0.8], ["pregunta3", 0.2]]
>>> matriz["pregunta2"].update(pd.Series(dict(values)))

>>> matriz
           pregunta1  pregunta2  pregunta3
pregunta1        0.0        0.8        0.0
pregunta2        0.0        0.0        0.0
pregunta3        0.0        0.2        0.0
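One caveat, as far as I know: in recent pandas versions with Copy-on-Write enabled, calling update on matriz["pregunta2"] can act on a temporary object and leave matriz unchanged. A sketch of an alternative that assigns through .loc on the DataFrame itself is shown here; the name nuevos is just an illustrative variable, not something from the question.

```python
import pandas as pd

preguntas = ["pregunta1", "pregunta2", "pregunta3"]
matriz = pd.DataFrame(0.0, index=preguntas, columns=preguntas)

values = [["pregunta1", 0.8], ["pregunta3", 0.2]]
nuevos = pd.Series(dict(values))  # index: question labels, values: importances

# Assign directly on the DataFrame, selecting rows by label and the
# target column by name; rows not listed in nuevos stay at 0.0.
matriz.loc[nuevos.index, "pregunta2"] = nuevos

print(matriz)
```

Because the assignment goes through matriz.loc, there is no intermediate Series that could end up being a copy.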

Depending on where the data used to fill the matrix comes from, this can probably be simplified.
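As an illustration of one such simplification, suppose (this is purely an assumption) that the importances are already collected in a dict keyed by the target question, called importancias here. Then the whole matrix could be built in one step instead of updating column by column:

```python
import pandas as pd

preguntas = ["pregunta1", "pregunta2", "pregunta3"]

# Hypothetical input: outer key = question being classified (column),
# inner keys = questions used as classifiers (rows) with their importance.
importancias = {
    "pregunta2": {"pregunta1": 0.8, "pregunta3": 0.2},
}

matriz = (
    pd.DataFrame(importancias)                    # columns from outer keys
    .reindex(index=preguntas, columns=preguntas)  # full square layout
    .fillna(0.0)                                  # missing pairs become 0.0
)

print(matriz)
```

The reindex call is what guarantees the square shape even when some questions have no importances yet.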

    
answered by 19.05.2018 / 11:59