How to add a matrix and headings to a DataFrame that already has an index

I have an array of results that I would like to put into a DataFrame that already has an index, using the headings "one, two, three, four, five, six, seven", so that I can export it to CSV:

[[  9.98611569e-01   2.61135280e-01   9.85433698e-01   1.71457045e-02
    9.78044927e-01   2.60528386e-01]
 [  4.88283520e-04   4.29093279e-06   1.85536774e-05   3.19911924e-04
    1.81139112e-04   9.03336913e-05]
 [  1.14567261e-02   7.42107150e-05   6.58972771e-04   6.78182842e-05
    1.78377319e-03   1.35437978e-04]
 ..., 
 [  4.52868361e-03   3.07226946e-05   1.60213807e-04   6.06868671e-05
    1.57363957e-03   2.02276744e-04]
 [  3.99599038e-03   3.06017573e-05   1.73806780e-04   5.36961670e-05
    2.71984679e-03   1.02118785e-02]
 [  9.76500273e-01   2.00048480e-02   7.18341112e-01   3.43492441e-03
    5.39431930e-01   1.23000629e-02]]

Here are the first index values:

>>> df.head()
                 id
0  00001cee341fdb12
1  0000247867823ef7
2  00013b17ad220c46
3  00017563c3f7919a
4  00017695ad8997eb

In the end we should get:

id, one, two, three, four, five, six, seven
00001cee341fdb12,9.98611569e-01,2.61135280e-01,9.85433698e-01,1.71457045e-02,9.78044927e-01,2.60528386e-01
0000247867823ef7,4.88283520e-04,4.29093279e-06,1.85536774e-05,3.19911924e-04,1.81139112e-04,9.03336913e-05

etc ...

The number of significant digits can be reduced ...

So far I have tried:

df = pd.read_csv('data/test.csv',index_col='id')
df2 = pd.DataFrame(test_result,)
df = df.assign(df2, colums = "id,toxic,severe_toxic,obscene,threat,insult,identity_hate")
    
asked by ThePassenger 10.02.2018 at 16:09

1 answer

pandas.DataFrame.assign takes any number of keyword arguments, each one a column_name=values pair. A simple example illustrates it best:

>>> import pandas as pd

>>> df = pd.DataFrame([{"colA": 1, "colB": 2},
                       {"colA": 2, "colB": 2},
                       {"colA": 3, "colB": 2}])
>>> df
   colA  colB
0     1     2
1     2     2
2     3     2

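>>> df.assign(colA=[13, 17, 19], colB=[3, 5, 7], colC=lambda d: d.colA * d.colB)
   colA  colB  colC
0    13     3    39
1    17     5    85
2    19     7   133

The lambda receives the DataFrame with colA and colB already replaced, because pandas 0.23 and later evaluate the keyword arguments in order. Older versions evaluated callables against the original frame, which would give 2, 4, 6 for colC.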
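
Note that assign returns a new DataFrame and leaves the one it was called on untouched, so the result has to be stored somewhere (for example df = df.assign(...)). A minimal sketch of this behaviour; the names original and updated are only illustrative:

```python
import pandas as pd

original = pd.DataFrame({"colA": [1, 2, 3], "colB": [2, 2, 2]})

# assign builds and returns a copy that contains the extra column
updated = original.assign(colC=[10, 20, 30])

print(list(original.columns))  # the original frame still has two columns
print(list(updated.columns))   # the returned frame has three
```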

In your case, starting from a NumPy array, you can do something like this:

import numpy as np
import pandas as pd

test_result = np.array([[9.98611569e-01, 2.61135280e-01, 9.85433698e-01, 1.71457045e-02, 9.78044927e-01, 2.60528386e-01],
                        [4.88283520e-04, 4.29093279e-06, 1.85536774e-05, 3.19911924e-04, 1.81139112e-04, 9.03336913e-05],
                        [1.14567261e-02, 7.42107150e-05, 6.58972771e-04, 6.78182842e-05, 1.78377319e-03, 1.35437978e-04],
                        [4.52868361e-03, 3.07226946e-05, 1.60213807e-04, 6.06868671e-05, 1.57363957e-03, 2.02276744e-04]])

df = pd.DataFrame([{'id': '00001cee341fdb12'},
                   {'id': '0000247867823ef7'},
                   {'id': '00013b17ad220c46'},
                   {'id': '00017695ad8997eb'}]).set_index('id')

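# Each column of the array becomes a named column of the DataFrame.
# The array needs as many rows as the DataFrame has index entries.
df = df.assign(toxic=test_result[:, 0],
               severe_toxic=test_result[:, 1],
               obscene=test_result[:, 2],
               threat=test_result[:, 3],
               insult=test_result[:, 4],
               identity_hate=test_result[:, 5])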

Another option is to build a dictionary of column_name: values pairs and unpack it into assign:

cols = ('toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate')
# Iterating over test_result.T yields the columns of the array one by one
df = df.assign(**dict(zip(cols, test_result.T)))
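
Since the goal is a CSV file with fewer digits, it may be simpler to build the frame directly from the array and let to_csv do the rounding. The following is only a sketch under assumptions: the two ids and rows are shortened sample data taken from the question, the file name result.csv is made up, and float_format='%.6f' is just one possible precision.

```python
import numpy as np
import pandas as pd

cols = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate']

# Sample data (two rows only); in practice the ids would come from test.csv
ids = pd.Index(['00001cee341fdb12', '0000247867823ef7'], name='id')
test_result = np.array([
    [9.98611569e-01, 2.61135280e-01, 9.85433698e-01,
     1.71457045e-02, 9.78044927e-01, 2.60528386e-01],
    [4.88283520e-04, 4.29093279e-06, 1.85536774e-05,
     3.19911924e-04, 1.81139112e-04, 9.03336913e-05],
])

# The array must have one row per id and one column per name
result = pd.DataFrame(test_result, index=ids, columns=cols)

# Without a path, to_csv returns the CSV as a string;
# float_format controls how many decimals are written
csv_text = result.to_csv(float_format='%.6f')
print(csv_text)

# Hypothetical output path: pass a file name to write to disk instead
# result.to_csv('result.csv', float_format='%.6f')
```

The index name 'id' becomes the first header of the file, so no separate id column is needed.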
    
answered by 10.02.2018 / 23:20