Error: Invalid index to scalar value

This is my workflow. I am trying to filter my data y and z to remove the NaN values, but I cannot get past the "Invalid index" error.

data = logs[['DEPT', 'NPHI', 'RT']]
x = data['DEPT']
y = data['NPHI']
z = data['RT']

DEPT_inv = sp.sum(sp.isnan(x))
NPHI_inv = sp.sum(sp.isnan(y))
RT_inv = sp.sum(sp.isnan(z))

nan_array_DEPT = sp.isnan(x)
nan_array_NPHI = sp.isnan(y)
nan_array_RT = sp.isnan(z)

mask_NPHI = ~nan_array_NPHI
mask_RT = ~nan_array_RT
NPHI_filtered = NPHI_inv[mask_NPHI]
Error: Invalid index to scalar value 
    
asked by Iris Leyva 30.06.2017 at 16:48

1 answer

It looks like data is a pandas DataFrame, so x, y, and z are columns (pandas.core.series.Series). I assume that in the end you want arrays without the NaN values.

Keep in mind that:

DEPT_inv = sp.sum(sp.isnan(x))

gives us an integer: the number of elements that are NaN in the x column. NPHI_inv is computed the same way, so when you apply the mask in NPHI_inv[mask_NPHI] the indexing fails, because NPHI_inv is not an array but a single integer ("Invalid index to scalar value").
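
To see the difference concretely, here is a minimal sketch that uses a small made-up Series in place of the real NPHI column (the values are hypothetical, and NumPy is called directly instead of through the sp aliases). The sum of the NaN flags is a single number, while the negated flags form the mask that can actually be used for indexing:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data['NPHI']; the values are invented for illustration.
y = pd.Series([0.25, np.nan, 0.31, np.nan], name="NPHI")

nan_count = np.sum(np.isnan(y))  # a single number: how many NaN values there are
mask = ~np.isnan(y)              # a boolean Series with one flag per element

# Indexing the count fails because a scalar has nothing to select from.
indexing_failed = False
try:
    nan_count[mask]
except (IndexError, TypeError) as exc:
    indexing_failed = True
    print("Indexing the count failed:", exc)

# Indexing the original Series with the same mask works.
y_filtered = y[mask]
print(y_filtered.to_numpy())
```

So the mask has to be applied to y (the data), not to the NaN count.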

To get three arrays with only the non-NaN values, you can simply call dropna() on each pandas Series.

import numpy as np

# np.array is used directly; the sp.array alias is deprecated in recent SciPy releases
DEPT_filtered = np.array(data['DEPT'].dropna())
NPHI_filtered = np.array(data['NPHI'].dropna())
RT_filtered   = np.array(data['RT'].dropna())

With this you get three arrays containing only the non-NaN values of each column.
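
One thing worth checking with this approach: each column is filtered independently, so the resulting arrays can end up with different lengths and no longer line up sample by sample. The sketch below uses an invented DataFrame with the column names from the question to show this, along with one possible way to keep the columns aligned if that matters for your case (dropping rows through the subset parameter of DataFrame.dropna):

```python
import numpy as np
import pandas as pd

# Hypothetical log data; column names follow the question, values are invented.
data = pd.DataFrame({
    "DEPT": [100.0, 100.5, 101.0, 101.5, 102.0],
    "NPHI": [0.25, np.nan, 0.31, 0.28, 0.30],
    "RT":   [12.0, 8.5, np.nan, np.nan, 9.1],
})

# Filtering each column on its own: lengths depend on each column's NaN count.
NPHI_filtered = np.array(data["NPHI"].dropna())
RT_filtered = np.array(data["RT"].dropna())
print(len(NPHI_filtered), len(RT_filtered))

# Keeping only rows where both NPHI and RT are present keeps the columns aligned.
aligned = data.dropna(subset=["NPHI", "RT"])
print(aligned)
```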

If you want to use a mask anyway, you can convert each Series to a NumPy array first:

import numpy as np

# Convert each column to a plain NumPy array
x = np.array(data['DEPT'])
y = np.array(data['NPHI'])
z = np.array(data['RT'])

# Keep only the elements that are not NaN
DEPT_filtered = x[~np.isnan(x)]
NPHI_filtered = y[~np.isnan(y)]
RT_filtered   = z[~np.isnan(z)]
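
A boolean mask also works on the Series itself, which keeps the original index labels until you decide to convert to an array. A small sketch with a made-up Series (values are hypothetical), using the pandas methods Series.notna() and Series.to_numpy():

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data['NPHI']; the values are invented for illustration.
y = pd.Series([0.25, np.nan, 0.31, np.nan], name="NPHI")

# Mask applied directly to the Series: the surviving rows keep their index labels.
NPHI_series = y[y.notna()]
print(NPHI_series.index.tolist())

# Convert to a plain NumPy array only at the end, if an array is needed.
NPHI_array = NPHI_series.to_numpy()
print(NPHI_array)
```

Keeping the index can be handy when you later need to know which depth samples were dropped.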

You do not say exactly what final result you want. Keep in mind that if you want to drop the rows of data in which every value is null, you can apply dropna to data directly:

data.dropna(axis=0, how='all', inplace=True)
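
The how argument decides which rows are dropped: how='all' removes a row only when every column in it is NaN, while how='any' removes a row as soon as one column is NaN. A minimal sketch on an invented DataFrame (column names from the question, values hypothetical), written without inplace so that both results can be compared:

```python
import numpy as np
import pandas as pd

# Hypothetical log data; the last row is entirely NaN.
data = pd.DataFrame({
    "DEPT": [100.0, 100.5, 101.0, np.nan],
    "NPHI": [0.25, np.nan, 0.31, np.nan],
    "RT":   [12.0, 8.5, np.nan, np.nan],
})

# Drop only the rows where every value is NaN.
only_empty_rows_removed = data.dropna(axis=0, how="all")

# Drop every row that has at least one NaN.
complete_rows_only = data.dropna(axis=0, how="any")

print(only_empty_rows_removed.shape, complete_rows_only.shape)
```

Note that with inplace=True the call modifies data and returns None, so its result should not be assigned to a variable.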
    
answered by 30.06.2017 at 18:57