Index group values in pandas (Python)


I would like to obtain the values separately from a dataframe that comes from a previous grouping:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

mu, sigma = 0, 0.2
x = np.abs(np.random.normal(mu, sigma, 1000))
y = np.abs(np.random.normal(mu, sigma, 1000))
component = np.random.choice(5, 1000)

p_id = np.linspace(1,1000,1000)

x_bins = np.linspace(0,1,11) # specified by the user


data =  pd.DataFrame({'x': x,
                     'y': y,
                     'component':component},
                    index=p_id)
data['x_groups'] = pd.cut(data['x'], bins=x_bins, include_lowest=False )
data = data.sort_values('x_groups')
ydata_xgrouped = data.groupby(['x_groups','component'])['y']
ydata_xgrouped.sum()

But I cannot find the right indexing to get, for example, the sums for a single component:

ydata_xgrouped["component"==0].sum() ??
# [14.28, 8.42, 5.13, ...]

Or an array with the results for each component:

#[
#[14.28, 8.42, 5.13, ...],
#[10.92, 7.77, 4.93, ...],
#[...],
#]
    
asked by FZNB 06.11.2017 at 12:44

1 answer


Starting from the Series that results from applying sum to the grouping, the values can be selected in several ways. For example, given:

>>> ydata_xgrouped = data.groupby(['x_groups','component'])['y'].sum()
>>> ydata_xgrouped

x_groups    component
(0.0, 0.1]  0            11.797269
            1             7.866785
            2            10.990977
            3            12.623392
            4            11.969696
(0.1, 0.2]  0            10.079543
            1            11.685945
            2            10.880716
            3             9.744925
            4            10.067830
(0.2, 0.3]  0             5.848233
            1             8.118861
            2             5.932659
            3             4.918642
            4             6.046451
(0.3, 0.4]  0             1.169767
            1             1.768517
            2             3.575008
            3             2.932276
            4             3.977678
(0.4, 0.5]  0             1.353690
            1             0.024648
            2             1.642924
            3             1.645119
            4             0.668020
(0.5, 0.6]  0             0.218703
            2             0.194925
            3             0.287697
(0.6, 0.7]  0             0.419558
            2             0.258219
Name: y, dtype: float64

We can filter on the component level of the index:

>>> ydata_xgrouped[ydata_xgrouped.index.get_level_values('component').isin([0])]

x_groups    component
(0.0, 0.1]  0            11.797269
(0.1, 0.2]  0            10.079543
(0.2, 0.3]  0             5.848233
(0.3, 0.4]  0             1.169767
(0.4, 0.5]  0             1.353690
(0.5, 0.6]  0             0.218703
(0.6, 0.7]  0             0.419558
Name: y, dtype: float64
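
The same mask idea extends to one or several components. The following is only a minimal sketch on a small made-up series: the labels "a", "b", "c" and the values are hypothetical stand-ins for the grouped sums above, and the names sums, only_zero and zero_or_two are introduced here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the grouped sums (labels and values are made up)
index = pd.MultiIndex.from_tuples(
    [("a", 0), ("a", 1), ("a", 2), ("b", 0), ("b", 2), ("c", 1)],
    names=["x_groups", "component"],
)
sums = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=index, name="y")

# Labels of the 'component' level, one per row of the series
components = sums.index.get_level_values("component")

# Boolean mask for a single component
only_zero = sums[components == 0]

# Boolean mask for several components at once
zero_or_two = sums[components.isin([0, 2])]

print(only_zero.to_numpy())
print(zero_or_two.to_numpy())
```

Both results keep the full two-level index, so the x_groups label of each value is still available.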

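Another very concise but less readable option is to use loc with a slice on the second index level. Note that :0 is a label slice that selects every component up to and including 0, so it isolates component 0 only because 0 is the smallest label: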

>>> ydata_xgrouped.loc[:, :0]

x_groups    component
(0.0, 0.1]  0            11.797269
(0.1, 0.2]  0            10.079543
(0.2, 0.3]  0             5.848233
(0.3, 0.4]  0             1.169767
(0.4, 0.5]  0             1.353690
(0.5, 0.6]  0             0.218703
(0.6, 0.7]  0             0.419558
Name: y, dtype: float64
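
A cross-section with xs is an alternative that selects exactly one component, whichever it is. This is a sketch under the assumption that the series has a level named component, as above; the small series below uses made-up labels and values, and the names sums and comp1 are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the grouped sums (labels and values are made up)
index = pd.MultiIndex.from_tuples(
    [("a", 0), ("a", 1), ("b", 0), ("b", 1), ("c", 0)],
    names=["x_groups", "component"],
)
sums = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=index, name="y")

# Cross-section: keep only the rows whose 'component' level equals 1.
# By default the selected level is dropped, leaving 'x_groups' as the index.
comp1 = sums.xs(1, level="component")

print(comp1)
```

Unlike a label slice, this does not depend on where the component sits in the sort order.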

If you want the result as an array, just use the values attribute:

>>> ydata_xgrouped.loc[:, :0].values
    array([ 11.79726887,  10.07954334,   5.84823291,   1.16976735,
             1.3536904 ,   0.21870311,   0.41955825])

If you want all the values as a 2D array with one row per component, you can use the unstack method to move the x_groups level to the columns:

>>> ydata_xgrouped.unstack(level=0).values
array([[ 11.79726887,  10.07954334,   5.84823291,   1.16976735,
          1.3536904 ,   0.21870311,   0.41955825],
       [  7.86678528,  11.68594524,   8.11886136,   1.76851697,
          0.0246483 ,          nan,          nan],
       [ 10.99097695,  10.88071643,   5.9326589 ,   3.57500767,
          1.64292406,   0.19492456,   0.25821945],
       [ 12.62339194,   9.74492492,   4.91864152,   2.93227576,
          1.64511856,   0.2876966 ,          nan],
       [ 11.96969588,  10.06782955,   6.0464511 ,   3.97767831,
          0.66801981,          nan,          nan]])
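
The nan entries above mark (x_groups, component) pairs with no data. If a zero sum is the right reading for an empty pair, they can be filled before converting to arrays. A minimal sketch on made-up data follows; the names sums, table, filled and per_component are hypothetical, and whether 0 is an appropriate fill value is an assumption that depends on the analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the grouped sums (labels and values are made up)
index = pd.MultiIndex.from_tuples(
    [("a", 0), ("a", 1), ("b", 0), ("b", 1), ("c", 0)],
    names=["x_groups", "component"],
)
sums = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=index, name="y")

# Move 'x_groups' to the columns so that each row is one component
table = sums.unstack(level="x_groups")

# The missing pair ("c", 1) is NaN after unstacking; fill it with 0
# (assumption: an empty group should count as a sum of zero)
filled = table.fillna(0.0)

# Map each component label to its own 1D array
per_component = {comp: row.to_numpy() for comp, row in filled.iterrows()}

print(filled.to_numpy())
```

Keeping the intermediate DataFrame also preserves the row and column labels, which a bare array loses.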
    
answered 06.11.2017 at 15:00