Problem when pivoting a dataframe from two concatenated dataframes

2

Use Python 3.5 and Pandas 0.20.3

I get an error when I try to pivot a panda dataframe which I obtained using the concat function.

I detail the process.

This is my first dataframe.

df = pd.DataFrame([
['2017-01-03 21:00:00','2017-01-03 21:00:00','2017-01-04 21:00:00','2017-01-04 21:00:00',
 '2017-01-05 21:00:00','2017-01-05 21:00:00'],
['RUT','RUT','RUT','RUT','RUT','RUT'],[65.00,26.00,-47.00,-8.00,32.00,10.00],
[59.09,20.21,-53.12,-13.96,26.45,4.25]]).T   df.columns=[['Fecha_Hora','Ticker_Suby','Rtdo_Bruto_x_Estrat','Rtdo_Neto_x_Estrat']] df = df.sort_values(by=['Ticker_Suby','Fecha_Hora',], ascending=True) 

So, what do I get:

  

Then I group and add rows by TickerSuby and FechaHora .

a = df.groupby(by=['Ticker_Suby','Fecha_Hora',]).sum() a.head(100)
  

Then I use the function concat to join the dataframe 'a' with the dataframe 'df' and calculate the cumulative sums of the columns 'Rtdo_Bruto_x_Estrat' and 'Rtdo_Neto_x_Estrat' .

df = pd.concat([a.groupby(level=[1]).cumsum(),a.groupby(level=[0]).cumsum().add_suffix('_cum')], 1) df.reset_index(drop=False)
  

Well, my problem is that I try to pivot this dataframe df using the column ' Fecha_Hora' as index but I get the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
C:\Users\Angel\Anaconda44\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2392         uniques : index
-> 2393         """
   2394         if self.is_unique and not dropna:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)()

KeyError: 'Fecha_Hora'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
 in ()
----> 1 df1 = a.pivot(index="Fecha_Hora",columns ="Ticker_Suby",values = "Rtdo_Bruto_x_Estrat")
      2 df1.head()
      3 

C:\Users\Angel\Anaconda44\lib\site-packages\pandas\core\frame.py in pivot(self, index, columns, values)
   3949         Returns
   3950         -------
-> 3951         unstacked : DataFrame or Series
   3952         """
   3953         from pandas.core.reshape.reshape import unstack

C:\Users\Angel\Anaconda44\lib\site-packages\pandas\core\reshape\reshape.py in pivot(self, index, columns, values)
    373             index = self.index
    374         else:
--> 375             index = self[index]
    376         indexed = Series(self[values].values,
    377                          index=MultiIndex.from_arrays([index, self[columns]]))

C:\Users\Angel\Anaconda44\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2060         -------
   2061         q : DataFrame
-> 2062 
   2063         Notes
   2064         -----

C:\Users\Angel\Anaconda44\lib\site-packages\pandas\core\frame.py in _getitem_column(self, key)
   2067         multidimensional key (e.g., a DataFrame) then the result will be passed
   2068         to :meth:'DataFrame.__getitem__'.
-> 2069 
   2070         This method uses the top-level :func:'pandas.eval' function to
   2071         evaluate the passed query.

C:\Users\Angel\Anaconda44\lib\site-packages\pandas\core\generic.py in _get_item_cache(self, item)
   1532 
   1533         multicolumn : boolean, default True
-> 1534             Use \multicolumn to enhance MultiIndex columns.
   1535             The default will be read from the config module.
   1536 

C:\Users\Angel\Anaconda44\lib\site-packages\pandas\core\internals.py in get(self, item, fastpath)
   3588 
   3589             if not isnull(item):
-> 3590                 loc = self.items.get_loc(item)
   3591             else:
   3592                 indexer = np.arange(len(self.items))[isnull(self.items)]

C:\Users\Angel\Anaconda44\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2393         """
   2394         if self.is_unique and not dropna:
-> 2395             return self
   2396 
   2397         values = self.values

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)()

KeyError: 'Fecha_Hora'

----------------------

I have also tried to apply something simpler like:

a.pivot_table(df,index=["Fecha_Hora",'Rtdo_Bruto_x_Estrat_cum'])

But I get:

KeyError                                  Traceback (most recent call last)
 in ()
      1 #df1 = a.pivot(index="Fecha_Hora",columns ="Ticker_Suby",values = "Rtdo_Bruto_x_Estrat")
      2 #df1.head()
----> 3 a.pivot_table(df,index=["Fecha_Hora",'Rtdo_Bruto_x_Estrat_cum'])

C:\Users\Angel\Anaconda44\lib\site-packages\pandas\core\reshape\pivot.py in pivot_table(data, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name)
    110         for i in values:
    111             if i not in data:
--> 112                 raise KeyError(i)
    113 
    114         to_filter = []

KeyError: 'Rtdo_Bruto_x_Estrat_cum'

------------------------------

I have searched the net for what could be wrong but I have not found anything.

Something I have noticed may be related is the dataframe df that I get when applying concat executed in the IDE Pycharm. As you can see in the image, the date field appears as "deactivated" or "non-operative".

    
asked by Angel Gálvez 05.11.2017 в 10:51
source

0 answers