Problem with the pandas.DataFrame.shift function


I have the following DataFrame in Python:

import pandas as pd

months = [1,2,3,4,5,6,7,8,9,10,11,12]
data1 = [100,200,300,400,500,600,700,800,900,1000,1100,1200]
df = pd.DataFrame({
                    'month' : months,
                    'd1' : data1,
                    'd2' : 0,
                })

and I need to recalculate column d2 so that each row contains the previous row's d2 plus the current row's d1 and month, like this:

    month    d1      d2
0       1   100   101.0
1       2   200   303.0
2       3   300   606.0
3       4   400  1010.0
4       5   500  1515.0
5       6   600  2121.0
6       7   700  2828.0
7       8   800  3636.0
8       9   900  4545.0
9      10  1000  5555.0
10     11  1100  6666.0
11     12  1200  7878.0

I'm doing it in the following way:

df['d2'] = (df['d2'].shift(1) + df['d1']) + df['month']

but the result is not what I expected:

    month    d1      d2
0       1   100     NaN
1       2   200   202.0
2       3   300   303.0
3       4   400   404.0
4       5   500   505.0
5       6   600   606.0
6       7   700   707.0
7       8   800   808.0
8       9   900   909.0
9      10  1000  1010.0
10     11  1100  1111.0
11     12  1200  1212.0

I hope I have explained the problem clearly. Thanks to anyone who can help.

    
asked by Yeison H. Arias 27.09.2018 at 01:18

1 answer


Keep in mind that when you do:

df['d2'] = df['d2'].shift(1) + df['d1']

What happens is that the entire right-hand side, df['d2'].shift(1) + df['d1'], is evaluated first, producing a new Series, and only once it is complete is that Series assigned to df['d2']. In other words, column d2 is not updated row by row as the calculation proceeds; every value is computed from the original d2 column and stored in a new Series.

This means every row ends up as 0 + df['d1'] (or NaN in the first row, where shift(1) has no previous value to take), because column d2 initially contains only zeros.
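A minimal sketch to check this, rebuilding the DataFrame from the question: because shift(1) reads the original all-zero d2, the attempted formula yields NaN in the first row and simply d1 + month in every other row. The variable names shifted and attempt are only for illustration.

```python
import pandas as pd

months = [1,2,3,4,5,6,7,8,9,10,11,12]
data1 = [100,200,300,400,500,600,700,800,900,1000,1100,1200]
df = pd.DataFrame({'month': months, 'd1': data1, 'd2': 0})

# The right-hand side is evaluated in full against the ORIGINAL d2 (all zeros)
# before anything is written back to the column.
shifted = df['d2'].shift(1)                      # NaN, 0.0, 0.0, ..., 0.0
attempt = (shifted + df['d1']) + df['month']     # NaN, 202.0, 303.0, ...

print(shifted.tolist())
print(attempt.tolist())
```

Since no value computed in the expression is fed back into later rows, shift alone cannot express a running recurrence.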

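Since each d2 value is just the previous total plus the current value, what you want is a running (cumulative) sum, so you can use pandas.Series.cumsum: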

import pandas as pd


months = [1,2,3,4,5,6,7,8,9,10,11,12]
data1 = [100,200,300,400,500,600,700,800,900,1000,1100,1200]
df = pd.DataFrame({
                    'month' : months,
                    'd1' : data1,
                    'd2' : 0,
                })

df['d2'] = df.d1.cumsum()
>>> df
    month    d1    d2
0       1   100   100
1       2   200   300
2       3   300   600
3       4   400  1000
4       5   500  1500
5       6   600  2100
6       7   700  2800
7       8   800  3600
8       9   900  4500
9      10  1000  5500
10     11  1100  6600
11     12  1200  7800

Edit

If you also want to include the month column in the running total, add it to d1 before calling cumsum:

>>> df['d2'] = (df.d1 + df.month).cumsum()
>>> df
    month    d1    d2
0       1   100   101
1       2   200   303
2       3   300   606
3       4   400  1010
4       5   500  1515
5       6   600  2121
6       7   700  2828
7       8   800  3636
8       9   900  4545
9      10  1000  5555
10     11  1100  6666
11     12  1200  7878
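To make the equivalence concrete, here is a sketch of the row-by-row recurrence from the question (d2[i] = d2[i-1] + d1[i] + month[i], treating the value before the first row as 0), written as an explicit loop and compared with the vectorized cumsum. The loop and the names running and loop_values are illustrative only; on large frames the Python loop is much slower than cumsum.

```python
import pandas as pd

months = [1,2,3,4,5,6,7,8,9,10,11,12]
data1 = [100,200,300,400,500,600,700,800,900,1000,1100,1200]
df = pd.DataFrame({'month': months, 'd1': data1, 'd2': 0})

# Explicit recurrence: each row adds its d1 and month to the previous total.
running = 0
loop_values = []
for d1_value, month_value in zip(df['d1'], df['month']):
    running += d1_value + month_value
    loop_values.append(running)

# Vectorized equivalent from the answer.
df['d2'] = (df['d1'] + df['month']).cumsum()

print(loop_values)
print(df['d2'].tolist())
```

Both lists should match the expected d2 column from the question, ending in 7878 (the sum of all d1 values, 7800, plus the sum of all months, 78).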
    
answered by 27.09.2018 / 01:23