# Problem with pandas.DataFrame.shift function

I have the following DataFrame in Python:

``````
import pandas as pd

months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
data1 = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200]
df = pd.DataFrame({
    'month': months,
    'd1': data1,
    'd2': 0,  # scalar is broadcast to every row
})
``````

and I need to recalculate column d2, as follows:

```
    month    d1      d2
0       1   100   101.0
1       2   200   303.0
2       3   300   606.0
3       4   400  1010.0
4       5   500  1515.0
5       6   600  2121.0
6       7   700  2828.0
7       8   800  3636.0
8       9   900  4545.0
9      10  1000  5555.0
10     11  1100  6666.0
11     12  1200  7878.0
```

I'm doing it in the following way:

``````
df['d2'] = (df['d2'].shift(1) + df['d1']) + df['month']
``````

but the result is not what was expected:

```
    month    d1      d2
0       1   100     NaN
1       2   200   202.0
2       3   300   303.0
3       4   400   404.0
4       5   500   505.0
5       6   600   606.0
6       7   700   707.0
7       8   800   808.0
8       9   900   909.0
9      10  1000  1010.0
10     11  1100  1111.0
11     12  1200  1212.0
```

I'm not sure I have explained my problem clearly. Thanks to anyone who can help.

asked by Yeison H. Arias 27.09.2018 at 01:18

Keep in mind that when you do:

``````
df['d2'] = df['d2'].shift(1) + df['d1']
``````

What happens is that `df['d2'].shift(1) + df['d1']` is evaluated in full first, producing a new `Series` that is only then assigned to `df['d2']`. The column `d2` is not updated row by row as the calculation proceeds; every value in the new `Series` is computed from the original contents of `d2`.

As a result, each row is simply `0 + df['d1']` (with `NaN` in the first row, where `shift` has no previous value to use), because the column `d2` initially contains only zeros.
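To see this in isolation, here is a minimal sketch on a shortened, four-row version of the DataFrame (the names `shifted` and `rhs` are only illustrative). It evaluates the right-hand side of the question's expression without assigning it back:

```python
import pandas as pd

# Shortened version of the DataFrame from the question
df = pd.DataFrame({
    'month': [1, 2, 3, 4],
    'd1': [100, 200, 300, 400],
    'd2': 0,
})

# The right-hand side is computed entirely from the current d2 (all zeros);
# nothing is written back to df until the whole Series exists.
shifted = df['d2'].shift(1)
rhs = shifted + df['d1'] + df['month']

print(shifted.tolist())  # [nan, 0.0, 0.0, 0.0]
print(rhs.tolist())      # [nan, 202.0, 303.0, 404.0]
```

The shifted column never contains anything but zeros and the leading `NaN`, so no running total can build up.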

What you want is a cumulative sum, which `pandas.Series.cumsum` computes directly:

``````
import pandas as pd

months = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
data1 = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200]
df = pd.DataFrame({
    'month': months,
    'd1': data1,
    'd2': 0,
})

# Running total of d1
df['d2'] = df.d1.cumsum()
``````
``````
>>> df
    month    d1    d2
0       1   100   100
1       2   200   300
2       3   300   600
3       4   400  1000
4       5   500  1500
5       6   600  2100
6       7   700  2800
7       8   800  3600
8       9   900  4500
9      10  1000  5500
10     11  1100  6600
11     12  1200  7800
``````

## Edit

If you also want to include the `month` column in the cumulative sum, add the two columns together before calling `cumsum`:

``````
>>> df['d2'] = (df.d1 + df.month).cumsum()
>>> df
    month    d1    d2
0       1   100   101
1       2   200   303
2       3   300   606
3       4   400  1010
4       5   500  1515
5       6   600  2121
6       7   700  2828
7       8   800  3636
8       9   900  4545
9      10  1000  5555
10     11  1100  6666
11     12  1200  7878
``````
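As a sanity check, the recurrence the question describes (each `d2` equals the previous `d2` plus the current `d1` and `month`, starting from 0) can be written as an explicit loop and compared with the `cumsum` version. This is only a sketch for verification; the names `running`, `loop_result` and `vectorized` are illustrative, and the vectorized form is the one to use in practice:

```python
import pandas as pd

df = pd.DataFrame({
    'month': list(range(1, 13)),
    'd1': [100 * m for m in range(1, 13)],
})

# Explicit recurrence: d2[i] = d2[i-1] + d1[i] + month[i], starting from 0
running = 0
loop_result = []
for d1, month in zip(df['d1'].tolist(), df['month'].tolist()):
    running += d1 + month
    loop_result.append(running)

# Vectorized equivalent
vectorized = (df['d1'] + df['month']).cumsum()

assert loop_result == vectorized.tolist()
print(loop_result[:3])  # [101, 303, 606]
```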