Introduction
Iterating over a pandas DataFrame is usually not the best solution. Most of the time, the reason you want to iterate is to perform some calculation on the contents of the rows. Pandas implements a large number of "vectorized" operations, which means you can perform them in a single line of code that processes many rows at once, without writing the loop yourself. Pandas does the iteration internally, usually delegating to NumPy and native C code, which is much faster and more efficient than looping in Python.
For example, to obtain the sum of the "volume" column of the dataframe df, you might think of iterating over all the rows and accumulating the value of that column, something like this:
suma = 0
for v in df.volume:
    suma += v
But this is much shorter and more efficient:
suma = df.volume.sum()
It gets worse if you want to calculate the average of every column. The loop solution would require maintaining an "accumulator" variable for each column, probably using df.iterrows(), and so on, while the pandas solution is simply df.mean().
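As a rough illustration of the comparison above, here is a minimal sketch. The dataframe is hypothetical random data that only borrows the column names used in this answer; it is not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 300 daily rows with the column names used in this answer.
rng = np.random.default_rng(0)
index = pd.date_range("2018-01-01", periods=300, freq="D")
df = pd.DataFrame(
    rng.normal(size=(300, 5)),
    index=index,
    columns=["open", "high", "low", "close", "volume"],
)

# Explicit Python loop over one column.
loop_total = 0.0
for v in df["volume"]:
    loop_total += v

# Vectorized equivalents: no loop written by hand.
vector_total = df["volume"].sum()
column_means = df.mean()  # one mean per column, returned as a Series
```

Both totals agree up to floating-point rounding, and df.mean() replaces what would otherwise be five separate accumulators.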
If you want to act on only a subset of the rows instead of all of them, you can use df.loc or df.iloc and put in square brackets a range of index values (in the first case) or of row positions (in the second case).
For example, since in your case the index is of type datetime, you can do something like the following to operate only on the data for January 2018:
df.loc["2018-01"]
and then apply any operation to that selection, such as .sum(), .mean(), etc.
Or act only on the first 50 rows with:
df.iloc[0:50]
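A minimal sketch of both kinds of selection, assuming a hypothetical dataframe with a daily datetime index (the data and column values are invented for illustration). Keep in mind that a partial date string such as "2018-01" selects the whole month, and that a .loc slice between two labels includes both endpoints.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 300 daily rows starting on 2018-01-01.
index = pd.date_range("2018-01-01", periods=300, freq="D")
df = pd.DataFrame(
    np.arange(300 * 2, dtype=float).reshape(300, 2),
    index=index,
    columns=["close", "volume"],
)

january = df.loc["2018-01"]  # label-based: every row dated in January 2018
first_50 = df.iloc[0:50]     # position-based: rows 0 through 49

# Any aggregation can then be applied to the selection.
january_volume = january["volume"].sum()
first_50_mean = first_50.mean()
```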
In your case
You ask how to take the value from every second row. .iloc[] accepts a normal Python slice, so you can use the syntax [start:stop:step] and put any value you like in the step. So, for example, the following selects every row at an even position (0, 2, 4, ...):
df.iloc[::2]
and the following selects every row at an odd position:
df.iloc[1::2]
Example:
>>> print(df.iloc[::2].agg(["count", "mean"]))
             open        high         low       close      volume
count  150.000000  150.000000  150.000000  150.000000  150.000000
mean    -0.248744   -0.086711    0.021593    0.024451    0.157441
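To see the step slicing on something small enough to check by hand, here is a minimal sketch with a hypothetical ten-row dataframe whose single column just holds the row positions.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the "volume" values equal the row positions 0..9.
df = pd.DataFrame({"volume": np.arange(10)})

even_rows = df.iloc[::2]   # positions 0, 2, 4, 6, 8
odd_rows = df.iloc[1::2]   # positions 1, 3, 5, 7, 9

# Same aggregation as in the example above, on the even-position rows.
summary = even_rows.agg(["count", "mean"])
```

Here summary has one row per aggregation ("count" and "mean") and one column per dataframe column, the same shape as the output shown above.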