There are several ways to filter rows in Pandas, the simplest thing is to create a mask (Boolean array) using a conditional on the column in question and then filter the rows of the Dataframe with it. Basically it is the procedure that is followed in NumPy to filter arrays. This is known as Boolean indexing .
We can create a small example to see it:
import pandas as pd
from io import StringIO
data = StringIO('''
Nombre,Edad,Sexo,Fecha
Juan,12,M,20/08/2017
Laura,21,F,24/07/2017
Pedro,53,M,13/01/2017
María,17,F,15/03/2017
Luís,19,M,15/07/2017
Miguel,23,M,14/08/2017
''')
df = pd.read_csv(data, header=0, parse_dates = ['Fecha'], dayfirst = True)
We therefore have the following Dataframe:
>>> df
Nombre Edad Sexo Fecha
0 Juan 12 M 2017-08-20
1 Laura 21 F 2017-07-24
2 Pedro 53 M 2017-01-13
3 María 17 F 2017-03-15
4 Luís 19 M 2017-07-15
5 Miguel 23 M 2017-08-14
We can filter using the sexo
column to obtain another Dataframe that only contains the rows corresponding to women simply by doing:
>>> df2 = df[df['Sexo'] == 'F']
>>> df
Nombre Edad Sexo Fecha
1 Laura 21 F 2017-07-24
3 María 17 F 2017-03-15
With df['Sexo'] == 'F'
you simply create a mask that in this case is a series of Pandas and that contains a single Boolean column: [False, True, False, True, False, False]
result of comparing if each value of the column equals 'F'
. We can filter with any other iterable boolean values, for example an array of NumPy, a list, a column of another DataFrame, etc.
Another example, filtering those who are 18 or older:
>>> df2 = df[df['Edad'] >= 18]
>>> df2
Nombre Edad Sexo Fecha
1 Laura 21 F 2017-07-24
2 Pedro 53 M 2017-01-13
4 Luís 19 M 2017-07-15
5 Miguel 23 M 2017-08-14
Edit:
You can filter by dates following the same idea, for example we can filter the row with Fecha
between the current date and 30 days ago:
>>> fecha_limite = pd.datetime.now().date() - pd.Timedelta(days=30)
>>> df2 = df[(df['Fecha'] > fecha_limite)]
>>> df2
Nombre Edad Sexo Fecha
0 Juan 12 M 2017-08-20
1 Laura 21 F 2017-07-24
5 Miguel 23 M 2017-08-14
You can also use several conditions, for example the previous condition but also be a woman:
>>> fecha_limite = pd.datetime.now().date() - pd.Timedelta(days=30)
>>> df2 = df[(df['Fecha'] > fecha_limite) & (df['Sexo'] == 'F')]
>>> df2
Nombre Edad Sexo Fecha
1 Laura 21 F 2017-07-24
You can filter using the index in the same way or using loc
, for example, if our column Fecha
were in the DataFrame index (DateTimeIndex) we can do:
>>> fecha_limite = pd.datetime.now().date() - pd.Timedelta(days=30)
>>> df2 = df[df.index.date > fecha_limite]