Be careful when you compare strings that represent dates as if they were actual dates. Strings are compared in lexicographic order, so depending on the date format you can get unwanted results. Beyond that, converting the dates to datetime or to a DatetimeIndex gives you access to many associated methods, for example grouping by hour or month in a simple way, resampling, and so on.
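To illustrate the warning about lexicographic order, here is a minimal sketch. The two date strings and the day/month/year format are assumptions chosen for the example; they are not taken from your data.

```python
import pandas as pd

# Hypothetical dates written as day/month/year strings
a = "02/01/2010"  # 2 January 2010
b = "01/02/2010"  # 1 February 2010

# Strings are compared character by character: "02..." sorts after "01..."
como_texto = a > b  # True, even though 2 January comes before 1 February

# Parsed with an explicit format, the comparison is chronological
ta = pd.to_datetime(a, format="%d/%m/%Y")
tb = pd.to_datetime(b, format="%d/%m/%Y")
como_fecha = ta > tb  # False

print(como_texto, como_fecha)
```

With an ISO-like format such as 2010-01-02 the string order happens to match the chronological order, which is why the problem depends on the format.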
We are going to start from the following DataFrame, an example based on your sample:
import pandas as pd
from io import StringIO
csv = StringIO('''
Fecha,Hora,Velocidad (m/s)
2010-01-01,00:00:00,3.9
2010-01-01,01:00:00,4.2
2010-01-01,02:00:00,3.5
2010-01-02,00:00:00,2.5
2010-01-02,01:00:00,2.8
2010-01-02,01:00:00,3.4
2010-02-01,00:00:00,2.1
2010-02-01,01:00:00,1.6
2010-02-01,02:00:00,4.2
2010-02-02,00:00:00,2.5
2010-02-02,01:00:00,3.7
2010-02-02,02:00:00,3.2
''')
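# Combine the Fecha and Hora columns into a single datetime column
# named Fecha_ and use it as the index
df = pd.read_csv(csv, dayfirst=True, index_col=0,
                 parse_dates={'Fecha_': ['Fecha', 'Hora']})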
We parse the Fecha and Hora columns and combine them into a single DatetimeIndex. The DataFrame we get is:
>>> df
                     Velocidad (m/s)
Fecha_
2010-01-01 00:00:00              3.9
2010-01-01 01:00:00              4.2
2010-01-01 02:00:00              3.5
2010-01-02 00:00:00              2.5
2010-01-02 01:00:00              2.8
2010-01-02 01:00:00              3.4
2010-02-01 00:00:00              2.1
2010-02-01 01:00:00              1.6
2010-02-01 02:00:00              4.2
2010-02-02 00:00:00              2.5
2010-02-02 01:00:00              3.7
2010-02-02 02:00:00              3.2
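A note on versions: as far as I know, recent pandas releases (2.2 onwards) deprecate passing a dict to parse_dates to combine columns and recommend pd.to_datetime instead. The following is only a sketch of that alternative, assuming the same column names and a shortened copy of the sample data; it should build an equivalent DatetimeIndex.

```python
import pandas as pd
from io import StringIO

# Shortened copy of the sample data, for illustration only
csv = StringIO('''Fecha,Hora,Velocidad (m/s)
2010-01-01,00:00:00,3.9
2010-01-01,01:00:00,4.2
2010-02-01,00:00:00,2.1
''')

df = pd.read_csv(csv)

# Join the date and time strings and parse them with an explicit format
df['Fecha_'] = pd.to_datetime(df['Fecha'] + ' ' + df['Hora'],
                              format='%Y-%m-%d %H:%M:%S')

# Drop the original text columns and use the new column as the index
df = df.drop(columns=['Fecha', 'Hora']).set_index('Fecha_')
```

Giving the format explicitly avoids any ambiguity between day-first and month-first dates.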
Once we have this, since you want the average for each hour within each month, just use groupby: group by year, month and hour, and apply mean:
medias = df.groupby([df.index.year,
                     df.index.month,
                     df.index.hour])['Velocidad (m/s)'].mean()
medias.index.names = ["Año", "Mes", "Hora"]
With this we already have a Series with the hourly averages for each month:
>>> medias
Año   Mes  Hora
2010  1    0       3.200000
           1       3.466667
           2       3.500000
      2    0       2.300000
           1       2.650000
           2       3.700000
Name: Velocidad (m/s), dtype: float64
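If it helps, here is a sketch of how a Series like this, with its three-level index, could be queried or reshaped. The small DataFrame below is a made-up stand-in with the same column name, and the names enero_00 and tabla are just illustrative.

```python
import pandas as pd

# Small stand-in for the DataFrame above (same column name, fewer rows)
idx = pd.to_datetime(["2010-01-01 00:00", "2010-01-02 00:00",
                      "2010-01-01 01:00", "2010-02-01 00:00"])
df = pd.DataFrame({"Velocidad (m/s)": [3.9, 2.5, 4.2, 2.1]}, index=idx)

medias = df.groupby([df.index.year,
                     df.index.month,
                     df.index.hour])["Velocidad (m/s)"].mean()
medias.index.names = ["Año", "Mes", "Hora"]

# Average for January 2010 at hour 0, selected with a (year, month, hour) tuple
enero_00 = medias.loc[(2010, 1, 0)]

# One row per (year, month) and one column per hour;
# combinations with no data become NaN
tabla = medias.unstack("Hora")
```

The unstacked table is often easier to read or plot than the long Series when you want to compare months side by side.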