Loop to compute averages with pandas

Hi, I have the following DataFrame. I need to filter the speeds by hour and take the average of each group. I am using the following code:

import pandas as pd
import numpy as np
df = pd.read_csv('df.csv')
b = df[(df['Fecha'] >= '2010-04-01') & (df['Fecha'] <= '2010-04-30')
       & (df['Hora'] == '22:00:00')]
c = b['Velocidad (m/s)']
d = np.nanmean(c)

I would like to know whether there is a way to use a for loop or something similar that runs through all the hours, so that I do not have to change them manually.

Thanks in advance.

    
asked by Jose.P 30.04.2018 at 22:50

1 answer

Be careful when you compare strings that represent dates as if they were actual dates. The comparison uses lexicographical order, so depending on the date format we can get unwanted results. Regardless of that, parsing the dates to datetime or to a DatetimeIndex lets you use many associated methods in a simple way, for example grouping by hour, month, etc., or resampling.
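To illustrate the lexicographical-order problem, here is a minimal sketch. The day/month/year strings are hypothetical and are not taken from your data. With the zero-padded YYYY-MM-DD format used in the question, string order happens to coincide with chronological order, but that is not true of other formats:

```python
import pandas as pd

# Hypothetical day/month/year strings, used only to illustrate the pitfall.
fechas = pd.Series(['02/04/2010', '10/03/2010', '25/01/2010'])

# Compared as strings: character by character, so '02/04/2010' sorts
# before '10/03/2010' even though 2 April comes after 10 March.
como_texto = fechas.sort_values().tolist()

# Compared as datetimes: true chronological order.
como_fecha = pd.to_datetime(fechas, format='%d/%m/%Y').sort_values()

print(como_texto)
print(como_fecha.dt.strftime('%d/%m/%Y').tolist())
```

The first list keeps the text order, while the second puts January first, then March, then April.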

We will start from the following DataFrame, an example based on the one you show:

import pandas as pd
from io import StringIO

csv = StringIO('''
Fecha,Hora,Velocidad (m/s)
2010-01-01,00:00:00,3.9
2010-01-01,01:00:00,4.2
2010-01-01,02:00:00,3.5
2010-01-02,00:00:00,2.5
2010-01-02,01:00:00,2.8
2010-01-02,01:00:00,3.4
2010-02-01,00:00:00,2.1
2010-02-01,01:00:00,1.6
2010-02-01,02:00:00,4.2
2010-02-02,00:00:00,2.5
2010-02-02,01:00:00,3.7
2010-02-02,02:00:00,3.2
''')


df = pd.read_csv(csv, dayfirst=True, index_col=0,
                 parse_dates = {'Fecha_': ['Fecha','Hora']}
                 )

We parse the Fecha and Hora columns and combine them into a single DatetimeIndex. The resulting DataFrame is:

>>> df
                     Velocidad (m/s)
Fecha_                             
2010-01-01 00:00:00             3.9
2010-01-01 01:00:00             4.2
2010-01-01 02:00:00             3.5
2010-01-02 00:00:00             2.5
2010-01-02 01:00:00             2.8
2010-01-02 01:00:00             3.4
2010-02-01 00:00:00             2.1
2010-02-01 01:00:00             1.6
2010-02-01 02:00:00             4.2
2010-02-02 00:00:00             2.5
2010-02-02 01:00:00             3.7
2010-02-02 02:00:00             3.2
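As far as I know, recent pandas versions (2.2 onwards) deprecate the dict form of parse_dates for combining columns. If the call above gives you a warning or an error, an equivalent sketch is to join the two columns yourself with pd.to_datetime. The shortened CSV below is only for illustration:

```python
import pandas as pd
from io import StringIO

# Shortened sample with the same columns as the example above.
csv = StringIO('''Fecha,Hora,Velocidad (m/s)
2010-01-01,00:00:00,3.9
2010-01-01,01:00:00,4.2
2010-01-02,01:00:00,3.4
''')

df = pd.read_csv(csv)

# Join the two text columns and convert the result to datetimes.
df['Fecha_'] = pd.to_datetime(df['Fecha'] + ' ' + df['Hora'])

# Drop the original text columns and use the datetimes as the index.
df = df.drop(columns=['Fecha', 'Hora']).set_index('Fecha_')

print(df)
```

The result should have the same shape as the DataFrame shown above: a DatetimeIndex named Fecha_ and a single Velocidad (m/s) column.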

Once we have this, since you want the average for each hour within each month, just use groupby: group by year, month and hour, then apply mean:

medias = df.groupby([df.index.year,
                     df.index.month,
                     df.index.hour])['Velocidad (m/s)'].mean()

medias.index.names = ["Año", "Mes", "Hora"]

With this we already have a Series with the hourly averages for each month:

>>> medias

Año   Mes  Hora
2010  1    0       3.200000
           1       3.466667
           2       3.500000
      2    0       2.300000
           1       2.650000
           2       3.700000
Name: Velocidad (m/s), dtype: float64
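
If you only need one month, as in your original code for April 2010, a possible sketch is to select that month first and then group by hour. The data below is invented for illustration and the names abril and medias_abril are my own:

```python
import pandas as pd

# Hypothetical hourly data; the values are invented for illustration.
idx = pd.to_datetime(['2010-04-01 21:00', '2010-04-01 22:00',
                      '2010-04-02 21:00', '2010-04-02 22:00',
                      '2010-05-01 22:00'])
df = pd.DataFrame({'Velocidad (m/s)': [2.0, 3.0, 4.0, 5.0, 9.0]}, index=idx)

# Partial string indexing on a DatetimeIndex selects the whole month.
abril = df.loc['2010-04']

# One mean per hour of the day; mean() skips NaN values by default.
medias_abril = abril.groupby(abril.index.hour)['Velocidad (m/s)'].mean()

print(medias_abril)
```

This replaces the manual loop over hours: groupby computes every hour in one pass, and because mean skips NaN by default it plays the same role as np.nanmean in your code.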
    
answered by 01.05.2018 / 01:45