Change date format ymd_hms to ymd

2

I have a dataframe in Python with two columns: StartTime and StopTime . In one of the operations, I need to group the data per day ( StartTime ). For this I need to eliminate hours, minutes, seconds from that column ...

The data I currently have looks like this:

StartTime
2019-09-01 20:47:50
2019-09-01 22:47:50
2019-09-02 20:47:50

And I want it to look like this:

StartTime
2019-09-01
2019-09-01
2019-09-02

The goal is to then group and count the number of observations per day. In R this is very simple: apply as_date(), group by StartTime, and add a column with n() .

I cannot find a way to do it with Python and pandas .

    
asked by cluna 22.06.2017 at 12:45

2 answers

3

You do not need to remove the time at all if you do not want to; it is enough to group by date. Based on the DataFrame from your previous question, I created the following example:

from io import StringIO
import pandas as pd


datos = StringIO('''\
Target  Observer                 StartTime                  StopTime
Target1      RT1   2019-09-01 20:47:50.020   2019-09-01 20:57:50.020
Target1      RT1   2019-09-01 20:47:50.020   2019-09-01 20:57:50.020
Target1      RT1   2019-09-01 20:47:50.020   2019-09-01 20:57:50.020
Target1      RT2   2019-09-01 20:47:50.020   2019-09-01 20:57:50.020
Target1      RT2   2019-09-01 20:47:50.020   2019-09-01 20:57:50.020
Target1      RT1   2019-09-02 03:20:10.210   2020-10-15 04:01:48.210
Target1      RT1   2019-09-02 03:20:10.210   2020-10-15 04:01:48.210
Target1      RT2   2019-09-02 03:20:10.210   2020-10-15 04:01:48.210
Target1      RT2   2019-09-02 17:47:13.370   2019-03-14 17:57:21.520
Target1      RT2   2019-09-02 13:15:35.030   2019-12-27 14:57:14.520
''')

df = pd.read_table(datos, sep=r'\s\s+', engine='python', parse_dates=['StartTime', 'StopTime'])

What you want is to group by observer and by start date, counting the number of observations per observer and day. We can use pandas.DataFrame.groupby with pd.Grouper(freq='D') (the replacement for pd.TimeGrouper, which was deprecated and later removed from pandas) to group by observer and day, and then size to count the rows in each group. The result is a new column, Obs, with the number of observations for each observer on each day:

res = df.set_index('StartTime').groupby(['Observer', pd.Grouper(freq='D')]).size().to_frame('Obs')

Output:

>>> res
                        Obs
Observer StartTime         
RT1      2019-09-01       3
         2019-09-02       2
RT2      2019-09-01       2
         2019-09-02       3
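
If only the per-day totals are needed, regardless of observer, the same idea can be sketched with pd.Grouper pointed at the column through its key argument, so no set_index is required. The small frame below is made-up sample data shaped like the one in the question, and the name n is only an illustrative choice that mirrors R's n().

```python
import pandas as pd

# Hypothetical sample data with the same StartTime layout as in the question
df = pd.DataFrame({
    "StartTime": pd.to_datetime([
        "2019-09-01 20:47:50",
        "2019-09-01 22:47:50",
        "2019-09-02 20:47:50",
    ])
})

# Put the rows into calendar-day bins of StartTime and count the rows per bin
per_day = df.groupby(pd.Grouper(key="StartTime", freq="D")).size().rename("n")
print(per_day)
```

The original StartTime column keeps its hours, minutes and seconds; only the group labels are truncated to the day.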
    
answered by 22.06.2017 / 13:30
1

There are two ways to perform the conversion and remove the "time" part of the column.

On the one hand there is the method dt.normalize, and on the other the attribute dt.date. As you can see, both belong to the dt accessor of the series.

An example using your column names:

df["StartTime"] = df["StartTime"].dt.normalize()

or

df["StartTime"] = df["StartTime"].dt.date

The difference between the two is that normalize keeps the datetime64 dtype that pandas uses internally (the time is simply set to midnight), so you can keep using all the datetime utilities that come with pandas, whereas dt.date returns plain Python datetime.date objects stored in an object-dtype column.
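
A minimal sketch of that difference, using two made-up timestamps (the variable names are illustrative only):

```python
import pandas as pd

# Hypothetical timestamps, just to compare the two approaches
s = pd.Series(pd.to_datetime(["2019-09-01 20:47:50", "2019-09-02 03:20:10"]))

normalized = s.dt.normalize()  # still a datetime64 column, time set to 00:00:00
as_dates = s.dt.date           # object column holding datetime.date values

print(normalized.dtype)
print(as_dates.dtype)

# The .dt accessor keeps working on the normalized column
print(normalized.dt.day_name().tolist())
```

If later steps rely on datetime features such as resampling or the .dt accessor, normalize is usually the more convenient choice.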

To group by date (here by both the start and the stop date) you can do:

gb = df.groupby([df["StartTime"].dt.normalize(), df["StopTime"].dt.normalize()])
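
To get the per-day count asked for in the question, roughly what as_date() plus n() gives in R, one possible sketch is to group by the normalized StartTime alone and call size. The data below is a made-up sample, and the column name n is an assumption chosen for illustration.

```python
import pandas as pd

# Hypothetical sample data matching the StartTime values shown in the question
df = pd.DataFrame({
    "StartTime": pd.to_datetime([
        "2019-09-01 20:47:50",
        "2019-09-01 22:47:50",
        "2019-09-02 20:47:50",
    ])
})

# Group by the day part of StartTime and count the rows in each group
counts = (
    df.groupby(df["StartTime"].dt.normalize())
      .size()
      .reset_index(name="n")
)
print(counts)
```

reset_index(name="n") turns the grouped result back into an ordinary DataFrame with one row per day.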
    
answered by 22.06.2017 at 13:22