Convert DataFrame columns to datetime

Good morning,

My script reads a CSV file with several columns, using the pandas function read_csv:

data = pd.read_csv(filepath, index_col='Target', parse_dates=True, infer_datetime_format=True)

The dataframe looks like this:

Target        Observer          StartTime                 StopTime  
Target1       RT1               2019-09-01 20:47:50.02    2019-09-01 20:57:50.02     

But the dtypes I get are:

Observer        object
StartTime       object
StopTime        object

I need to compute the duration of each observation, and then do other operations that require working with the dates (e.g. how long it takes until the next observation starts, that is, lead(StartTime) - StopTime).

I have not found any way to convert StartTime and StopTime to datetime so that I can work with them properly.

I've tried pd.to_datetime() and other functions, but they do not work.

Any ideas?

Thanks in advance!

    
asked by cluna 21.06.2017 at 10:09

1 answer

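To solve your problem, it is enough to explicitly list the columns that must be parsed in the parse_dates argument. With parse_dates=True, pandas only tries to parse the index, which is why StartTime and StopTime stay as object. The format of your dates is perfectly acceptable for numpy.datetime64: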

import pandas as pd

data = pd.read_csv('datos.csv', index_col='Target',
                   parse_dates=['StartTime','StopTime'])

data['Elapsed_time'] = data['StopTime'] - data['StartTime']

print(data.dtypes)
print(data)

The input CSV that I used (with invented data):

Target,Observer,StartTime,StopTime
Target1,RT1,2019-09-01 20:47:50.02,2019-09-01 20:57:50.02
Target2,RT1,2020-10-15 03:20:10.21,2020-10-15 04:01:48.21
Target3,RT1,2019-03-14 17:47:13.37,2019-03-14 17:57:21.52
Target4,RT1,2019-12-27 13:15:35.03,2019-12-27 14:57:14.52

Output:

Observer                 object
StartTime        datetime64[ns]
StopTime         datetime64[ns]
Elapsed_time    timedelta64[ns]
dtype: object
        Observer               StartTime                StopTime  \
Target                                                             
Target1      RT1 2019-09-01 20:47:50.020 2019-09-01 20:57:50.020   
Target2      RT1 2020-10-15 03:20:10.210 2020-10-15 04:01:48.210   
Target3      RT1 2019-03-14 17:47:13.370 2019-03-14 17:57:21.520   
Target4      RT1 2019-12-27 13:15:35.030 2019-12-27 14:57:14.520   

           Elapsed_time  
Target                   
Target1        00:10:00  
Target2        00:41:38  
Target3 00:10:08.150000  
Target4 01:41:39.490000

I added a new column, Elapsed_time, that holds the time difference between the two columns.
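
For the other operation mentioned in the question, lead(StartTime) - StopTime, one possible approach is Series.shift(-1), which moves the next row's value up onto the current row. This is only a sketch: the column name Gap_to_next is my own invention, the data is the same invented sample inlined through io.StringIO, and I am assuming that "next observation" means the next one in chronological order, so the rows are sorted by StartTime first.

```python
import io
import pandas as pd

# Same invented sample data as in this answer, inlined so the snippet is self-contained.
csv_text = """Target,Observer,StartTime,StopTime
Target1,RT1,2019-09-01 20:47:50.02,2019-09-01 20:57:50.02
Target2,RT1,2020-10-15 03:20:10.21,2020-10-15 04:01:48.21
Target3,RT1,2019-03-14 17:47:13.37,2019-03-14 17:57:21.52
Target4,RT1,2019-12-27 13:15:35.03,2019-12-27 14:57:14.52
"""

data = pd.read_csv(io.StringIO(csv_text), index_col='Target',
                   parse_dates=['StartTime', 'StopTime'])

# Assumption: "next" means chronologically next, so sort by start time first.
data = data.sort_values('StartTime')

# shift(-1) places the following row's StartTime on the current row,
# which plays the role of lead(StartTime).
# Gap_to_next is a hypothetical column name chosen for this example.
data['Gap_to_next'] = data['StartTime'].shift(-1) - data['StopTime']

print(data[['StartTime', 'StopTime', 'Gap_to_next']])
```

The last observation has nothing after it, so its Gap_to_next is NaT.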

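If the DataFrame is already loaded, pd.to_datetime should also work, as long as its result is assigned back to the column (it returns a new Series and does not modify the original). I cannot tell from the question why it failed for you, so the following is only a diagnostic sketch: I have deliberately put a hypothetical bad value ("not a date") in the sample data to show how errors='coerce' turns unparseable strings into NaT, which makes the offending rows easy to find.

```python
import io
import pandas as pd

# Invented data; "not a date" is a hypothetical malformed value added on purpose.
csv_text = """Target,Observer,StartTime,StopTime
Target1,RT1,2019-09-01 20:47:50.02,2019-09-01 20:57:50.02
Target2,RT1,2020-10-15 03:20:10.21,not a date
"""

# Loaded without parse_dates, so both time columns start out as object.
data = pd.read_csv(io.StringIO(csv_text), index_col='Target')

for col in ['StartTime', 'StopTime']:
    # Assign the result back; errors='coerce' yields NaT instead of raising.
    data[col] = pd.to_datetime(data[col], errors='coerce')

print(data.dtypes)

# Rows whose StopTime could not be parsed.
print(data[data['StopTime'].isna()])
```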
    
answered by 21.06.2017 / 11:31