Error when comparing dates from two time series: Can not compare type 'Timestamp' with type 'str'


I'm doing a data transformation and need to run a while loop over a range of dates, but I get this error:

  

"TypeError: Can not compare type 'Timestamp' with type 'str'"

I checked the values with type() and both are class 'pandas.tslib.Timestamp', so I don't understand what is going wrong.

I'm including my complete code and a CSV file so you can run it and reproduce the error:

from collections import defaultdict
import pandas as pd
from datetime import datetime, timedelta
from time import time

fecha_i=[]
fecha_f=[]
contador=[]
id=[]
contador={}
contador = defaultdict(list)

df = pd.read_csv('contour-export-2017-12-14.csv', header=0, sep=',',parse_dates = ['FCH_HORA_INICIO'],dayfirst = True, usecols=[0,3,6,7])

fecha_i=df['FCH_HORA_INICIO']
fecha_f=df['FCH_HORA_TERMINO']
id=df['ID']

acumulado=fecha_i[0]
i=1
k=0

print(type(acumulado))
print(type(fecha_i[0]))

while(acumulado<datetime.now()):
    acumulado=fecha_i[0]+timedelta(days=i)
    k=0
    while k<=len(df)-1:
        if acumulado>=fecha_i[k] and acumulado<=fecha_f[k]:
            contador[acumulado].append(str(ID[k]))
            k=k+1
        else:
            k=k+1
    i=i+1

File: link

    
asked by Jorge Ponti 14.12.2017 at 20:38

2 answers


parse_dates only converts the columns you list, so FCH_HORA_INICIO holds Timestamp values while FCH_HORA_TERMINO is still read as plain strings, and a Timestamp cannot be compared with a str. Convert the columns to datetime with pd.to_datetime() before comparing them:

fecha_i= pd.to_datetime(df['FCH_HORA_INICIO'], errors='coerce')
fecha_f= pd.to_datetime(df['FCH_HORA_TERMINO'], dayfirst=True, errors='coerce')
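
As a small illustration of what pd.to_datetime does to a column of strings, here is a minimal sketch. The sample values are made up and do not come from the asker's CSV, and dayfirst=True is an assumption carried over from the read_csv call in the question.

```python
import pandas as pd

# Hypothetical sample values; the asker's CSV is not reproduced here.
raw = pd.Series(['01/12/2017 08:00', '14/12/2017 20:38', 'not a date'])

# dayfirst=True reads 01/12/2017 as 1 December; errors='coerce' turns
# entries that cannot be parsed into NaT instead of raising.
parsed = pd.to_datetime(raw, dayfirst=True, errors='coerce')

print(parsed.dtype)
# Once parsed, the values can be compared with other Timestamps.
print(parsed[0] <= pd.Timestamp('2017-12-02'))
print(parsed.isna().tolist())
```

Note that errors='coerce' hides bad rows as NaT, so it may be worth checking isna() afterwards.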

Code:

from collections import defaultdict
import pandas as pd
from datetime import datetime, timedelta
from time import time

# Maps each date to the list of IDs active on that date
contador = defaultdict(list)

df = pd.read_csv('contour-export-2017-12-14.csv', 
    header=0, sep=',',
    parse_dates = ['FCH_HORA_INICIO'],
    dayfirst = True, usecols=[0,3,6,7])

fecha_i= pd.to_datetime(df['FCH_HORA_INICIO'], errors='coerce')
fecha_f= pd.to_datetime(df['FCH_HORA_TERMINO'], dayfirst=True, errors='coerce')

ID=df['ID']

acumulado= fecha_i[0]
i=1
k=0

while acumulado < datetime.now():
    acumulado = fecha_i[0] + timedelta(days=i)
    # Record every ID whose start/end interval contains this date
    for k in range(len(df)):
        if fecha_i[k] <= acumulado <= fecha_f[k]:
            contador[acumulado].append(str(ID[k]))
    i += 1
    
answered by 14.12.2017 / 20:54

The problem is that the column 'FCH_HORA_TERMINO' is never parsed, because it is not listed in the parse_dates argument. Its values are therefore Python strings, while FCH_HORA_INICIO is a column of type datetime64 (NumPy).
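
The mismatch can be checked directly. The following is a minimal sketch that uses a made-up two-row CSV held in memory (the rows and the reduced column set are assumptions, not the asker's data) and lists only FCH_HORA_INICIO in parse_dates, as in the question:

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for the asker's CSV.
csv_text = (
    "ID,FCH_HORA_INICIO,FCH_HORA_TERMINO\n"
    "A1,01/12/2017 08:00,03/12/2017 18:00\n"
    "B2,02/12/2017 09:30,05/12/2017 12:00\n"
)

# Only FCH_HORA_INICIO is listed in parse_dates, as in the question.
df = pd.read_csv(io.StringIO(csv_text),
                 parse_dates=['FCH_HORA_INICIO'], dayfirst=True)

inicio = df['FCH_HORA_INICIO'][0]
termino = df['FCH_HORA_TERMINO'][0]
print(type(inicio).__name__)   # Timestamp
print(type(termino).__name__)  # str
print(df.dtypes)               # per-column view of the same mismatch
```

Inspecting df.dtypes, or the type of one element from each column, shows which side of the comparison is still a string.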

All you need to do is list this column in parse_dates as well, so that it is treated as a date and parsed to datetime64[ns]:

df = pd.read_csv('contour-export-2017-12-14.csv', header=0, sep=',', usecols=[0,3,6,7], 
                 parse_dates = ['FCH_HORA_INICIO', 'FCH_HORA_TERMINO'], dayfirst = True)

Because dayfirst = True applies to every column listed in parse_dates, both columns are parsed with the day before the month.
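
With both columns parsed, the comparison works on whole columns at once. Here is a minimal sketch under stated assumptions: the three-row CSV is made up, and the names dia and activos are hypothetical and only illustrate the comparison that failed in the question.

```python
import io
import pandas as pd

# Hypothetical three-row stand-in for the asker's CSV.
csv_text = (
    "ID,FCH_HORA_INICIO,FCH_HORA_TERMINO\n"
    "A1,01/12/2017 08:00,03/12/2017 18:00\n"
    "B2,02/12/2017 09:30,05/12/2017 12:00\n"
    "C3,04/12/2017 10:00,06/12/2017 10:00\n"
)

# Both date columns are listed, so both become datetime64 columns.
df = pd.read_csv(io.StringIO(csv_text),
                 parse_dates=['FCH_HORA_INICIO', 'FCH_HORA_TERMINO'],
                 dayfirst=True)

# Select the rows whose interval contains a given date.
dia = pd.Timestamp('2017-12-03')
activos = df[(df['FCH_HORA_INICIO'] <= dia) & (df['FCH_HORA_TERMINO'] >= dia)]
print(activos['ID'].tolist())  # ['A1', 'B2']
```

A boolean mask like this could stand in for the inner while loop over k, although the original loop also works once both columns hold dates.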

    
answered by 14.12.2017 at 21:42