Pandas Load bad dates in Python 3

Question

Pandas Load bad dates in Python 3

Navigation

#1 by (3 votes)

3

I'm scheduling a task in Python but loading the dates of a CSV using Pandas loads them erratically, sometimes changing the day for the month. For example, for these cases everyone should be from month 06, but sometimes it is 06 and sometimes it exchanges 06 for the day, changing the month, for example row 19 to 23, 27 and 28:

   ......
   19   2017-03-06 01:10:00
   20   2017-03-06 01:10:00
   21   2017-03-06 17:44:00
   22   2017-03-06 17:44:00
   23   2017-04-06 04:12:00
   24   2017-06-06 04:21:00
   25   2017-06-06 04:21:00
   26   2017-06-06 15:37:00
   27   2017-09-06 18:43:00
   28   2017-09-06 18:43:00
   29   2017-09-06 21:59:00
   ....

I load the file in this way and create list of dates:

   df = pd.read_csv('PORFA.csv', header=0, sep=';')
   Date2=pd.to_datetime(df["FDE Date"]) #Crear lista de Fechas

But the imprimier "Date2" throws it as it is above.

I enclose the BBDD that I am reading:

link

python

asked by Jorge Ponti 24.07.2017 в 19:03

source

1 answer

Obtain the highest value name of a Sql Server query Do filtering with jquery and php

score 3 · Accepted Answer

The problem is that by default he tries to parse the date interpreting that the month goes before the day, when he finds an incongruent date he tries using the first date as the day. This means that you do not get exceptions.

Your date has the format dd-mm-aaaa so you should indicate that you try to use the first date as day and not as month. For this use the attribute dayfirst :

import pandas as pd

df = pd.read_csv('PORFA.csv', header=0, sep=';')
Date2=pd.to_datetime(df["FDE Date"], dayfirst = True)

Exit:

>>> Date2
...
19   2017-06-03 01:10:00
20   2017-06-03 01:10:00
21   2017-06-03 17:44:00
22   2017-06-03 17:44:00
23   2017-06-04 04:12:00
24   2017-06-06 04:21:00
25   2017-06-06 04:21:00
26   2017-06-06 15:37:00
27   2017-06-09 18:43:00
28   2017-06-09 18:43:00
29   2017-06-09 21:59:00
...

Warning: dayfirst = True is not strict, this means that you will always try to pause the date using the first date as day, but if you encounter a date that is not parseable in this way, the It will parse using another format that is plausible. Let me explain, if you are with 05-17-2017 you will not throw an exception but will parse it as 2017-05-17 .

The above should not be a problem as long as you are sure of the validity of your dates and that they all have the dd-mm-aaaa format. This is the literal notice of the documentation:

Warning: dayfirst = True is not strict, but will prefer to parse with day first (this is a bug, based on dateutil behavior).

Edit:

You can properly parse the columns when loading the csv directly.

import pandas as pd

df = pd.read_csv('PORFA.csv', header=0, sep=';',
                 parse_dates = ['FDE Date'], dayfirst = True )

To parse_dates you must pass in this case the names of the columns that must be parsed as date. If they were non-standard dates you should use the date_parser argument by passing a function that parses the string to a valid date.

Later you can order the DataFrame by date using one of these columns:

df.sort_values(by="FDE Date", inplace = True)