Date format when importing from Excel

1

I am learning to work with DataFrames in Pandas. In my case, the original format in the Excel sheet is "dd-mm-YY". When importing to a DataFrame with pandas, using the statement

df = pd.read_excel(file, parse_dates=['Fecha'], dayfirst = True)

the format that you present to me is "YY-mm-dd" (" Fecha " is the name of the column in which I have entered dates).

Not only does it not maintain the original format, but it modifies it. What options do I have to change that format and return to the original, acting on the DataFrame?

    
asked by efueyo 24.09.2017 в 23:08
source

2 answers

1

First of all clarify that this is a visualization problem and that it does not affect at all the calculations that can be applied in the column. in fact, the date is in datetime64 format of NumPy that saves it in nanoseconds, it is at the time of printing the data when they are transformed into something humanly readable.

In theory there is a way to do what you want with pandas.set_options using the display.date_dayfirst option. I say in theory because at least today with pandas 0.20.3 does not seem to be correctly implemented and does not seem to work

>>> import pandas as pd

>>> pd.set_option('display.date_dayfirst', True)
>>> pd.set_option('display.date_yearfirst', False)
>>> df = pd.read_excel('datos.xlsx', parse_dates=['Fecha'], dayfirst = True ) 

>>> pd.get_option('date_dayfirst')
True
>>> df
       Fecha
0 2017-01-10
1 2017-05-12
2 2016-03-04
3 2016-10-08

This would be optimal if it worked ... The other option is to pass the date to a string using strftime to format it at our whim. The problem is that the column will now contain data of type object that are Python strings. The problem is that we can not operate with them as dates (filter, add days, etc.)

>>> import pandas as pd

>>> df = pd.read_excel('datos.xlsx', parse_dates=['Fecha'], dayfirst = True ) 
>>> df['Fecha2'] = df['Fecha'].dt.strftime('%d-%m-%Y')
>>> df

       Fecha      Fecha2
0 2017-01-10  10-01-2017
1 2017-05-12  12-05-2017
2 2016-03-04  04-03-2016
3 2016-10-08  08-10-2016

>>> df.dtypes
Fecha     datetime64[ns]
Fecha2            object
dtype: object

In this case a new column is created to see the difference, you can also directly replace the column Fecha . The problem, I repeat, is that the column is no longer type Datetime .

    
answered by 25.09.2017 / 13:12
source
0

Indeed, with my version of Pandas, included in Anaconda2, it does not work either. However, I have learned from Your answer, fundamental concepts:

a) .- The format "date" in the Dataframe created with pandas is datetime64 ** of ** NumPy **** that saves it in nanoseconds. (I will investigate more in Numpy about this format and its possibilities)

b) .- There is the option to pass the date to a string using strftime , but it could no longer make operations with the dates.

c) .- In case some user does not mind having the dates as strings, there is the option to import them with openpyxl .

import openpyxl

wb= openpyxl.load_workbook('fichero.xlsx')

The output format of the DataFrame is

<class 'openpyxl.worksheet.worksheet.Worksheet 

and the dates are strings.

    
answered by 25.09.2017 в 16:03