Pandas. Import all the columns of an Excel sheet in Jupyter Notebook

0

In an open book in Jupyter Notebook, I try to import all the columns of an Excel sheet (16), using the script

file = "datos.xlsx"
# Import the data
df = pd.read_excel(file, sheetname = "Financiera", na_values = "n/a")

The script works correctly in other Python shell's. In Jupyter Notebook, of the 16 columns, I only care about the first 9. What can be the cause?

With df.column and df. shape I get

    Index(['TipInv', 'Fondo', 'FechaInv', 'NumPar', 'ValPar', 'ImpInv'], dtype='object')
(23, 6)

I modify the script, according to suggestions in the query link

import pandas as pd
from IPython.display import display
file = "mis_inversiones.xlsx"
# Import the data
df = pd.read_excel(file, sheetname = "Financiera", na_values = "n/a")
# Display the head of the data
pd.options.display.max_columns = None
display(df.head())

The visualization of the df improves noticeably, but the result remains the same. I check with a statement that uses some of the missing columns to perform a calculation, and I actually understand that the error that shows that there is no such column

KeyError: 'Tot_Pond'

The Excel book used in Jupyter, is a copy of the one used in the shell's in which the script works correctly. Of the columns displayed, the last contains data calculated in the Excel using a formula. Of the columns that until now are not imported into Jupyter, one is dates and the others contain data calculated using formulas. The version of Pandas checked in Jupyter and in a Shell is 0.20.3 There are no empty cells with the N / A in the Excel Worksheet. I have reinstalled xlrd. The version of xlrd is 1.1.0

The option to import from .csv (gives me the same problem), does not help me, since when I save as .csv, the formulas disappear and I would have to work with three Excel sheets, one with each sheet of the Excel book.

    
asked by efueyo 11.04.2018 в 00:20
source

2 answers

0

Try pandas bookstore

import pandas as pd
FileNAME="datos.xlsx"
DAT= pd.ExcelFile(FileNAME)
# Te muestra cuantas hojas tiene el libro de excel
DAT.sheet_names
# En este caso extraeremos los datos de la hoja 1
df = DAT.parse("HOJA-1")
# visualizamos las primeras 5 filas 
df[0:5]

You can also use

FileNAME="datos.xlsx"
DATx= pd.ExcelFile(FileNAME,header=1)
DATx.sheet_names
DATx = DATx.parse("DATA-1")
DATx.head()

Greetings, I hope it will help you.

    
answered by 11.04.2018 в 04:02
0

After installing xlrd version 1.1.0, the script already works correctly in Jupyter. It remains as follows:

    # -*- coding: utf-8 -*-
"""
Created on Thu Dec 21 16:05:50 2017
@author: efueyo
"""
import pandas as pd
from IPython.display import display
file = "datos.xlsx"
# Import the data
df = pd.read_excel(file, sheetname = "Financiera", na_values = "n/a")
# Display the head of the data
pd.set_option('display.max_columns', None)
display(df.head())
# Inspect the data
df.info()
df.columns
print (df.head())

Another change that could have influenced has been to fill in three empty cells identified in one of the columns.

    
answered by 11.04.2018 в 22:39