Excel data extraction to create Python list

1

I have the following excel file:

connection A    auto
connection A    camioneta
connection A    puerta
connection B    camioneta
connection B    puerta
connection B    capo

and I'm looking to put together a diction in python like this:

Lista = {'connection A':["auto","camioneta","puerta"],
         'connection B':["camioneta","puerta","capo"]
         }

the file is .xlsx and I need that for each different data that exists in column A of the excel I create a dictionary with all the data that correspond to that same field. I do not know if he understood what I want to do

I work with python and I can not use other software than that. (python 2.7)

Thanks

    
asked by Martin Bouhier 31.10.2017 в 04:45
source

2 answers

1

You have many libraries to read Excel files. We can use openpyxl next to collections.defaultdict :

import collections
import openpyxl


wb = openpyxl.load_workbook('dat.xlsx')
ws = wb.get_sheet_by_name('Hoja1')

Lista = collections.defaultdict(list)
for row in ws.iter_rows():
    Lista[row[0].value].append(row[1].value)

A itertools.defaultdict object can be used exactly like a normal dictionary.

Another option is to use Pandas and take advantage of the pivot If you are not going to operate with the data and you just want to get the dictionary this option is possibly too much for something so simple, however.

import pandas as pd

df = pd.read_excel("datos.xlsx", header = None).pivot(columns = 0,  values = 1)
Lista = {col: df[col].dropna().tolist() for col in df.columns}

For a file like the following:

We get:

>>> Lista
{'connection A': ['auto', 'camioneta', 'puerta'],
 'connection B': ['camioneta', 'puerta', 'capo']}
  

Note: Both pandas and openpyxl are external python libraries that need to be installed, via pip for example.

Edit:

To store the variable in a file there are multiple forms. A very simple one is to serialize the object using cPickle / Pickle:

import collections
import pickle
import openpyxl


wb = openpyxl.load_workbook('dat.xlsx')
ws = wb.get_sheet_by_name('Hoja1')

Lista = collections.defaultdict(list)
for row in ws.iter_rows():
    Lista[row[0].value].append(row[1].value)

# Ahora serializamos el objeto Lista.
# El archivo lo vamos a llamar 'Lista.dat':

with open("Lista.dat", "wb") as f:
    pickle.dump(Lista, f)

Now we can from any module deserialize the object:

import pickle

with open("Lista.dat", "rb") as f:
    Lista = pickle.load(f)

print(Lista)
    
answered by 31.10.2017 / 16:44
source
1

Hello I use a library xlrd serves to read

import xlrd

wb2 = xlrd.open_workbook(ruta)
#ruta es obviamente algo como "archivo.xls"
sheet = wb2.sheet_by_index(0)

assuming that your excel file has only one data sheet and it is the first one or the 0

dir_cantidad = len(sheet.col_values(0)) - 1
#asumiendo que tus datos comienzan en la 'columna A = 0'
# dir cantidad contiene todos las celdas desde la fila 1 hasta que encuentre datos

Lista = {}

for i in range(0,dir_cantidad):
    # i tomara el valor de cada fila así,   1, 2, 3, 4, 5, 
    connection = sheet.cell(i, 0).value
    #asumiendo que tu primer dato esta en 0, 0 es decir, Columna A Fila 1

    if connection in Lista:
        conn = Lista.get(connection)
        conn.append(sheet.cell(i, 1).value)
        #asumiendo que tu siguiente dato esta en columna B = 1
        Lista[connection] = conn
    else:
        Lista[connection] = [sheet.cell(i, 1).value]

Actually the only thing he does is read your data sheet, and put the data in a new dictionary, if the key already exists add it to the current value.

Although it is worth specifying that the result is a Dictionary {} not a list [] , the value that contains the key in this case if it is a list

I hope it serves you. Greetings

    
answered by 31.10.2017 в 14:05