How to create a DataFrame from a list of lists of dictionaries?

How can I create a DataFrame from a list of lists of dictionaries that carry no index of their own, using the position of each sublist as the index?

For example, given:

[[{'count': 6L, 'item_id': 11313}, {'count': 6L, 'item_id': 11348},
{'count': 1L, 'item_id': 11338}, ],[{'count': 4L, 'item_id': 11311},
{'count': 3L, 'item_id': 11281}]]

the result should be a DataFrame like the following:

+---------+-------+---------+
| user_id | count | item_id |
+---------+-------+---------+
|    0    |   6   |  11313  |
|    0    |   6   |  11348  |
|    0    |   1   |  11338  |
|    1    |   4   |  11311  |
|    1    |   3   |  11281  |
+---------+-------+---------+
    
asked by ThePassenger 10.07.2017 at 12:38

2 answers

You can create a DataFrame for each sublist and concatenate them with pandas.concat. To build the user_id column you can use the keys argument, which labels each DataFrame with the key you give it (here, the position of the sublist). Since keys produces a MultiIndex, we then call reset_index twice: once to move user_id into a column, and once to reset the index of the final DataFrame:

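import pandas as pd

# Python 3 has no "L" suffix for long integers, so plain ints are used here.
datos = [[{'count': 6, 'item_id': 11313}, {'count': 6, 'item_id': 11348},
          {'count': 1, 'item_id': 11338}], [{'count': 4, 'item_id': 11311},
          {'count': 3, 'item_id': 11281}]]

# One DataFrame per sublist; keys labels each one with its position in datos,
# and names gives that new outer index level the name user_id.
res = pd.concat(objs = [pd.DataFrame(f) for f in datos],
                keys = list(range(len(datos))),
                names = ['user_id'])

# Move the user_id index level into a regular column...
res.reset_index(level = 0, inplace = True)
# ...and replace the leftover per-sublist index with a fresh 0..n-1 index.
res.reset_index(drop = True, inplace = True)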

Output:

>>> res

   user_id  count  item_id
0        0      6    11313
1        0      6    11348
2        0      1    11338
3        1      4    11311
4        1      3    11281
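
To see why two reset_index calls are needed, it can help to look at the index right after the concatenation. The following is only a minimal sketch, assuming Python 3 and a reasonably recent pandas; the names stacked, index_pairs and flat are illustrative and do not appear in the answer above:

```python
import pandas as pd

datos = [[{'count': 6, 'item_id': 11313}, {'count': 6, 'item_id': 11348},
          {'count': 1, 'item_id': 11338}],
         [{'count': 4, 'item_id': 11311}, {'count': 3, 'item_id': 11281}]]

# One DataFrame per sublist; keys labels each block with its position.
stacked = pd.concat([pd.DataFrame(f) for f in datos],
                    keys=list(range(len(datos))),
                    names=['user_id'])

# The index now has two levels: (user_id, row position inside the sublist).
index_pairs = list(stacked.index)

# Move the user_id level into a column, then discard the leftover inner level.
flat = stacked.reset_index(level='user_id').reset_index(drop=True)
```

Here index_pairs holds one (user_id, position) pair per row, which is why the first reset_index moves user_id into a column and the second one replaces the remaining inner level with a plain running index.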
    
answered 10.07.2017 at 16:41

An alternative to what @FJSevilla proposes would be to first build a structure that pandas understands directly.

In my case, I first build a dictionary that pandas can interpret (one list of values per column) and then create the DataFrame from that dictionary.

import pandas as pd

kk = [[{'count': 6, 'item_id': 11313},
       {'count': 6, 'item_id': 11348},
       {'count': 1, 'item_id': 11338}],
      [{'count': 4, 'item_id': 11311}, 
       {'count': 3, 'item_id': 11281}]]

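# One list per output column; every row appends one value to each list.
resultado = {'user_id': [], 'count': [], 'item_id': []}
for i, lista in enumerate(kk):
    # i is the position of the sublist and becomes the user_id
    for d in lista:
        resultado['user_id'].append(i)
        resultado['count'].append(d['count'])
        resultado['item_id'].append(d['item_id'])

df = pd.DataFrame(resultado)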
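
The same idea can be written more compactly by flattening the data into one record per row before handing it to pandas. This is only a sketch of a variation, assuming Python 3.5 or later for the **d unpacking; the name registros is illustrative and not part of the code above:

```python
import pandas as pd

kk = [[{'count': 6, 'item_id': 11313},
       {'count': 6, 'item_id': 11348},
       {'count': 1, 'item_id': 11338}],
      [{'count': 4, 'item_id': 11311},
       {'count': 3, 'item_id': 11281}]]

# One dict per output row: the sublist position becomes user_id,
# and the remaining keys are copied from the original dict.
registros = [{'user_id': i, **d} for i, lista in enumerate(kk) for d in lista]

# A list of dicts is read by pandas as one row per dict.
df = pd.DataFrame(registros)
```

Unlike the explicit loop, this version does not need the column names spelled out in advance, so it keeps working if the inner dictionaries gain extra keys.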
    
answered 10.07.2017 at 16:55