Create an array of groups and ids related to groups

Question

I want to build an array, a dictionary or a DataFrame (the form doesn't matter) that maps each group to the ids of the subscribers that belong to it.

The ids are the index of a DataFrame, side_subscriber.index, which prints as:

Int64Index([160, 161, 296, 306, 365, 386, 471], dtype='int64', name=u'subscriber_id')

The groups are in a numpy.ndarray called indexResultat:

[1 1 0 0 1 1 1]

I tried the following, without knowing how to initialize the array so that it is grouped by group:

kernelGroup = []
i = 0
for idx in indexResultat:
    print "idx : ",idx
    i = i+1
    print kernelGroup
    for kernel in kernelGroup:
        print "kernel : ",kernel
        if idx == kernel:
            print "we have the group ",kernel 
            print kernel
            # add the id
            kernelGroup = kernelGroup[kernel].append(side_subscriber.index[idx])
            break
    # we don't have the group yet
    print "we don't have the group", idx
    #kernelGroup = kernelGroup.append(kernelGroup,[idx,side_subscriber.index[idx]])
    kernelGroup = kernelGroup.append([idx,side_subscriber.index[i]])

print kernelGroup

And I get:

idx :  1
[]
we don't have the group 1
idx :  1
None

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-64-a0add6c15d78> in <module>()
      5     i = i+1
      6     print kernelGroup
----> 7     for kernel in kernelGroup:
      8         print "kernel : ",kernel
      9         if idx == kernel:

TypeError: 'NoneType' object is not iterable

The output I expect is:

{0:[296, 306], 1:[160, 161, 365, 386, 471]}

I know that this function does more or less what I want to do:

def cluster_points(X, mu):
    clusters  = {}
    for x in X:
        bestmukey = min([(i[0], np.linalg.norm(x-mu[i[0]])) \
                    for i in enumerate(mu)], key=lambda t:t[1])[0]
        try:
            clusters[bestmukey].append(x)
        except KeyError:
            clusters[bestmukey] = [x]
    return clusters

python

asked by ThePassenger 30.06.2017 at 15:12

1 answer

score 2 · Accepted Answer

Depending on how much efficiency you need, there are many ways to do this (with Pandas, with NumPy, or with standard Python only). A very simple one uses Pandas and DataFrame.groupby:

import pandas as pd
import numpy as np

# Simulate your source data
df = pd.DataFrame(index=[160, 161, 296, 306, 365, 386, 471])
grupos = np.array([1, 1, 0, 0, 1, 1, 1])

res = pd.DataFrame({'ids': df.index, 'grupos': grupos})
res = res.groupby('grupos')['ids'].apply(np.array).to_frame('ids')

This gives:

>>> res

                             ids
grupos                           
0                      [296, 306]
1       [160, 161, 365, 386, 471]

The ids column contains NumPy arrays.
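
For the standard-Python route, a minimal sketch could look like the one below. It is only an illustration: the lists ids and grupos are stand-ins for side_subscriber.index and indexResultat, and group_ids is a hypothetical helper name. It follows the same "append to the key's list" pattern as cluster_points in the question. Note also that list.append modifies the list in place and returns None, which is why assigning its result back to kernelGroup leads to the TypeError shown in the question.

```python
from collections import defaultdict

# Sample data copied from the question; in real code these would be
# side_subscriber.index and indexResultat (assumption for this sketch).
ids = [160, 161, 296, 306, 365, 386, 471]
grupos = [1, 1, 0, 0, 1, 1, 1]


def group_ids(ids, grupos):
    """Return {group: [ids, ...]}, keeping ids in their original order.

    group_ids is a hypothetical helper, not part of any library.
    """
    grouped = defaultdict(list)
    # Walk ids and group labels in parallel; each id goes into the
    # list that belongs to its group label.
    for id_, grupo in zip(ids, grupos):
        # int() turns NumPy integers into plain Python ints.
        grouped[int(grupo)].append(int(id_))
    return dict(grouped)


result = group_ids(ids, grupos)
print(result)
```

The result is a plain dictionary with the same contents as the output the question asks for.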

If you need more efficiency, you can go down one level and use NumPy: sort the ids using grupos as the key, then slice the sorted array at the points where the group changes.
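
One possible way to sketch that NumPy approach is shown below. This is a reading of "sort and slice" rather than the answer's own code: it assumes a stable np.argsort to order the ids by group, np.unique with return_index=True to find where each group starts, and np.split to cut the sorted ids at those positions. The variable names are illustrative.

```python
import numpy as np

# Sample data copied from the question (stand-ins for
# side_subscriber.index and indexResultat).
ids = np.array([160, 161, 296, 306, 365, 386, 471])
grupos = np.array([1, 1, 0, 0, 1, 1, 1])

# Stable sort by group label, so ids keep their original relative
# order inside each group.
order = np.argsort(grupos, kind="stable")
sorted_grupos = grupos[order]
sorted_ids = ids[order]

# labels: the distinct group labels in ascending order.
# starts: position in the sorted array where each group begins.
labels, starts = np.unique(sorted_grupos, return_index=True)

# Cut the sorted ids at every group boundary (the first start is 0,
# so it is skipped).
chunks = np.split(sorted_ids, starts[1:])

# Build a plain dict of Python lists, keyed by group label.
result = {int(label): chunk.tolist() for label, chunk in zip(labels, chunks)}
print(result)
```

If arrays are preferred over lists, the chunks can be kept as they are instead of calling tolist().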