Compare a coordinate list with another and its radius

3

Good!

Let's see I have a function that generates a list ( List A ) of coordinates in numpy .

  

[[1,1][1,6][3,5][5,7]] < - (for example)

And I have another list ( B list ) with more coordinates (Also in numpy )

format
  

[[2,1][1,5][7,8][9,10][10,10][11,11]]

Then my goal is to pass the list A and a radius (of 1 for example) compare with the list B and tell me which points of the list A intersects with one of the list B . in this case, passing radio 1, I would say

list A [[1,1][1,6]] intersects list [[2,1][1,5]]

I've thought it makes a function that generates the area of each point of the list A with that radius. and then I'll make you a convexhull ... but I do not know if there will be a faster and better function, since my list has more than 100,000 points.

The code I have is the following (but it is the non-optimal version)

def getRadius(cords,radio):
#   print cords[1:] lado derecho
#   print cords[:1] lado izquierdo 
    x = []
    y = []
    x.append(np.asscalar(cords[1:]) + radio/2)
    x.append(np.asscalar(cords[1:]) - radio/2)
    y.append(np.asscalar(cords[:1]) + radio/2)
    y.append(np.asscalar(cords[:1]) - radio/2)
    return(np.array(list(itertools.product(y, x))))

def getListNewCoords(listCoords,radio,Poids):
    newList = []
    for i in listCoords:
        newCoords = getRadius(i,radio)
        inHull = in_hull(Poids,newCoords)
        z = Poids[inHull]
        if not z:
            newList.append(i)
    return(newList)
    
asked by Iván Rodas Padilla 27.01.2017 в 10:19
source

2 answers

1

You can calculate the distance of each point in the first list (I'll call it a ) with the distance of all the points in the second list (I'll call it b ). If you do it for each point of the first list you can do it iteratively (I do not know how you want the output). An example could be:

import numpy as np

# array con 15 puntos
a = np.random.randint(1,10,(15,2))
# array con 10 puntos
b = np.random.randint(1,10,(10,2))

# Función que calcula la distancia de un 
# punto de en 'a' a todos los puntos de 'b'
def calcula_distancias(point, arr, radio=1):
    dist = np.sqrt((point[0] - arr[:,0])**2 + (point[1] - arr[:,1])**2)
    indexes = dist <= radio
    return arr[indexes], indexes

for punto in a:
    print(punto, calcula_distancias(punto, b, radio=3)) # ajusta el radio

The function returns the points of b that are the same distance or less than radio . You could do it without the for building an array of distances that had the dimensions of (len(a), len(b)) and from it create an array of Booleans with those that were below radio but that I leave as duties to which want.

    
answered by 27.01.2017 в 14:28
0

As you say, I think you'd do well with some "clustering" (eg: k-means) technique. What I do not advise when you have a lot of points that you make a comparison O(n^2) as the solution you propose or the one proposed by @kikocorreoso.

The strategy that I would continue to divide the lists of points in some way to limit the amount of points to be compared ( "clustering" ). For example, we can calculate the average of all the points in a list and divide it into four lists. To make the comparison, we first compare with the calculated mean to know which of the 4 clusters we have to compare. A code to give you an idea:

import numpy as np
from math import sqrt

# array con 1500 puntos
a = np.random.randint(-100, 100, (1500, 2))
# array con 1000 puntos
b = np.random.randint(-100, 100, (1000, 2))

m = b.mean(0)

radius = 1

cluster = [
    b[np.logical_and(b[:,0] <= m[0]+radius, b[:,1] <= m[1]+radius)],
    b[np.logical_and(b[:,0] >  m[0]-radius, b[:,1] <= m[1]+radius)],
    b[np.logical_and(b[:,0] <= m[0]+radius, b[:,1] >  m[1]-radius)],
    b[np.logical_and(b[:,0] >  m[0]-radius, b[:,1] >  m[1]-radius)],
]

def getCluster(p):
    k  = 0 if p[0] <= m[0] else 2
    k += 0 if p[1] <= m[1] else 1
    return cluster[k]

def dist(x,y):
    return sqrt((x[0]-y[0])**2 + (x[1]-y[1])**2)

res = [ (x,y) for x in a for y in getCluster(x)
         if dist(x,y) <= radius ]

def unzip(tuples): return list(zip(*tuples))


from pprint import pprint
pprint(unzip(res))

We have reduced the comparisons to be made in 1/4. We can continue to apply the same strategy by re-dividing each cluster into four until we reach an acceptable cluster size (~ 1000 elements). If we have a machine with several processors, we can even consider parallelizing the process by assigning each processor a cluster (similar to what Apache Spark would do).

Anyway, it's almost certain that some clustering tool can be helpful, like scipy and sklearn (I do not know much about them).

    
answered by 27.01.2017 в 22:01