Filter data already read


I have the following code.

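import fiona

read = fiona.open(r"\pre\data\PLANTA\URBANA.shp")

for feat in read:
    corPoligono = feat['geometry']['coordinates']  # geometry coordinates
    poligonoType = feat['geometry']['type']        # geometry type, e.g. 'Polygon'
    idCapaOrigen = feat['id']                      # feature id in the source layer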

These three values, which I get from the shapefile, are inserted into a table. The problem is that if the process stops for some reason, restarting it runs the for loop from the beginning and reprocesses the features that are already in the database.

The question is: can the for loop (or some other method) filter out the data already inserted in the table? The idCapaOrigen field is unique, so it could be used for this.

I ask because I would like to avoid reading the data again and leaving the work to MSSQL with a NOT IN or a NOT EXISTS: the shapefile contains a huge amount of data, and re-reading what has already been processed takes forever.

I hope the question is clear.

    
asked by Sebastian 20.03.2018 at 12:06

1 answer


One solution (although I am not sure it is the most appropriate) would be to use Python's set type ( set() ) to store the elements you have already processed. Sets are very efficient at checking whether an element is already in them (they are implemented with hash tables). In each iteration of your loop, before processing the feature, you check whether its id is already in the set.

Something like this:

ya_procesados = set()
for feat in read:
    if feat['id'] in ya_procesados:
        continue   # Already processed: skip it
    # Otherwise, process it
    corPoligono = feat['geometry']['coordinates']
    poligonoType = feat['geometry']['type']
    idCapaOrigen = feat['id']
    # ...
    # and add it to the set
    ya_procesados.add(idCapaOrigen)

If the process stops, you obviously lose whatever was in the set, but at startup the program could query the database for all the ids already inserted and use the result to initialize the ya_procesados set.

set() accepts any iterable as its argument, for example a list.
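As a rough sketch of that start-up step, something like the following could work. It is only an illustration under several assumptions: the table name poligonos and the helper load_processed_ids are hypothetical, and an in-memory SQLite database stands in for MSSQL so the example runs on its own (with MSSQL the connection would presumably come from a driver such as pyodbc). Fiona reports feature ids as strings, so the ids read from the database are converted to str before comparing.

```python
import sqlite3


def load_processed_ids(conn, table="poligonos", column="idCapaOrigen"):
    """Return the set of ids already stored in the table.

    `table` and `column` are hypothetical names chosen for this sketch;
    they are fixed by the programmer, never taken from user input.
    """
    cur = conn.cursor()
    cur.execute(f"SELECT {column} FROM {table}")
    # Normalise to str so the values compare equal to feat['id'] from Fiona.
    return {str(row[0]) for row in cur.fetchall()}


# Stand-in database: in-memory SQLite instead of MSSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE poligonos (idCapaOrigen TEXT PRIMARY KEY, tipo TEXT)")
conn.executemany(
    "INSERT INTO poligonos VALUES (?, ?)",
    [("0", "Polygon"), ("1", "Polygon")],
)

# Initialise the set once, at start-up.
ya_procesados = load_processed_ids(conn)

# Stand-in for the features that fiona.open(...) would yield.
features = [
    {"id": "0", "geometry": {"type": "Polygon", "coordinates": []}},
    {"id": "1", "geometry": {"type": "Polygon", "coordinates": []}},
    {"id": "2", "geometry": {"type": "Polygon", "coordinates": []}},
]

pendientes = []
for feat in features:
    if feat["id"] in ya_procesados:
        continue  # already in the table: skip it
    pendientes.append(feat["id"])  # here the real code would insert the row
    ya_procesados.add(feat["id"])
```

Note that the loop still iterates over every feature in the file; what it skips is the processing and the insert for ids that are already stored.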

    
answered by 20.03.2018 / 12:16