concatenate 2 array with condition in python

2

I have 2 array, of 2 columns each. Array A has the form:

pid                track

0        77vFvcdWZi0ZqiDOoHU3YI 
1        0e9hR1vTrzlUvFH5PgA9rY 
2        00z4wF0iJsp6GwDkQxkGs6 
3        0CyRloqqjpeKEAd3cO6J6z 
4        0c1gHntWjKD7QShC8s99sq
...      .....

while array B is:

     diccionario                  id
0UaMYEvWZi0ZqiDOoHU3YI            34
0e9hR1vTrzlUvFH5PgA9rY             3
00z4wF0iJsp6GwDkQxkGs6             1 
0CyRloqqjpeKEAd3cO6J6z             2
0c1gHntWjKD7QShC8s99sq            16
...      .....

I need an array C, where the id of that array is the dictionary id of array B that corresponds to the track of array A, that is, in the form:

pid                 id
 0                  22 
 1                   3
 2                   1
 3                   2
 4                  16
...                .....

But I do not know how to do it in python

    
asked by Skiel 24.05.2018 в 10:05
source

1 answer

0

If you can use the pandas library, always advisable to deal with structured data in tables as is your case, the solution is:

  • Load each of your tables into a pandas dataframe
  • Use the "track" column in table 1 as an index, and the "dictionary" column in table 2.
  • Make a join() of both tables. Pandas is in charge of "making fit" the rows with the same index.
  • As a consequence of the previous step, "NaN" values (which are a kind of null values) may appear in elements that have an index in one of the tables but not in the other. Optionally you can change those "NaN" for any other value (zero, or a string of dashes, etc.)
  • The result of the join will use as index the hashes that served to make the mixture. If you do not want them you can make a reset_index() to the result
  • You can also optionally set the column pid as result index.
  • The following fragment does steps 1 and 2. In this case, the tables "read" them from strings, since I find it more comfortable to do so simply by copying and pasting the example of your question. In your case, you probably read it from a file. Pandas can read csv and excel and many other formats.

    Edit : By the way, I've messed up the rows in array B so that it looks better than .join() does not just concatenate "horizontally", but actually "searches" in B the elements based on the indexes.

    import io
    import pandas as pd
    
    datos1 = """\
    pid                track
    0        77vFvcdWZi0ZqiDOoHU3YI 
    1        0e9hR1vTrzlUvFH5PgA9rY 
    2        00z4wF0iJsp6GwDkQxkGs6 
    3        0CyRloqqjpeKEAd3cO6J6z 
    4        0c1gHntWjKD7QShC8s99sq"""
    
    datos2 = """\
         diccionario                  id
    0CyRloqqjpeKEAd3cO6J6z             2
    00z4wF0iJsp6GwDkQxkGs6             1 
    0c1gHntWjKD7QShC8s99sq            16
    0e9hR1vTrzlUvFH5PgA9rY             3
    0UaMYEvWZi0ZqiDOoHU3YI            34"""
    
    a = pd.read_table(io.StringIO(datos1), sep=r'\s+').set_index("track")
    b = pd.read_table(io.StringIO(datos2), sep=r'\s+').set_index("diccionario")
    

    The following fragment does steps 3 and 4 (fill in the unknown values with a "-"):

    c = a.join(b).fillna("-")
    print(c)
    
                            pid  id
    track                          
    77vFvcdWZi0ZqiDOoHU3YI    0   -
    0e9hR1vTrzlUvFH5PgA9rY    1   3
    00z4wF0iJsp6GwDkQxkGs6    2   1
    0CyRloqqjpeKEAd3cO6J6z    3   2
    0c1gHntWjKD7QShC8s99sq    4  16
    

    The following code would do steps 5 and 6, that is, remove the "track" index and put pid as the new index, giving the desired result:

    c = c.reset_index(drop=True).set_index("pid")
    print(c)
    
         id
    pid    
    0     -
    1     3
    2     1
    3     2
    4    16
    
        
    answered by 24.05.2018 / 10:31
    source