If you can use the pandas library, always advisable to deal with structured data in tables as is your case, the solution is:
Load each of your tables into a pandas dataframe
Use the "track" column in table 1 as an index, and the "dictionary" column in table 2.
Make a join()
of both tables. Pandas is in charge of "making fit" the rows with the same index.
As a consequence of the previous step, "NaN" values (which are a kind of null values) may appear in elements that have an index in one of the tables but not in the other. Optionally you can change those "NaN" for any other value (zero, or a string of dashes, etc.)
The result of the join will use as index the hashes that served to make the mixture. If you do not want them you can make a reset_index()
to the result
You can also optionally set the column pid
as result index.
The following fragment does steps 1 and 2. In this case, the tables "read" them from strings, since I find it more comfortable to do so simply by copying and pasting the example of your question. In your case, you probably read it from a file. Pandas can read csv and excel and many other formats.
Edit : By the way, I've messed up the rows in array B so that it looks better than .join()
does not just concatenate "horizontally", but actually "searches" in B the elements based on the indexes.
import io
import pandas as pd
datos1 = """\
pid track
0 77vFvcdWZi0ZqiDOoHU3YI
1 0e9hR1vTrzlUvFH5PgA9rY
2 00z4wF0iJsp6GwDkQxkGs6
3 0CyRloqqjpeKEAd3cO6J6z
4 0c1gHntWjKD7QShC8s99sq"""
datos2 = """\
diccionario id
0CyRloqqjpeKEAd3cO6J6z 2
00z4wF0iJsp6GwDkQxkGs6 1
0c1gHntWjKD7QShC8s99sq 16
0e9hR1vTrzlUvFH5PgA9rY 3
0UaMYEvWZi0ZqiDOoHU3YI 34"""
a = pd.read_table(io.StringIO(datos1), sep=r'\s+').set_index("track")
b = pd.read_table(io.StringIO(datos2), sep=r'\s+').set_index("diccionario")
The following fragment does steps 3 and 4 (fill in the unknown values with a "-"):
c = a.join(b).fillna("-")
print(c)
pid id
track
77vFvcdWZi0ZqiDOoHU3YI 0 -
0e9hR1vTrzlUvFH5PgA9rY 1 3
00z4wF0iJsp6GwDkQxkGs6 2 1
0CyRloqqjpeKEAd3cO6J6z 3 2
0c1gHntWjKD7QShC8s99sq 4 16
The following code would do steps 5 and 6, that is, remove the "track" index and put pid
as the new index, giving the desired result:
c = c.reset_index(drop=True).set_index("pid")
print(c)
id
pid
0 -
1 3
2 1
3 2
4 16