Suppose we have two text files of the form:
archivo1.txt:
{'ID': '30', 'Nombre': 'ECONOMIA I', 'Codigo': 'COFI-214'}
{'ID': '31', 'Nombre': 'ECONOMIA I', 'Codigo': 'COFI-216'}
{'ID': '32', 'Nombre': 'ECONOMIA I', 'Codigo': 'COFI-250'}
{'ID': '33', 'Nombre': 'ECONOMIA I', 'Codigo': 'COFI-241'}
archivo2.txt:
{'Fecha': '30/10/2016', 'Hora': '10:40:00'}
{'Fecha': '10/02/2017', 'Hora': '23:45:00'}
{'Fecha': '07/12/2016', 'Hora': '15:30:00'}
{'Fecha': '01/05/2016', 'Hora': '03:12:00'}
Let's imagine that we want to obtain a third file containing the dictionary that results from joining the dictionary in one row of archivo1.txt with the dictionary in another row of archivo2.txt.
We could simply read both files, isolate the lines, and concatenate the strings appropriately:
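with open('archivo1.txt') as f1, open('archivo2.txt') as f2, open('archivounion.txt', 'w') as of:
    line_1 = f1.readlines()[0].strip()  # first line of archivo1.txt
    line_2 = f2.readlines()[5].strip()  # sixth line (the file needs at least six)
    # Drop the closing brace of line_1 and the opening brace of line_2.
    out_line = "{}, {}".format(line_1[:-1], line_2[1:])
    of.write(out_line)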
However, this is not really joining two dictionaries, since a true merge also removes repeated keys and updates their values where necessary.
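To see where plain concatenation falls short, here is a minimal sketch using two made-up lines that share the key 'ID' (the repeated key is an assumption for illustration, it does not occur in the files above). The concatenated text keeps both occurrences, whereas parsing that text into a real dictionary, here with the standard library's ast.literal_eval, leaves a single entry:

```python
from ast import literal_eval

# Hypothetical sample lines that share the key 'ID'.
line_1 = "{'ID': '30', 'Nombre': 'ECONOMIA I'}"
line_2 = "{'ID': '99', 'Hora': '10:40:00'}"

# Same concatenation as in the snippet above.
out_line = "{}, {}".format(line_1[:-1], line_2[1:])

# The text still contains the key twice...
repeated = out_line.count("'ID'")

# ...while a dictionary built from it holds each key only once.
merged = literal_eval(out_line)

print(out_line)
print(repeated, len(merged))
```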
Possibly the most appropriate way to do this is to create two Python dictionaries from the rows, for which we have several options, among them:
- Construct the dictionary "manually", using only string methods to parse the line:
def str_to_dict(cad):
    # Strip the braces and quotes, then split the text into "key: value" items.
    return {key: value for item in cad[1:-1].replace("'", "").split(", ")
            for key, value in (item.split(": "),)}
- Use ast.literal_eval:
We can load the dictionary using eval:
diccionario = eval(linea)
The problem is that eval evaluates any valid Python expression, which makes it very dangerous with user input or uncontrolled data sources. There are ways to filter the input to reduce the risk of code injection attacks, but in this case it is better to use ast.literal_eval directly, which only accepts a restricted set of literal structures (lists, dictionaries, sets, tuples, booleans, None, strings and numbers):
>>> from ast import literal_eval
>>> cadena = "{'ID': '30', 'Nombre': 'ECONOMIA I', 'Codigo': 'COFI-214'}"
>>> diccionario = literal_eval(cadena)
>>> print(diccionario['Nombre'])
ECONOMIA I
- Use the json module, specifically json.loads:
>>> import json
>>> cad = '{"ID": "30", "Nombre": "ECONOMIA I", "Codigo": "COFI-214"}'
>>> diccionario = json.loads(cad)
>>> diccionario
{'ID': '30', 'Nombre': 'ECONOMIA I', 'Codigo': 'COFI-214'}
The problem is that JSON syntax is not identical to the syntax Python uses for dictionaries. For example, JSON requires property names to be enclosed in double quotes, so the single-quoted lines in our files would be rejected by json.loads.
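The three options can be compared side by side on one line of archivo1.txt. This is only a sketch: the names manual, safe, json_ok and rejected are introduced here for illustration, and the expression passed to literal_eval at the end is an arbitrary example of non-literal input.

```python
import json
from ast import literal_eval


def str_to_dict(cad):
    # Manual parsing: assumes no key or value contains ", " or ": ".
    return {key: value for item in cad[1:-1].replace("'", "").split(", ")
            for key, value in (item.split(": "),)}


line = "{'ID': '30', 'Nombre': 'ECONOMIA I', 'Codigo': 'COFI-214'}"

manual = str_to_dict(line)
safe = literal_eval(line)

# json.loads does not accept the single-quoted line.
try:
    json.loads(line)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# literal_eval refuses anything that is not a plain literal.
try:
    literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True

print(manual == safe, json_ok, rejected)
```

For data written with Python's own str(dict), as in these files, literal_eval is therefore the option that needs no preprocessing.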
Once this is done, we only need to merge the two dictionaries. For this we can use dict.update.
>>> a = {'ID': '30', 'Nombre': 'ECONOMIA I', 'Codigo': 'COFI-214'}
>>> b = {'Fecha': '30/10/2016', 'Hora': '10:40:00'}
>>> a.update(b)
>>> print(a)
{'ID': '30', 'Nombre': 'ECONOMIA I', 'Codigo': 'COFI-214', 'Fecha': '30/10/2016', 'Hora': '10:40:00'}
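Two details of dict.update are easy to miss: it modifies the dictionary in place and returns None, and when both dictionaries share a key the value from the argument replaces the existing one. A small sketch, where the repeated 'ID' key in b is a made-up case:

```python
a = {'ID': '30', 'Nombre': 'ECONOMIA I'}
b = {'ID': '99', 'Hora': '10:40:00'}  # hypothetical line that reuses 'ID'

result = a.update(b)  # changes a in place; there is no useful return value

print(result)
print(a)
```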
If we do not want to modify either of the original dictionaries, we can resort to copy.deepcopy():
>>> import copy
>>> a = {'ID': '30', 'Nombre': 'ECONOMIA I', 'Codigo': 'COFI-214'}
>>> b = {'Fecha': '30/10/2016', 'Hora': '10:40:00'}
>>> c = copy.deepcopy(a)
>>> c.update(b)
>>> print(c)
{'ID': '30', 'Nombre': 'ECONOMIA I', 'Codigo': 'COFI-214', 'Fecha': '30/10/2016', 'Hora': '10:40:00'}
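For flat dictionaries of strings like the ones in these files, a shallow copy would behave the same way. The difference only shows up when a value is itself mutable. The sketch below assumes a hypothetical record with a list value ('Aulas' is an invented key) to illustrate it:

```python
import copy

# Hypothetical record whose value for 'Aulas' is a mutable list.
a = {'ID': '30', 'Aulas': ['A1']}
b = {'Hora': '10:40:00'}

shallow = a.copy()       # new dict, but it shares the inner list with a
shallow.update(b)
deep = copy.deepcopy(a)  # new dict with its own copy of the inner list
deep.update(b)

# Changing the list through the shallow copy also changes a;
# the deep copy is unaffected.
shallow['Aulas'].append('B2')

print(a['Aulas'], deep['Aulas'])
```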
In Python >= 3.5 (dictionary unpacking, PEP 448) we can simply do:
>>> a = {'ID': '30', 'Nombre': 'ECONOMIA I', 'Codigo': 'COFI-214'}
>>> b = {'Fecha': '30/10/2016', 'Hora': '10:40:00'}
>>> c = {**a, **b}
>>> print(c)
{'ID': '30', 'Nombre': 'ECONOMIA I', 'Codigo': 'COFI-214', 'Fecha': '30/10/2016', 'Hora': '10:40:00'}
With these concepts we can create a function to do what you want:
import ast
import copy

def join_dicts(d1, d2):
    # Merge d2 into a copy of d1 so that neither input is modified.
    copy_dict = copy.deepcopy(d1)
    copy_dict.update(d2)
    return copy_dict

def get_lines(file, lines):
    # Map each requested 0-based line index to its text, reading lazily.
    with open(file) as f:
        return {i: string for i, string in enumerate(f) if i in lines}

def join_dicts_txt(file1, file2, out_file, rows):
    # rows is a sequence of (index_in_file1, index_in_file2) pairs.
    idxs1, idxs2 = zip(*rows)
    lines1 = get_lines(file1, idxs1)  # get_lines opens and closes each input
    lines2 = get_lines(file2, idxs2)
    with open(out_file, 'w') as of:
        lines_gen = (str(join_dicts(ast.literal_eval(lines1[r1]),
                                    ast.literal_eval(lines2[r2]))) + '\n'
                     for r1, r2 in rows)
        of.writelines(lines_gen)
Using dictionary unpacking, join_dicts is no longer needed:
import ast

def get_lines(file, lines):
    # Map each requested 0-based line index to its text, reading lazily.
    with open(file) as f:
        return {i: string for i, string in enumerate(f) if i in lines}

def join_dicts_txt(file1, file2, out_file, rows):
    idxs1, idxs2 = zip(*rows)
    lines1 = get_lines(file1, idxs1)
    lines2 = get_lines(file2, idxs2)
    with open(out_file, 'w') as of:
        # {**d1, **d2} builds a new dict; keys from d2 win on collision.
        lines_gen = (str({**ast.literal_eval(lines1[r1]),
                          **ast.literal_eval(lines2[r2])}) + '\n'
                     for r1, r2 in rows)
        of.writelines(lines_gen)
The function receives as parameters the three files (the two inputs and the output) and a tuple or list with the pairs of line indices to join, counted from 0. For example, to join the first dictionary of archivo1.txt with the third of archivo2.txt, and the second of archivo1.txt with the fourth of archivo2.txt, the rows parameter must be ((0,2),(1,3)). For this example we would call the function like this:
join_dicts_txt('archivo1.txt', 'archivo2.txt', 'archivounion.txt', ((0,2),(1,3)))
The output is:
archivounion.txt:
{'ID': '30', 'Nombre': 'ECONOMIA I', 'Codigo': 'COFI-214', 'Fecha': '07/12/2016', 'Hora': '15:30:00'}
{'ID': '31', 'Nombre': 'ECONOMIA I', 'Codigo': 'COFI-216', 'Fecha': '01/05/2016', 'Hora': '03:12:00'}
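To try this without touching real files, the following sketch writes the sample data to a temporary directory and runs the function on it. It restates the dictionary-unpacking version of join_dicts_txt so that the block is self-contained; the temporary directory and the variable names rows_1, rows_2 and result are assumptions made for the demonstration.

```python
import ast
import os
import tempfile


def get_lines(file, lines):
    with open(file) as f:
        return {i: string for i, string in enumerate(f) if i in lines}


def join_dicts_txt(file1, file2, out_file, rows):
    idxs1, idxs2 = zip(*rows)
    lines1 = get_lines(file1, idxs1)
    lines2 = get_lines(file2, idxs2)
    with open(out_file, 'w') as of:
        of.writelines(str({**ast.literal_eval(lines1[r1]),
                           **ast.literal_eval(lines2[r2])}) + '\n'
                      for r1, r2 in rows)


# Sample contents of the two input files.
rows_1 = ["{'ID': '30', 'Nombre': 'ECONOMIA I', 'Codigo': 'COFI-214'}",
          "{'ID': '31', 'Nombre': 'ECONOMIA I', 'Codigo': 'COFI-216'}"]
rows_2 = ["{'Fecha': '30/10/2016', 'Hora': '10:40:00'}",
          "{'Fecha': '10/02/2017', 'Hora': '23:45:00'}",
          "{'Fecha': '07/12/2016', 'Hora': '15:30:00'}",
          "{'Fecha': '01/05/2016', 'Hora': '03:12:00'}"]

with tempfile.TemporaryDirectory() as tmp:
    p1, p2, out = (os.path.join(tmp, name) for name in
                   ('archivo1.txt', 'archivo2.txt', 'archivounion.txt'))
    for path, content in ((p1, rows_1), (p2, rows_2)):
        with open(path, 'w') as f:
            f.write('\n'.join(content) + '\n')
    join_dicts_txt(p1, p2, out, ((0, 2), (1, 3)))
    # Read the result back and parse each line for inspection.
    with open(out) as f:
        result = [ast.literal_eval(line) for line in f]

print(result)
```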
For your example (joining the dictionary on line 1 with the dictionary on line 6) you should call the function like this:
join_dicts_txt('archivo1.txt', 'archivo2.txt', 'archivounion.txt', [(0, 5)])
Note: the readlines method (used in the first example) is avoided here because it loads every line of the file into a list. That is not a problem for small files, but for relatively large ones it is inefficient and uses a lot of RAM, especially if in the end we only need a few of the lines.
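Iterating over the file object already avoids holding every line in memory, but get_lines still reads through to the end of the file. One possible refinement, sketched here under the hypothetical name get_lines_early_stop, is to stop as soon as the largest requested index has been seen. The demo file big.txt is generated only for this example.

```python
import os
import tempfile


def get_lines_early_stop(file, lines):
    # Hypothetical variant of get_lines: stops after the largest index.
    wanted = set(lines)
    if not wanted:
        return {}
    last = max(wanted)
    found = {}
    with open(file) as f:
        for i, string in enumerate(f):
            if i in wanted:
                found[i] = string
            if i >= last:
                break  # no later line is needed
    return found


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'big.txt')  # throwaway demo file
    with open(path, 'w') as f:
        f.writelines("{{'ID': '{}'}}\n".format(n) for n in range(1000))
    picked = get_lines_early_stop(path, (0, 5))

print(picked)
```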