Detect matches in multiple files with Python

0

Hi, I am new to programming and have been experimenting with scripts that have helped me a lot at work. In this case I would like to take one script further, but I cannot find a way to achieve what I want.

My script is:

with open('D:/Python/detectar_coincidencias/Emails/00AA34B3078446DB90C489BAFF37B611.MAI', 'r') as file1:
    with open('D:/Python/detectar_coincidencias/Emails/0AFA934556264ABFA0FC901F12786D29.MAI', 'r') as file2:
        same = set(file1).intersection(file2)

same.discard('\n')

with open('some_output_file.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)

This only lets me compare two specific files in a folder. I would like the script to detect all the files with the .MAI extension in that folder, compare them in some way, and write the matches found to a new file.

Can you help me by giving me some clues or some solutions on how to achieve it?

Greetings and thank you very much!

    
asked by Hugo Lesta 29.12.2016 at 16:56

3 answers

1

Since you have not specified your comparison criteria any further, this is my solution, which compares the files in pairs:

import os
import glob

path = 'directory containing the files'
extension = '*.txt'

def comparar_linea(file1, file2):
    # Strip surrounding whitespace so lines match regardless of line endings
    data1 = [line.strip() for line in file1.readlines()]
    data2 = [line.strip() for line in file2.readlines()]
    # Lines that appear in both files
    same = set(data1).intersection(data2)
    return same

def buscarCoincidencias(path, extension):
    # The output file is opened before os.chdir, so it is created in the
    # original working directory
    with open('some_output_file.txt', 'w') as file_out:
        os.chdir(path)
        files = glob.glob(extension)
        # Compare every pair of files exactly once
        for i in range(len(files)):
            for j in range(i+1, len(files)):
                with open(files[i]) as file1:
                    with open(files[j], 'r') as file2:
                        same = comparar_linea(file1, file2)
                        file_out.write(files[i] + "-"+ files[j] + ": "+str(same)+"\n")


buscarCoincidencias(path, extension)

In my case I have:

test.txt:

1
2
3
4
5

test1.txt:

3
4
5
6

test2.txt:

2
2
3
3

some_output_file.txt:

test2.txt-test1.txt: {'3'}
test2.txt-test.txt: {'3', '2'}
test1.txt-test.txt: {'5', '4', '3'}
    
answered by 29.12.2016 / 19:27
2

To get all the files with a given extension you can use several methods. One option is to list all the files with os.listdir and then filter the names with the str.endswith method. Another option is to use glob.
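As a rough sketch of those two options side by side (the function names `list_with_listdir` and `list_with_glob` are made up for this illustration, not part of any library):

```python
import glob
import os


def list_with_listdir(folder, extension):
    """Return the names in `folder` that end with `extension`, sorted."""
    return sorted(name for name in os.listdir(folder) if name.endswith(extension))


def list_with_glob(folder, extension):
    """Return the same names, found with a glob pattern instead."""
    # glob.escape protects any special characters in the folder name itself
    pattern = os.path.join(glob.escape(folder), "*" + extension)
    return sorted(os.path.basename(p) for p in glob.glob(pattern))
```

For example, `list_with_glob('D:/Python/detectar_coincidencias/Emails', '.MAI')` should return the names of the .MAI files in that folder. The two approaches can differ in edge cases: str.endswith is always case-sensitive, and a `*` pattern in glob skips names that start with a dot.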

As for comparing them, it depends on what you want to do. If you want the lines that are present in every file without exception, you can use set.intersection as you do in your example. If you want to compare in pairs or do something else, you should specify in more detail how you want the comparison made so that we can help you better.
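To make the difference between the two kinds of comparison concrete, here is a small sketch that uses in-memory lists instead of real files. The file names and contents in `files` are invented for the illustration.

```python
from itertools import combinations

# Hypothetical stand-ins for the stripped lines of three files
files = {
    "a.MAI": ["1", "2", "3", "4", "5"],
    "b.MAI": ["3", "4", "5", "6"],
    "c.MAI": ["2", "2", "3", "3"],
}

# Lines present in EVERY file
in_all = set.intersection(*(set(lines) for lines in files.values()))

# Lines shared by each PAIR of files
in_pairs = {
    (n1, n2): set(files[n1]) & set(files[n2])
    for n1, n2 in combinations(files, 2)
}

print(in_all)    # {'3'}
print(in_pairs)
```

Here only '3' appears in all three files, while the pair a.MAI and b.MAI also shares '4' and '5'. Which of the two results you need determines how the script should be written.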

Here is a commented example; see whether it is what you want:

import glob
import os


ruta = 'D:/Python/detectar_coincidencias/Emails'
os.chdir(ruta)

# Open every file with the .MAI extension in the folder.
# A list is used instead of a generator so the files can still be closed afterwards.
open_files = [open(file, 'r') for file in glob.glob("*.MAI")]

# Compare all the files using set.intersection().
# This returns a set with the lines PRESENT in ALL the files.
same = set.intersection(*map(set, open_files))
same.discard('\n')

# Close all the open files:
for f in open_files:
    f.close()

# Open the destination file and save the result of the comparison:
with open('some_output_file.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)
    
answered by 29.12.2016 at 19:19
0

Hello, one possible solution would be this:

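import os
import filecmp

def filtrarCoincidencias(ruta, extension):
    # Names of the files in ruta that end with the given extension
    files = [x for x in os.listdir(ruta) if x.endswith(extension)]
    coincidencias = []

    # Compare each pair of files only once
    for i, f1 in enumerate(files):
        for f2 in files[i + 1:]:
            # os.path.join works whether or not ruta ends with a separator;
            # shallow=False forces a comparison of the actual file contents
            if filecmp.cmp(os.path.join(ruta, f1), os.path.join(ruta, f2), shallow=False):
                coincidencias.append((f1, f2))
    return coincidencias


for tupla in filtrarCoincidencias("/home/crack81/Escritorio/folder/", ".MAI"):
    print(tupla)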

You pass the function filtrarCoincidencias the path of the folder where your files are and the extension of the files you want to compare, and it returns a list of tuples with the names of the files whose contents match. All that remains is for you to write the matches to another file.
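That last step, writing the matches to a file, could be sketched roughly as follows. The helper name `write_matches` and the tab-separated layout are assumptions made for this example, not part of the answer's code.

```python
def write_matches(coincidencias, salida):
    """Write one 'file1<TAB>file2' line per matching pair and return the count."""
    with open(salida, "w", encoding="utf-8") as file_out:
        for f1, f2 in coincidencias:
            file_out.write(f"{f1}\t{f2}\n")
    return len(coincidencias)
```

It could then be called with the list returned by filtrarCoincidencias, for example `write_matches(filtrarCoincidencias(ruta, ".MAI"), "coincidencias.txt")`, where the output file name is again only a placeholder.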

    
answered by 29.12.2016 at 18:53