Python, compare two files

I have to read the names of all the files inside a directory and then compare them with the names listed in a .txt file.

import os
from os import listdir
from os.path import isfile,join

mi_path = "c://python27//proyectos//"
f = open(mi_path+'datos.txt', 'r+')
b = open(mi_path+'nuevo_dato.txt', 'w+')

datos = f.read()


for(ruta, directorio, archivos) in os.walk(mi_path):
      for i in archivos:
            b.write(i)

      print "Directorio leido"      
      nuevoreg = b.read()

      if datos == nuevoreg:
                        print "NO hay un fichero nuevo"
      else:
                        print "SI hay un fichero nuevo"


f.close()
b.close()

The problem is that when I write the list I get from reading the directory to nuevo_dato.txt, the file ends up with a lot of garbage characters in addition to the names of the files that are actually in the directory:

  

1.txt2.txt3.txt4.txtdatos.txtdoc 2.txt
[followed by a long run of blank and unreadable binary characters]

  

When I compare datos.txt with nuevo_dato.txt, it always tells me there is new data, because all of that garbage is stored in the txt. If I do a print i to show the directory listing before writing it to the txt, it works perfectly; the problem only appears when the names are written to the file.

Can someone tell me what I'm doing wrong? And another question: could I store the names of the files in the directory in a list instead of a txt?

Hi @FJSevilla, yes, exactly: what I want is to read the files that are inside the directory, compare them with the names stored in datos.txt, and be notified when os.walk finds a new one. Say datos.txt contains doc1.txt doc2.txt doc3.txt, which are the same files I have in the directory. The problem, as I said, is that writing the result of os.walk to nuevo_dato.txt adds a lot of extra characters besides the file names. If I print the result of os.walk in the interpreter instead, those characters do not appear anywhere and I get the names of the files in the directory, in alphabetical order. I do not know why printing shows the list correctly while writing it to the .txt and then comparing the two fails.

    
asked by Devp Devp on 13.07.2017 at 13:43

2 answers

The error occurs because you read from the file right after writing to it without flushing the buffer (you neither close the file nor force the flush manually) and without moving the file pointer back to the beginning of the file. To do what you want, you need something like this:

for (ruta, directorio, archivos) in os.walk(mi_path):
    for i in archivos:
        b.write(i)

b.flush()  # Force the buffered data to be written out
b.seek(0)  # Move the file pointer back to the start of the file

print "Directorio leido"
nuevoreg = b.read()
if datos == nuevoreg:
    print "NO hay un fichero nuevo"
else:
    print "SI hay un fichero nuevo"

f.close()
b.close()
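
To see the effect of flush() and seek(0) in isolation, here is a small sketch that is not part of the original script. It is written in Python 3 syntax (the thread uses Python 2), and the helper name write_then_read and the file name demo.txt are made up for the illustration. It writes a couple of names to a file opened in 'w+' mode and reads them back, with and without rewinding first.

```python
import os
import tempfile


def write_then_read(path, names, rewind):
    """Write names to path in 'w+' mode, then read the file back.

    Hypothetical helper: when rewind is True, the buffer is flushed and
    the file pointer is moved to the start before reading.
    """
    with open(path, "w+") as handle:
        for name in names:
            handle.write(name)
        if rewind:
            handle.flush()  # push buffered data out
            handle.seek(0)  # go back to the beginning of the file
        return handle.read()


# Use a throwaway directory so nothing real is overwritten.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.txt")

without_rewind = write_then_read(demo_path, ["1.txt", "2.txt"], rewind=False)
with_rewind = write_then_read(demo_path, ["1.txt", "2.txt"], rewind=True)

print(repr(without_rewind))
print(repr(with_rewind))
```

Only the rewound read returns the text that was written. On Python 3 the read without rewinding simply starts at the end of the file; on Python 2 under Windows, which is the setup in the question, reading straight after writing can return garbage like the output shown there.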

However, as I mentioned, nothing guarantees that os.walk() always returns the files in the same order, so comparing the two strings directly is fragile. Cesar's option is much safer. You can even use set difference instead of lists (which lets you identify the new files more efficiently), and use pickle to serialize the set of file names directly instead of storing them in a text file.
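
The set and pickle idea could look roughly like this. It is only a sketch under assumptions: it uses Python 3 syntax rather than the Python 2 of the question, and the function names (listar_archivos, cargar_conocidos, guardar_conocidos, archivos_nuevos) and the state file name used in the usage note are invented for the example.

```python
import os
import pickle


def listar_archivos(ruta, ignorar=()):
    """Return the set of file names found under ruta (hypothetical helper)."""
    nombres = set()
    for _, _, archivos in os.walk(ruta):
        nombres.update(a for a in archivos if a not in ignorar)
    return nombres


def cargar_conocidos(ruta_estado):
    """Load the previously saved set, or an empty set on the first run."""
    if not os.path.exists(ruta_estado):
        return set()
    with open(ruta_estado, "rb") as f:
        return pickle.load(f)


def guardar_conocidos(ruta_estado, nombres):
    """Serialize the set of names with pickle."""
    with open(ruta_estado, "wb") as f:
        pickle.dump(nombres, f)


def archivos_nuevos(ruta, ruta_estado):
    """Return names present now but absent from the saved set, then save."""
    # Skip the state file itself in case it lives inside ruta.
    estado = os.path.basename(ruta_estado)
    actuales = listar_archivos(ruta, ignorar=(estado,))
    nuevos = actuales - cargar_conocidos(ruta_estado)  # set difference
    guardar_conocidos(ruta_estado, actuales)
    return nuevos
```

For example, archivos_nuevos(mi_path, mi_path + 'conocidos.pkl') would return every name on the first run and only the names added since the previous call afterwards. Since sets are unordered, the result does not depend on the order in which os.walk() yields the files. Keep in mind that pickle should only be used to load files you created yourself.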

    
answered on 13.07.2017 at 16:46

Suppose the content of the file datos.txt is:

$ cat datos.txt
a.txt
b.txt
c.txt
e.txt
g.txt

Imagine also that in the same folder I have the following files:

$ ls -l
-rw-rw-r-- 1 cesar cesar  4 Jul 13 09:07 a.txt
-rw-rw-r-- 1 cesar cesar  4 Jul 13 09:07 b.txt
-rw-rw-r-- 1 cesar cesar  4 Jul 13 09:07 c.txt
-rw-rw-r-- 1 cesar cesar 31 Jul 13 09:07 datos.txt
-rw-rw-r-- 1 cesar cesar  4 Jul 13 09:08 d.txt
-rw-rw-r-- 1 cesar cesar  4 Jul 13 09:08 e.txt
-rw-rw-r-- 1 cesar cesar  4 Jul 13 09:08 f.txt
-rw-rw-r-- 1 cesar cesar  4 Jul 13 09:08 g.txt

Therefore, at first glance, we can say that within datos.txt only the files d.txt and f.txt are missing. If what you want is to know what files in the directory are not contained in datos.txt you can do:

>>> a = open('datos.txt')
>>> datos = a.read().split()
>>> print datos
['a.txt', 'b.txt', 'c.txt', 'e.txt', 'g.txt']
>>> import os
>>> nuevos_archivos = []
>>> mi_path = os.path.curdir
>>> for _, _, archivos in os.walk(mi_path):
...     for archivo in archivos:
...         if archivo == 'datos.txt': 
...             continue
...         if archivo not in datos:
...             nuevos_archivos.append(archivo)
... 
>>> nuevos_archivos
['d.txt', 'f.txt']
>>> a.close()
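
One caveat with the session above: split() separates on any whitespace, so a name such as doc 2.txt (which appears in the directory from the question) would be read as two entries. A possible variation, sketched here in Python 3 syntax with a made-up function name buscar_nuevos, reads the file line by line instead. It assumes datos.txt holds exactly one name per line.

```python
import os


def buscar_nuevos(ruta, nombre_datos="datos.txt"):
    """Return a sorted list of files under ruta not listed in nombre_datos.

    Hypothetical helper; assumes one file name per line in nombre_datos.
    """
    with open(os.path.join(ruta, nombre_datos)) as f:
        # Strip the newline from each line and skip blank lines.
        conocidos = {linea.strip() for linea in f if linea.strip()}

    nuevos = []
    for _, _, archivos in os.walk(ruta):
        for archivo in archivos:
            # The listing file itself is never reported as new.
            if archivo != nombre_datos and archivo not in conocidos:
                nuevos.append(archivo)
    return sorted(nuevos)
```

With the directory listing shown earlier, buscar_nuevos(os.path.curdir) would be expected to give the same two names, d.txt and f.txt. Sorting the result makes the output independent of the order in which os.walk() visits the files.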

Now that you have the files in a list, you can do several things with them. If you want to add those missing files to datos.txt you can do:

>>> a = open('datos.txt', 'a')
>>> for archivo in nuevos_archivos:
...     a.write(archivo + '\n')
... 
>>> a.close()

Let's see now what's inside the file:

$ cat datos.txt 
a.txt
b.txt
c.txt
e.txt
g.txt
d.txt
f.txt
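
The append step from this answer can also be wrapped in a small function. This is a sketch in Python 3 syntax; the name registrar_nuevos is made up for the example. The only thing it adds is a guard for the case where datos.txt does not end with a newline, which would otherwise glue the first new name onto the last existing line.

```python
def registrar_nuevos(ruta_datos, nuevos_archivos):
    """Append each name in nuevos_archivos to ruta_datos, one per line.

    Hypothetical helper; ruta_datos must already exist.
    """
    with open(ruta_datos) as f:
        contenido = f.read()

    with open(ruta_datos, "a") as f:
        # Terminate an unfinished last line before adding new names.
        if contenido and not contenido.endswith("\n"):
            f.write("\n")
        for archivo in nuevos_archivos:
            f.write(archivo + "\n")
```

Calling registrar_nuevos('datos.txt', nuevos_archivos) would then take the place of the manual open, write and close calls, and the with blocks close the file even if a write fails.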
    
answered on 13.07.2017 at 16:31