To resume reading a file at the line where you left off, you need to reposition the cursor at that point.
You can do what you propose: save the line number in a text file, serialize the variable with pickle, or even store it in the database itself.
Afterwards you have to walk through the file line by line until you reach the desired line, something like:
    datos = 'archivo.csv'
    # Load the number of lines already read
    ultima = 991
    with open(datos, 'r') as f:
        # Skip the lines already read so the cursor sits right after the last one
        for _ in range(ultima):
            if not f.readline():
                break
        # Read as many lines as we want, increasing the counter as we go
        for _ in range(100):
            linea = f.readline()
            if not linea:
                break  # end of file
            print(linea)
            ultima += 1
    # Save ultima here so the next run can resume from it
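The line-counting approach still needs the counter to be stored somewhere between runs. A minimal sketch, assuming the counter is kept in a plain text file (the helper names `load_count`, `save_count` and `read_next`, and the idea of a separate state file, are hypothetical and not part of the code above):

```python
import os
from itertools import islice


def load_count(state_path):
    """Return the number of lines already read, or 0 if nothing was saved."""
    if not os.path.exists(state_path):
        return 0
    with open(state_path, 'r') as state:
        return int(state.read().strip() or 0)


def save_count(state_path, count):
    """Store the number of lines already read as plain text."""
    with open(state_path, 'w') as state:
        state.write(str(count))


def read_next(data_path, state_path, n):
    """Skip the lines read in earlier calls and return up to n new lines."""
    done = load_count(state_path)
    with open(data_path, 'r') as f:
        # islice consumes the first `done` lines, then yields at most n more
        lines = list(islice(f, done, done + n))
    save_count(state_path, done + len(lines))
    return lines
```

Each call to `read_next` still has to walk over every line read so far, so the cost grows with the position in the file.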
Another option is to avoid walking through the file again by placing the cursor directly where we left it. It is important to make sure the file is never modified between reads: if a single byte is added or removed we will get unexpected results (just as we would if lines were added in the previous example). For this we will use the methods tell (get the cursor position) and seek (move the cursor where we want):
    datos = 'archivo.csv'
    # Load the last saved cursor position (a value previously returned by tell())
    cursor = 1000
    with open(datos, 'r') as f:
        # Move the cursor straight to the saved position
        f.seek(cursor)
        # Read as many lines as we want
        for _ in range(100):
            linea = f.readline()
            if not linea:
                break  # end of file
            print(linea)
        # Get the new position while the file is still open
        cursor = f.tell()
    # Save the cursor variable to resume at another time
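In text mode the value returned by tell() is an opaque number that should only be passed back to seek(). If plain byte offsets are preferred, one possibility is to open the file in binary mode and decode each line by hand. A sketch under that assumption (the function `read_from` and the UTF-8 default are illustrative choices, not part of the original code):

```python
def read_from(path, offset, n, encoding='utf-8'):
    """Read up to n lines starting at byte `offset`.

    Returns the decoded lines and the byte offset to resume from.
    """
    lines = []
    with open(path, 'rb') as f:
        f.seek(offset)
        for _ in range(n):
            raw = f.readline()
            if not raw:  # end of file reached
                break
            lines.append(raw.decode(encoding))
        # tell() must be called while the file is still open
        new_offset = f.tell()
    return lines, new_offset
```

The returned offset is what would be saved between runs and passed back in on the next call.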
An implementation of this last idea, using pickle to serialize the data, could be:
    import os
    import pickle


    class Reader:
        def __init__(self, ruta):
            self.ruta = ruta
            self.archivo = open(ruta)
            self.cursor = 0

        def get_line(self):
            line = self.archivo.readline()
            self.cursor = self.archivo.tell()
            return line

        def restart(self):
            self.cursor = 0
            self.archivo.seek(0)

        def __getstate__(self):
            # File objects cannot be pickled, so leave the handle out
            new_dict = self.__dict__.copy()
            del new_dict['archivo']
            return new_dict

        def __setstate__(self, state):
            # Reopen the file and move the cursor to the saved position
            archivo = open(state['ruta'])
            archivo.seek(state['cursor'])
            self.__dict__.update(state)
            self.archivo = archivo


    class TextReader:
        def __init__(self, ruta):
            self.ruta = os.path.abspath(ruta)
            self.temp = os.path.splitext(self.ruta)[0] + '.temp'
            try:
                with open(self.temp, 'rb') as dat:
                    self.reader = pickle.load(dat)
            except (OSError, EOFError, pickle.UnpicklingError):
                # No usable saved state: start from the beginning of the file
                print('No saved state found, starting from the beginning')
                self.reader = Reader(self.ruta)

        def save(self):
            with open(self.temp, 'wb') as dat:
                pickle.dump(self.reader, dat)

        def get_lines(self, n):
            # Returns a generator with the requested number of lines, if available
            for _ in range(n):
                line = self.reader.get_line()
                if line:
                    yield line
                else:
                    break
            self.save()

        def readlines(self):
            # Returns a generator with all the lines up to the end of the file
            while True:
                line = self.reader.get_line()
                if line:
                    yield line
                else:
                    break
            self.save()

        def restart(self):
            # Resets the cursor to the beginning of the document
            self.reader.restart()
Usage:
    # Create an instance, passing the path of the file to read
    f = TextReader('archivo.txt')
    # Read as many lines as we want and exit the application
    for line in f.get_lines(100):
        print(line)
At this point a file with the same name as our file but with the extension .temp should have been created. It is nothing more than an instance of Reader serialized with pickle.
At any later time we can resume reading the file where we left off:
    # Create an instance again, passing the path of the file to read
    f = TextReader('archivo.txt')
    # Read as many lines as we want
    for line in f.get_lines(100):
        print(line)
In this case 100 lines will be read, but starting from where we left off the first time. We can use f.restart() to reread the document from the beginning (or simply delete the .temp file).
This is just an idea of how to use the cursor together with pickle to resume reading a file; it should be optimized and adapted to your specific case to make it more efficient. And remember: the file must not be modified under any circumstances between reads if you intend to resume where you left off.
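Since resuming only works if the file is unchanged, it may be worth storing a fingerprint of the file next to the cursor and checking it before seeking. A minimal sketch, assuming that size plus modification time is a good enough fingerprint for your case (the helper names `fingerprint`, `save_state` and `load_cursor` are hypothetical):

```python
import os
import pickle


def fingerprint(path):
    """Return (size, modification time in ns) as a cheap change detector."""
    info = os.stat(path)
    return info.st_size, info.st_mtime_ns


def save_state(state_path, data_path, cursor):
    """Store the cursor together with the fingerprint of the data file."""
    with open(state_path, 'wb') as out:
        pickle.dump({'cursor': cursor, 'stamp': fingerprint(data_path)}, out)


def load_cursor(state_path, data_path):
    """Return the saved cursor, or 0 if there is no state or the file changed."""
    try:
        with open(state_path, 'rb') as src:
            state = pickle.load(src)
    except (OSError, EOFError, pickle.UnpicklingError):
        return 0
    if state['stamp'] != fingerprint(data_path):
        return 0  # file was modified: the old cursor cannot be trusted
    return state['cursor']
```

This check would miss an edit that keeps the size identical and happens within the timestamp resolution of the file system; hashing the part of the file already read would be stricter, at the cost of reading it again.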