Upload files to S3 using Multithreading in Python


I have a folder with thousands of files to upload to my S3 bucket. For now I do it like this, without problems:

for x in images_full_path:
    name_file = os.path.basename(x)
    # Using a context manager so each file is closed after its upload
    with open(x, 'rb') as data:
        s3.Bucket(bucket_name).put_object(Key=AWS_folder_name + name_file, Body=data)

How could I speed up the upload process by adding several concurrent threads?

asked by Xavi 11.07.2017 в 18:15

1 answer


It occurs to me that the easiest thing you can do is to try the Pool class from the multiprocessing module:

from multiprocessing import Pool

def upload_to_s3(file):
    # Your code to upload the file goes here
    ...

def main():
    images_full_path = [...]  # Your files
    with Pool() as pool:
        pool.map(upload_to_s3, images_full_path)

# The guard is required by multiprocessing on platforms that spawn workers
if __name__ == '__main__':
    main()

This maps the function upload_to_s3 over each element of images_full_path. Inside that function goes the upload code you have been using so far.
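As a sketch of what that worker function could look like, here is one way to fill it in with the upload code from the question. The names bucket_name and AWS_folder_name are assumed to be the same globals the question uses; the boto3 import happens inside the function so that each worker process builds its own client, since boto3 resources are not meant to be shared across processes.

```python
import os

def s3_key_for(path, prefix):
    # Build the object key: the folder prefix plus the file's base name,
    # mirroring AWS_folder_name + name_file from the question
    return prefix + os.path.basename(path)

def upload_to_s3(path):
    # Imported here so every worker process creates its own client
    import boto3
    s3 = boto3.resource('s3')
    with open(path, 'rb') as data:
        s3.Bucket(bucket_name).put_object(
            Key=s3_key_for(path, AWS_folder_name),
            Body=data,
        )
```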

Pool uses a certain number of "workers" to run the job in parallel. If you do not explicitly pass the number of workers, it will use the number of CPUs.

Strictly speaking it uses processes, not threads, but it may be useful to begin with.
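If you do want actual threads, which fit this kind of I/O-bound job well because the GIL is released while each thread waits on the network, a similar sketch with concurrent.futures would be (upload_one stands in for whatever upload function you write, such as the one described above):

```python
from concurrent.futures import ThreadPoolExecutor

def upload_all(paths, upload_one, max_workers=10):
    # Run upload_one(path) for every path using a pool of threads
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # list() forces iteration, so any exception raised inside a
        # worker is re-raised here instead of being silently dropped
        return list(executor.map(upload_one, paths))
```

Then `upload_all(images_full_path, upload_to_s3)` would replace the `pool.map` call. The right max_workers depends on your bandwidth and the S3 request rate, so it is worth experimenting.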

answered 11.07.2017 / 18:31