Upload files to S3 using Multithreading in Python


I have a folder with thousands of files to upload to my S3 bucket. For now I do it like this, without problems:

for x in images_full_path:
    name_file = os.path.basename(x)
    # Using a context manager so each file is closed after its upload
    with open(x, 'rb') as data:
        s3.Bucket(bucket_name).put_object(Key=AWS_folder_name + name_file, Body=data)

How could I speed up the upload process by adding several concurrent threads?

asked by Xavi 11.07.2017 в 18:15

1 answer


It occurs to me that the easiest thing you can do is to try the Pool class from the multiprocessing module:

from multiprocessing import Pool

def upload_to_s3(file):
    # Your code to upload the file goes here
    ...

def main():
    images_full_path = [...]  # Your files
    with Pool() as pool:
        pool.map(upload_to_s3, images_full_path)

# The guard is required by multiprocessing on platforms that spawn workers
if __name__ == '__main__':
    main()

This maps the function upload_to_s3 over each element of images_full_path. Inside that function goes the upload code you have been using so far.
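As a sketch of what that worker function could look like, here is one way to fill it in with the upload code from the question. The names bucket_name and AWS_folder_name are assumed to be the same globals the question uses; the boto3 import happens inside the function so that each worker process builds its own client, since boto3 resources are not meant to be shared across processes.

```python
import os

def s3_key_for(path, prefix):
    # Build the object key: the folder prefix plus the file's base name,
    # mirroring AWS_folder_name + name_file from the question
    return prefix + os.path.basename(path)

def upload_to_s3(path):
    # Imported here so every worker process creates its own client
    import boto3
    s3 = boto3.resource('s3')
    with open(path, 'rb') as data:
        s3.Bucket(bucket_name).put_object(
            Key=s3_key_for(path, AWS_folder_name),
            Body=data,
        )
```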

Pool uses a certain number of "workers" to run the job in parallel. If you do not explicitly pass the number of workers, it will use the number of CPUs.

Strictly speaking it uses processes, not threads, but it may be useful to begin with.
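If you do want actual threads, which fit this kind of I/O-bound job well because the GIL is released while each thread waits on the network, a similar sketch with concurrent.futures would be (upload_one stands in for whatever upload function you write, such as the one described above):

```python
from concurrent.futures import ThreadPoolExecutor

def upload_all(paths, upload_one, max_workers=10):
    # Run upload_one(path) for every path using a pool of threads
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # list() forces iteration, so any exception raised inside a
        # worker is re-raised here instead of being silently dropped
        return list(executor.map(upload_one, paths))
```

Then `upload_all(images_full_path, upload_to_s3)` would replace the `pool.map` call. The right max_workers depends on your bandwidth and the S3 request rate, so it is worth experimenting.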

answered 11.07.2017 / 18:31