The problem is in how you are passing the arguments of the imprimir function to each created thread. You must use the args parameter for that. If you pass them in parentheses, you actually call the function right there, in the main thread, and it is this thread and only this thread that executes the function. You never even reach start, because no child thread is ever created: the program stays forever executing your function's infinite loop in the main thread.
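For illustration, the failing pattern probably looks something like this (a hypothetical reconstruction of your call, based on the description above):

import threading

def imprimir(n):
    while True:
        print(n)

# WRONG: imprimir(1) is executed right here, in the main thread, and never
# returns, so the Thread object is never even built and start() is never reached.
hilo = threading.Thread(target=imprimir(1))
hilo.start()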
The code should look something like this:
import threading
import time

def main():
    def imprimir(n):
        while True:
            print('{0} prints {1}\n'.format(threading.current_thread().name, n), end='')
            time.sleep(0.5)

    for i in range(4):
        hilo = threading.Thread(name='Hilo{}'.format(i), target=imprimir, args=(1,))
        hilo.start()

if __name__ == '__main__':
    main()
Note that args must be a sequence, so I do not pass n on its own but a tuple of the form (n,); the comma is the key :).
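A quick check in the interpreter shows why the comma matters (a minimal sketch):

n = 5
print(type((n)))   # <class 'int'>: parentheses alone do not create a tuple
print(type((n,)))  # <class 'tuple'>: the trailing comma does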
threading.current_thread().name allows us to identify each thread by printing its name.
If you are going to process files, you may be interested in using a queue (which is thread-safe) where you can place, for example, the paths of the files (or other additional parameters, such as output file paths) and let the different threads process them. Here is an example, filling the queue with integers instead (a sketch with file paths follows the code):
import queue
import threading
import time

# Function that consumes the data from the queue
def worker(q):
    while True:
        try:
            # get_nowait avoids blocking forever if another thread
            # empties the queue between our check and our get
            n = q.get_nowait()
        except queue.Empty:
            break
        print('{0} printing: {1}.\n'.format(threading.current_thread().name, n), end='')
        time.sleep(0.5)
    print('{0} finished its work.\n'.format(threading.current_thread().name), end='')

def main():
    # Fill the queue with some data, in this case integers:
    q = queue.Queue()
    for n in range(1000, 1101):
        q.put(n)

    # Create the threads that will process the queue, in this case 5
    thread_count = 5
    threads = []
    for i in range(thread_count):
        t = threading.Thread(target=worker, args=(q,))
        threads.append(t)
        t.start()

    # Wait for all the threads to finish before ending the main program
    for thread in threads:
        thread.join()

if __name__ == '__main__':
    main()
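If, as mentioned above, you wanted to enqueue file paths instead of integers, the filling step could look like this (a sketch; the directory and pattern are hypothetical, and the worker would open each path instead of printing a number):

import pathlib
import queue

q = queue.Queue()
# Hypothetical directory and pattern: enqueue every .txt file found in ./datos
for ruta in pathlib.Path('datos').glob('*.txt'):
    q.put(ruta)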
Actually (also depending on what you intend to do with the files), in these cases it is usually advisable to use processes instead of threads, given the limitations the GIL imposes on concurrency in CPython. Basically, the GIL (Global Interpreter Lock) is an interpreter-level lock that prevents multiple threads from executing at the same time in the same Python interpreter. For a thread to run, it must wait until the GIL is released by the thread that holds it. This exists because the memory management of the CPython interpreter itself (not of our own program) is not thread-safe. In practice, this limits the usefulness of several simultaneous threads on multi-core CPUs. To avoid it, several things can be done: use other implementations of the Python interpreter (Jython and IronPython do not have this problem, although they are less efficient than CPython in other respects); implement the tasks of your program that need concurrency directly in C, since it is relatively easy to extend Python with C or C++ and that allows releasing the GIL at will; or use multiple processes instead of multiple threads, which allows the use of several cores. You can find more information in the following links:
GIL - Python Wiki
Multiprocessing in Python: Global Interpreter Lock (GIL) - Genbeta
Multiprocessing in Python: Dodging the GIL - Genbeta
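You can see the GIL's effect for yourself by timing a CPU-bound function run sequentially and then in two threads; on CPython both runs take roughly the same time, because the threads cannot execute Python bytecode in parallel (a minimal sketch, the workload is arbitrary):

import threading
import time

def cpu_bound(n):
    # A pure Python loop holds the GIL while it computes
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 5000000

start = time.perf_counter()
cpu_bound(N)
cpu_bound(N)
print('Sequential:  {:.2f}s'.format(time.perf_counter() - start))

start = time.perf_counter()
t1 = threading.Thread(target=cpu_bound, args=(N,))
t2 = threading.Thread(target=cpu_bound, args=(N,))
t1.start()
t2.start()
t1.join()
t2.join()
print('Two threads: {:.2f}s'.format(time.perf_counter() - start))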
A simple example using several processes and queues could be:
import multiprocessing
import queue
import time

# Function that consumes the data from the queue
def worker(q):
    while True:
        try:
            # get_nowait avoids blocking forever if another process
            # empties the queue between our check and our get
            n = q.get_nowait()
        except queue.Empty:
            break
        print('{0} printing: {1}.\n'.format(multiprocessing.current_process().name, n), end='')
        time.sleep(0.5)
    print('{0} finished its work.\n'.format(multiprocessing.current_process().name), end='')

def main():
    # Fill the queue with some data, in this case integers:
    q = multiprocessing.Queue()
    for n in range(1000, 1101):
        q.put(n)

    # Create the processes, in this case 5
    process_count = 5
    processes = []
    for i in range(process_count):
        p = multiprocessing.Process(target=worker, args=(q,))
        processes.append(p)
        p.start()

    # Wait for all the processes to finish before ending the main program
    for process in processes:
        process.join()

if __name__ == '__main__':
    main()
In the multiprocessing example, if you use IDLE you will probably not see any output because of how IDLE implements stdout (the processes do their work anyway). If that is your case, run the script by calling the interpreter from the console (e.g., the Windows CMD).
There are many other ways to implement what you want, for example using multiprocessing.Pool.
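A minimal sketch of the Pool approach (the worker here is a hypothetical placeholder that just squares each number):

import multiprocessing

def worker(n):
    # Placeholder for the real per-item processing
    return n * n

def main():
    # A pool of 5 processes maps worker over the data and collects the results
    with multiprocessing.Pool(processes=5) as pool:
        results = pool.map(worker, range(1000, 1101))
    print(results[:5])

if __name__ == '__main__':
    main()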
It will depend to a large extent on what you want to do: read several files at the same time and process them separately with separate outputs; read several files at the same time and process them separately but with a single output; process one very large file using several threads; etc. It is a matter of trying things out to see which option gives you the best performance for your particular case.