Differences between fine grain and coarse grain


I have read a lot of documentation about the differences between fine-grained and coarse-grained parallelism, but I still do not understand them very well. Here is an example of what I have found:


"An application exhibits fine-grained parallelism if its subtasks must communicate many times per second; it exhibits coarse-grained parallelism if they do not communicate many times per second (...)" Source: Wikipedia.

When implementing, for example, a matrix-vector multiplication, how do a fine-grained and a coarse-grained implementation differ?

I have already written a fine-grained solution that creates one thread for each row of the matrix and then operates on that row, but if I now want a coarse-grained solution, how would I have to implement it?

My idea, in my case, is to use the Subramanian equation with a blocking coefficient of 0, for example, to get the number of threads needed, and then divide the dimension of the matrix by the number of threads, so that one thread is launched per block of rows rather than per row as in the fine-grained version.
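The plan above can be sketched roughly as follows. This is only an illustration under some assumptions: the equation is taken to be threads = cores / (1 - blocking coefficient), and the names `thread_count`, `row_blocks` and `matvec` are made up for this example. Note also that in CPython threads do not execute Python bytecode in parallel, so the sketch shows how the work is split, not a real speed-up.

```python
import os
import threading

def thread_count(cores, blocking_coefficient=0.0):
    # Assumed form of the equation: cores / (1 - blocking coefficient),
    # with a coefficient strictly below 1. A coefficient of 0 gives one thread per core.
    return max(1, int(cores / (1.0 - blocking_coefficient)))

def row_blocks(n_rows, n_threads):
    # Split range(n_rows) into contiguous blocks of near-equal size,
    # never creating more blocks than there are rows.
    if n_rows == 0:
        return []
    n_threads = min(n_threads, n_rows)
    base, extra = divmod(n_rows, n_threads)
    blocks, start = [], 0
    for i in range(n_threads):
        size = base + (1 if i < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks

def multiply_block(matrix, vector, result, start, stop):
    # Each thread writes only result[start:stop], so no lock is needed.
    for i in range(start, stop):
        result[i] = sum(a * x for a, x in zip(matrix[i], vector))

def matvec(matrix, vector, n_threads):
    result = [0] * len(matrix)
    threads = [
        threading.Thread(target=multiply_block,
                         args=(matrix, vector, result, start, stop))
        for start, stop in row_blocks(len(matrix), n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result

matrix = [[1, 2], [3, 4], [5, 6], [7, 8]]
vector = [1, 1]

# Fine grain: one thread per row.
fine = matvec(matrix, vector, n_threads=len(matrix))
# Coarse grain: one thread per block of rows, thread count from the equation.
coarse = matvec(matrix, vector, n_threads=thread_count(os.cpu_count() or 1))
```

Both calls go through the same code; the only thing that changes is how many rows each thread receives, which is the granularity being asked about.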

Let's see if I can finally understand how each kind of parallelism works.

asked by Repikas 21.11.2016 at 19:06

3 answers


The type of parallelism that you implement in a system is an architectural decision.

Imagine, for example, a system that calculates reflections, lights and shadows for a 3D object and then applies them to the base view. Assuming the three components can be calculated independently of each other and 3 or more cores are available for parallel calculation, it makes sense to run 3 threads, one for reflections, one for lights and one for shadows, and to apply the results to the base view once the parallel threads have finished. That would be a use of coarse grain.

coarse grain

                              /--(reflections)---\
                             /                    \
---(main)---(create threads)+-------(lights)-------+(result)---(merge layers)
                             \                    /
                              \----(shadows)-----/
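A minimal sketch of that coarse-grained layout, assuming three hypothetical functions that each compute one layer from the scene. The layer functions and their arithmetic are placeholders invented for the example, not a real rendering API.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins: each returns one layer as a list of per-pixel values.
def compute_reflections(scene):
    return [p // 10 for p in scene]

def compute_lights(scene):
    return [p // 2 for p in scene]

def compute_shadows(scene):
    return [-(p // 5) for p in scene]

def render(scene):
    # Three long-running, independent tasks: one thread each.
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [
            pool.submit(layer, scene)
            for layer in (compute_reflections, compute_lights, compute_shadows)
        ]
        # The only synchronization point: wait for all three layers.
        layers = [f.result() for f in futures]
    # Merge the layers onto the base view.
    return [base + sum(parts) for base, parts in zip(scene, zip(*layers))]

image = render([10, 20])
```

The threads never talk to each other while they work; they only meet once, at the merge step.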

In another example, you want to implement a distributed system such as a chat server over TCP. Incoming messages are events that occur outside the control of the server's workflow, and for technical reasons you have to run the InputStream and the OutputStream on different threads. The system performs best if message distribution works on tasks that are as short as possible. A paradigm such as a Scheduler could be used to broadcast messages on reusable worker threads. That would be a use of fine grain.

fine grain

[Diagram: many short tasks branching off and rejoining the main flow]
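A rough sketch of the fine-grained idea, assuming a thread pool plays the role of the scheduler and each delivery to one recipient is one short task. The `Client` class is a hypothetical stand-in for a TCP connection; its `outbox` list replaces the real output stream.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class Client:
    # Hypothetical stand-in for a connected chat client.
    def __init__(self, name):
        self.name = name
        self.outbox = []
        self._lock = threading.Lock()

    def send(self, message):
        # Short critical section: several workers may deliver to the same client.
        with self._lock:
            self.outbox.append(message)

def broadcast(pool, clients, sender, message):
    # One short task per recipient, run on whichever worker thread is free.
    text = f"{sender.name}: {message}"
    return [pool.submit(c.send, text) for c in clients if c is not sender]

clients = [Client("ana"), Client("ben"), Client("eva")]

# The pool reuses its worker threads instead of creating one thread per message.
with ThreadPoolExecutor(max_workers=4) as pool:
    pending = broadcast(pool, clients, clients[0], "hello")
    pending += broadcast(pool, clients, clients[1], "hi")
    for future in pending:
        future.result()  # surface any delivery error
```

Because the tasks are so small, the order in which two messages reach the same client is not guaranteed here; a real server would need to decide whether that matters.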
answered by 06.01.2017 / 18:55

As I understand it, fine grain is when the threads operate in such a way that one waits on the execution of another.

For example, the second thread has to wait for the first thread to finish or to release the critical section, because they share the same resource, such as the running total of the sums. This is done mainly with functions such as wait(), notify(), join(), etc.

As in the example on the next page.


Coarse grain, in contrast, is when each thread works independently of what the final result is, without sharing much information with the other threads.

So roughly, fine grain is mainly for small pieces of work, to prevent blocking, and coarse grain is the accumulation of fine grains so that each thread can work independently.
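One way to picture the difference is a shared total that several threads add to. In the sketch below, which is only an illustration with made-up names, the fine-grained variant enters the critical section once per value, while the coarse-grained variant lets each thread sum its own chunk and enter it once at the end. The `syncs` counter records how often the lock was taken.

```python
import threading

def parallel_sum(values, n_threads, fine_grained):
    state = {"total": 0, "syncs": 0}
    lock = threading.Lock()

    def add_shared(amount):
        # Critical section: only one thread may update the shared total at a time.
        with lock:
            state["total"] += amount
            state["syncs"] += 1

    def worker(chunk):
        if fine_grained:
            for v in chunk:
                add_shared(v)        # synchronize on every value
        else:
            add_shared(sum(chunk))   # work alone, synchronize once

    threads = [
        threading.Thread(target=worker, args=(values[i::n_threads],))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # the main thread waits for every worker to finish
    return state["total"], state["syncs"]

values = list(range(1, 101))
fine = parallel_sum(values, n_threads=4, fine_grained=True)
coarse = parallel_sum(values, n_threads=4, fine_grained=False)
```

Both variants return the same total; what changes is how often the threads have to coordinate, which matches the "many times per second" wording in the question.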

I relied on this answer. link

answered by 24.11.2016 at 01:29

The granularity of parallel processes is relative. In general terms, low (coarse) granularity works at the task level, while high (fine) granularity works at the data level and relies on a simple, repetitive structure. For example, numerical problems such as the finite element method (MEF in Spanish, FEM in English), a general numerical method for approximating solutions of very complex partial differential equations used in many engineering and physics problems, can be solved with very fine granularity, limited by the available hardware. When the problem is less structured, a task can be divided into subtasks, with low granularity. For example, a transactional system can serve clients in parallel or concurrently.

answered by 25.03.2017 at 08:13