Re: [boost] [Parallel Algorithms] Queries regarding the Boost.Thread / Parallel algorithms
Hi,

In my previous mail, I was speaking about the compiler optimization dependencies while creating a parallel version of the following code:

    DO J = 1, 100
      DO I = 1, 100
        DO K = 1, 100
          C(I,J) = C(I,J) + A(I,K) * B(K,J)
        END DO
      END DO
    END DO

As you know, the above code is the 2D matrix multiplication logic. I tested it at the compiler optimization levels -O0, -O1, -O2 and -O3, and the optimizations caused no problems.

I used pthreads to convert the serial code to its parallel version. The serial version took 72 sec to execute and the parallel version with 4 threads took around 16 sec. Please have a look at the attachment for the complete working source code.

The question now is, I have used the threads as shown in the pseudo code below:

    int main() {
        create_pthreads(assign_thread_ID, call the function);
        join_threads(thread_ID);
        destroy_threads();
    }

    function_called_by_each_thread(thread_ID) {
        all_computations;
    }

All the thread documentation I have found arrives at some variant of this pattern. Is this the correct way to approach creating parallel algorithms with Boost threads? Please clarify whether there are any alternative approaches available to achieve parallelism using threads.

PS: Please have a look at the code for further details.

Regards,
Naveen
On 24/03/14 18:00, Naveen wrote:
> Hi, In my previous mail, I was speaking about the compiler optimization dependencies while creating a parallel version of the following code:
>
>     DO J = 1, 100
>       DO I = 1, 100
>         DO K = 1, 100
>           C(I,J) = C(I,J) + A(I,K) * B(K,J)
>         END DO
>       END DO
>     END DO
>
> As you know, the above code is the 2D-Matrix multiplication logic.
While this is a pretty bad implementation of matrix multiplication, it's interesting because it involves multiple loops. A good implementation would parallelize well regardless of the number of iterations in each loop. They are all set to 100 in your example, but they might be smaller in other cases.
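To illustrate the point about not depending on any single loop's iteration count, here is a minimal sketch that splits the flattened (I, J) index space of C across threads instead of splitting only the outer loop. It is an assumption-laden illustration, not the code from the attachment: it uses std::thread rather than pthreads or Boost.Thread, stores matrices row-major in flat vectors, and the names multiply_range and parallel_multiply are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: computes C += A * B for the output elements whose
// flattened index lies in [begin, end). A is rows x inner, B is inner x cols,
// C is rows x cols, all row-major. Each element of C is written by exactly
// one thread, so no locking is needed.
void multiply_range(const std::vector<double>& a, const std::vector<double>& b,
                    std::vector<double>& c, std::size_t inner, std::size_t cols,
                    std::size_t begin, std::size_t end) {
    for (std::size_t idx = begin; idx < end; ++idx) {
        const std::size_t i = idx / cols;  // row of C
        const std::size_t j = idx % cols;  // column of C
        double sum = c[idx];
        for (std::size_t k = 0; k < inner; ++k)
            sum += a[i * inner + k] * b[k * cols + j];
        c[idx] = sum;
    }
}

// Hypothetical driver: distributes the rows * cols output elements evenly,
// so a matrix with few rows but many columns still keeps every thread busy.
void parallel_multiply(const std::vector<double>& a, const std::vector<double>& b,
                       std::vector<double>& c, std::size_t rows, std::size_t inner,
                       std::size_t cols, unsigned num_threads) {
    assert(a.size() == rows * inner);
    assert(b.size() == inner * cols);
    assert(c.size() == rows * cols);

    const std::size_t total = rows * cols;
    // At least one worker, and never more workers than output elements.
    const std::size_t workers =
        std::max<std::size_t>(1, std::min<std::size_t>(num_threads, total));

    std::vector<std::thread> pool;
    pool.reserve(workers);
    for (std::size_t t = 0; t < workers; ++t) {
        // Balanced split: chunk sizes differ by at most one element.
        const std::size_t begin = total * t / workers;
        const std::size_t end = total * (t + 1) / workers;
        pool.emplace_back([&a, &b, &c, inner, cols, begin, end] {
            multiply_range(a, b, c, inner, cols, begin, end);
        });
    }
    for (std::thread& worker : pool) worker.join();
}
```

Calling parallel_multiply(a, b, c, 2, 3, 2, 4) on a 2x3 and a 3x2 matrix uses four workers even though there are only two rows. The loop order here favours simplicity over cache behaviour, so it is still not a fast matrix multiplication.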
> The question now is, I have used the threads as shown in the pseudo code below:
>
>     int main() {
>         create_pthreads(assign_thread_ID, call the function);
>         join_threads(thread_ID);
>         destroy_threads();
>     }
>
>     function_called_by_each_thread(thread_ID) {
>         all_computations;
>     }
>
> All the thread documentation I have found arrives at some variant of this pattern. Is this the correct way to approach creating parallel algorithms with Boost threads?
That's a basic skeleton for parallel_transform on an SMP machine.
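For concreteness, the create/join skeleton can be sketched as a chunked transform like the one below. This is only an illustration under stated assumptions: the parallel_transform shown here is a hypothetical function written for this example (it is not an existing Boost or standard library API), it uses std::thread in place of pthreads or Boost.Thread, and it expects the caller to size the output vector up front.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch: applies f to every element of `in` and stores the
// results in `out`, with each thread handling one contiguous chunk.
// Precondition (assumed by this sketch): out.size() == in.size().
// U should not be bool, because std::vector<bool> packs elements and
// concurrent writes to neighbouring elements would race.
template <typename T, typename U, typename F>
void parallel_transform(const std::vector<T>& in, std::vector<U>& out, F f,
                        unsigned num_threads) {
    assert(out.size() == in.size());

    const std::size_t n = in.size();
    // At least one worker, and never more workers than elements.
    const std::size_t workers =
        std::max<std::size_t>(1, std::min<std::size_t>(num_threads, n));

    std::vector<std::thread> pool;
    pool.reserve(workers);
    for (std::size_t t = 0; t < workers; ++t) {
        const auto begin = static_cast<std::ptrdiff_t>(n * t / workers);
        const auto end = static_cast<std::ptrdiff_t>(n * (t + 1) / workers);
        // "create": each thread gets its own disjoint slice of the range.
        pool.emplace_back([&in, &out, &f, begin, end] {
            std::transform(in.begin() + begin, in.begin() + end,
                           out.begin() + begin, f);
        });
    }
    // "join": wait for all slices; the threads are destroyed with the vector.
    for (std::thread& worker : pool) worker.join();
}
```

With in = {1, 2, 3, 4, 5} and a squaring lambda, out ends up as {1, 4, 9, 16, 25} whatever the thread count. The function f must be safe to call concurrently on different elements.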
> Please clarify, whether there are any alternative approaches available to achieve parallelism using threads.
There are plenty. Just look at the literature on the subject.
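As one example of such an alternative (chosen here only for illustration, not something prescribed in this thread), task-based parallelism hands each chunk of work to std::async and collects the results through futures, instead of managing threads and joins by hand. The function name parallel_sum and the reduction it performs are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Hypothetical sketch: sums `data` by launching `num_tasks` asynchronous
// tasks, each returning the partial sum of one contiguous chunk.
double parallel_sum(const std::vector<double>& data, std::size_t num_tasks) {
    assert(num_tasks > 0);

    const std::size_t n = data.size();
    std::vector<std::future<double>> parts;
    parts.reserve(num_tasks);
    for (std::size_t t = 0; t < num_tasks; ++t) {
        const auto first =
            data.begin() + static_cast<std::ptrdiff_t>(n * t / num_tasks);
        const auto last =
            data.begin() + static_cast<std::ptrdiff_t>(n * (t + 1) / num_tasks);
        // std::launch::async requests that the task run on its own thread.
        parts.push_back(std::async(std::launch::async, [first, last] {
            return std::accumulate(first, last, 0.0);
        }));
    }

    // get() blocks until the task finishes, so no explicit join is needed.
    double total = 0.0;
    for (std::future<double>& part : parts) total += part.get();
    return total;
}
```

The results come back as return values rather than through shared output buffers, which removes the thread-ID bookkeeping from the caller. Other families worth looking at include thread pools, OpenMP-style loop annotations, and data-parallel libraries.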
Participants (2): Mathias Gaunard, Naveen