Re: [boost] [Parallel Algorithms] Queries regarding the Boost.Thread / Parallel algorithms
Hi,

Based on understanding the proposal, I'm just curious to know how we are gonna use the compiler optimizations when creating the parallel_xxx functions. Say any normal loop would be optimized by the compiler using any of the loop transformations like loop interchange, loop unrolling, loop fusion, etc.

Let us consider a sample FORTRAN code:

    DO J = 1, 100
      DO I = 1, 100
        DO K = 1, 100
          C(I,J) = C(I,J) + A(I,K) * B(K,J)
        END DO
      END DO
    END DO

which may or may not be optimized by the compiler into something like this:

    DO K = 1, 100
      DO I = 1, 100
        DO J = 1, 100
          C(I,J) = C(I,J) + A(I,K) * B(K,J)
        END DO
      END DO
    END DO

So, when we try to introduce our parallel algorithm into the loop containing J, it may be interchanged depending upon the compiler properties. So, should we plan for such scenarios?

When we try to introduce our parallel functions, the only way possible is to make the compiler optimizations temporarily inactive on those tasks. But by doing so, the performance of the application will be affected.

Should we really tackle this scenario, or am I looking into a completely irrelevant picture?

Regards,
NAVEEN | Mobile: 832-720-2393 | about.me http://about.me/naveen.namashivayam
On 20/03/14 21:57, Naveen wrote:
> Hi,
>
> Based on understanding the proposal, I'm just curious to know how we are gonna use the compiler optimizations when creating the parallel_xxx functions. Say any normal loop would be optimized by the compiler using any of the loop transformations like loop interchange, loop unrolling, loop fusion, etc.
>
> Let us consider a sample FORTRAN code:
>
>     DO J = 1, 100
>       DO I = 1, 100
>         DO K = 1, 100
>           C(I,J) = C(I,J) + A(I,K) * B(K,J)
>         END DO
>       END DO
>     END DO
>
> which may or may not be optimized by the compiler into something like this:
>
>     DO K = 1, 100
>       DO I = 1, 100
>         DO J = 1, 100
>           C(I,J) = C(I,J) + A(I,K) * B(K,J)
>         END DO
>       END DO
>     END DO
>
> So, when we try to introduce our parallel algorithm into the loop containing J, it may be interchanged depending upon the compiler properties. So, should we plan for such scenarios?

Could you show how you would parallelize this code?

> When we try to introduce our parallel functions, the only way possible is to make the compiler optimizations temporarily inactive on those tasks. But by doing so, the performance of the application will be affected.
>
> Should we really tackle this scenario, or am I looking into a completely irrelevant picture?

I don't see anything irrelevant.

Vicente
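
As one possible answer to Vicente's question, here is a minimal C++ sketch (not taken from the thread or from the proposal) that parallelizes the outermost J loop with plain `std::thread`. The names `Matrix`, `multiply_serial`, `multiply_parallel_j` and `make_test_matrix` are hypothetical and only illustrate the idea; an eventual Boost parallel algorithm could look quite different. Each thread owns a disjoint range of columns `j`, so no two threads ever write the same element of `c`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Square matrix stored row-major in a flat vector (hypothetical helper type).
struct Matrix {
    std::size_t n;
    std::vector<long> data;
    explicit Matrix(std::size_t size) : n(size), data(size * size, 0) {}
    long& at(std::size_t i, std::size_t j) { return data[i * n + j]; }
    long at(std::size_t i, std::size_t j) const { return data[i * n + j]; }
};

// Serial reference with the same J, I, K nesting as the Fortran loop.
void multiply_serial(const Matrix& a, const Matrix& b, Matrix& c) {
    const std::size_t n = c.n;
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t k = 0; k < n; ++k)
                c.at(i, j) += a.at(i, k) * b.at(k, j);
}

// Splits the J range into contiguous chunks and runs one thread per chunk.
// Inside each chunk the I and K loops are ordinary serial code.
void multiply_parallel_j(const Matrix& a, const Matrix& b, Matrix& c,
                         unsigned num_threads) {
    assert(num_threads > 0);
    const std::size_t n = c.n;
    const std::size_t chunk = (n + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t j_begin = std::min(n, t * chunk);
        const std::size_t j_end = std::min(n, j_begin + chunk);
        if (j_begin == j_end) break;  // no columns left to hand out
        workers.emplace_back([&a, &b, &c, j_begin, j_end, n] {
            for (std::size_t j = j_begin; j < j_end; ++j)
                for (std::size_t i = 0; i < n; ++i)
                    for (std::size_t k = 0; k < n; ++k)
                        c.at(i, j) += a.at(i, k) * b.at(k, j);
        });
    }
    for (auto& w : workers) w.join();  // wait for every chunk to finish
}

// Fills a matrix with small deterministic values (for illustration only).
Matrix make_test_matrix(std::size_t n, std::size_t seed) {
    Matrix m(n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            m.at(i, j) = static_cast<long>((i * 3 + j * 5 + seed) % 7);
    return m;
}
```

Calling `multiply_parallel_j(a, b, c, 4)` on zero-initialized `c` should give the same `c.data` as `multiply_serial(a, b, c)`. The body handed to each thread is still a normal loop nest, so the compiler remains free to optimize it as long as the observable result is unchanged.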
On 20/03/14 21:57, Naveen wrote:
> Hi,
>
> Based on understanding the proposal, I'm just curious to know how we are gonna use the compiler optimizations when creating the parallel_xxx functions. Say any normal loop would be optimized by the compiler using any of the loop transformations like loop interchange, loop unrolling, loop fusion, etc.
>
> Let us consider a sample FORTRAN code:
>
>     DO J = 1, 100
>       DO I = 1, 100
>         DO K = 1, 100
>           C(I,J) = C(I,J) + A(I,K) * B(K,J)
>         END DO
>       END DO
>     END DO
>
> which may or may not be optimized by the compiler into something like this:
>
>     DO K = 1, 100
>       DO I = 1, 100
>         DO J = 1, 100
>           C(I,J) = C(I,J) + A(I,K) * B(K,J)
>         END DO
>       END DO
>     END DO

That's unlikely to happen in C++.

> So, when we try to introduce our parallel algorithm into the loop containing J, it may be interchanged depending upon the compiler properties. So, should we plan for such scenarios?
>
> When we try to introduce our parallel functions, the only way possible is to make the compiler optimizations temporarily inactive on those tasks. But by doing so, the performance of the application will be affected.

There is no need. Compiler optimizations are not allowed to change program behaviour, regardless of what you do. Introducing parallelism on any of the loops would definitely prevent re-ordering.
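
To make the "program behaviour" point concrete, here is a small C++ sketch (an illustration added for this write-up, not code from the thread). It writes out both loop orders from the original message and lets you check that they produce the same matrix. For every element C(i,j), both nestings add the products in the same ascending order of k, which is why an interchange would not be observable. The function names `multiply_jik`, `multiply_kij` and `make_input` are hypothetical, and integer elements are an assumption chosen so that the comparison is exact.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// n x n matrix of integers, row-major: element (i, j) lives at index i * n + j.
using Matrix = std::vector<long>;

// Original nesting from the thread: J outermost, K innermost.
void multiply_jik(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t j = 0; j < n; ++j)
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t k = 0; k < n; ++k)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

// Interchanged nesting: K outermost, J innermost.
void multiply_kij(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
}

// Fills a matrix with small deterministic values (for illustration only).
Matrix make_input(std::size_t n, std::size_t seed) {
    Matrix m(n * n);
    for (std::size_t idx = 0; idx < m.size(); ++idx)
        m[idx] = static_cast<long>((idx * 7 + seed) % 11);
    return m;
}
```

Running both functions on the same inputs, each starting from a zero-filled `c`, should leave the two result matrices equal. Whether a given compiler actually performs such an interchange is a separate question; the sketch only shows that doing so would not change the result.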
participants (3)
- Mathias Gaunard
- Naveen
- Vicente J. Botet Escriba