On 20/03/14 21:57, Naveen wrote:
Hi, Based on my understanding of the proposal, I'm curious how we are going to benefit from compiler optimizations when creating the parallel_xxx functions. Any normal loop may be optimized by the compiler using loop transformations such as loop interchange, loop unrolling, loop fusion, etc.
Let us consider a sample FORTRAN code:
    DO J = 1, 100
      DO I = 1, 100
        DO K = 1, 100
          C(I,J) = C(I,J) + A(I,K) * B(K,J)
        END DO
      END DO
    END DO
which may or may not be optimized by the compiler into something like this:
    DO K = 1, 100
      DO I = 1, 100
        DO J = 1, 100
          C(I,J) = C(I,J) + A(I,K) * B(K,J)
        END DO
      END DO
    END DO
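To make the interchange concrete, here is a small sketch of the two loop orders. It is written in C++ rather than FORTRAN, which is an assumption about the language the parallel_xxx functions would target. The names `multiply_jik`, `multiply_kij`, `make_matrix` and `at` are made up for illustration, and the matrices use column-major storage to mirror FORTRAN.

```cpp
#include <cassert>
#include <vector>

constexpr int N = 100;

// N x N matrix stored column-major, as in FORTRAN: element (i, j) lives at i + j * N.
using Matrix = std::vector<double>;

inline double& at(Matrix& m, int i, int j) { return m[i + j * N]; }
inline double at(const Matrix& m, int i, int j) { return m[i + j * N]; }

// Original order: J outermost, then I, then K.
void multiply_jik(const Matrix& a, const Matrix& b, Matrix& c) {
    for (int j = 0; j < N; ++j)
        for (int i = 0; i < N; ++i)
            for (int k = 0; k < N; ++k)
                at(c, i, j) += at(a, i, k) * at(b, k, j);
}

// Interchanged order: K outermost, then I, then J.
void multiply_kij(const Matrix& a, const Matrix& b, Matrix& c) {
    for (int k = 0; k < N; ++k)
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j)
                at(c, i, j) += at(a, i, k) * at(b, k, j);
}

// Hypothetical helper: fills a matrix with small integer values so that
// every product and partial sum is exactly representable as a double.
Matrix make_matrix(int seed) {
    Matrix m(N * N);
    for (int idx = 0; idx < N * N; ++idx)
        m[idx] = static_cast<double>((idx * seed) % 7);
    return m;
}
```

In both orders each C(I,J) accumulates its K terms in the same sequence, so the two functions produce the same matrix; only the memory access pattern changes.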
So, when we try to introduce our parallel algorithm into the loop over J, the loop may be interchanged, depending on the compiler. Should we plan for such scenarios?

Could you show how you would parallelize this code?

When we try to introduce our parallel functions, the only possible way is to make the compiler optimizations temporarily inactive on those tasks. But by doing so, the performance of the application will be affected.
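As a rough sketch of one way the J loop could be split across tasks, the following C++ code hands each thread a contiguous range of columns. This is only an illustration under assumptions: C++ is assumed as the target language, `multiply_columns` and `parallel_multiply` are hypothetical names and not part of any proposal, and plain `std::thread` stands in for whatever the parallel_xxx functions would use internally. With GCC or Clang it may need to be built with `-pthread`.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

constexpr int N = 100;

// N x N matrix stored column-major, as in FORTRAN: element (i, j) lives at i + j * N.
using Matrix = std::vector<double>;

// Ordinary serial loop nest restricted to columns [j_begin, j_end).
void multiply_columns(const Matrix& a, const Matrix& b, Matrix& c,
                      int j_begin, int j_end) {
    for (int j = j_begin; j < j_end; ++j)
        for (int i = 0; i < N; ++i)
            for (int k = 0; k < N; ++k)
                c[i + j * N] += a[i + k * N] * b[k + j * N];
}

// Hypothetical parallel wrapper: splits the J range into chunks and runs
// each chunk on its own thread. Tasks write to disjoint columns of c,
// so no synchronization is needed beyond the final join.
void parallel_multiply(const Matrix& a, const Matrix& b, Matrix& c,
                       int num_tasks) {
    num_tasks = std::max(1, num_tasks);
    const int chunk = (N + num_tasks - 1) / num_tasks;  // ceiling division
    std::vector<std::thread> workers;
    for (int begin = 0; begin < N; begin += chunk) {
        const int end = std::min(begin + chunk, N);
        workers.emplace_back([&a, &b, &c, begin, end] {
            multiply_columns(a, b, c, begin, end);
        });
    }
    for (auto& w : workers)
        w.join();
}
```

In this sketch only the distribution of J across tasks is fixed by the library code. The body of each task is still plain serial code compiled with the usual optimization flags, so the compiler remains free to transform the loops inside `multiply_columns`; whether it actually does so depends on the compiler.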
Should we really tackle this scenario, or am I looking at a completely irrelevant picture?
I don't see anything irrelevant.

Vicente