On Wed, May 29, 2013 at 4:10 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
Ok, in that case, you need to first study how uBlas works.
For example if you write something along the lines of
a = trans(b + c) * d;
AFAIK what uBlas does is something like
    for(size_t i=0; i!=sz.height; ++i)
        for(size_t j=0; j!=sz.width; ++j)
            a[i][j] = (b[j][i] + c[j][i]) * d[i][j];
What you need to do is change the loop structure and modify the evaluation of all nodes involved to support SIMD.
Of course trans is going to be a problem. Thankfully uBlas doesn't have that many functions, so trans and herm are the only functions that exhibit that issue.
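To make the loop restructuring above concrete, here is a rough sketch in plain C++. It is not uBlas code and does not use any uBlas internals: Matrix, eval_scalar, eval_chunked and kLanes are made-up names for illustration, and the fixed-size lane array only stands in for a real SIMD register. It assumes row-major storage. The first function is the fused scalar loop; the second walks each row in chunks of kLanes elements, so that d and a are read and written contiguously while the transposed operands have to be gathered with a stride, which is where trans gets awkward.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal row-major matrix, used only for this sketch (not a uBlas type).
struct Matrix {
    std::size_t rows, cols;
    std::vector<float> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
    float& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    float operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Fused scalar loop: a(i,j) = (b(j,i) + c(j,i)) * d(i,j), element by element.
void eval_scalar(Matrix& a, const Matrix& b, const Matrix& c, const Matrix& d) {
    assert(b.rows == a.cols && b.cols == a.rows);
    assert(c.rows == b.rows && c.cols == b.cols);
    assert(d.rows == a.rows && d.cols == a.cols);
    for (std::size_t i = 0; i != a.rows; ++i)
        for (std::size_t j = 0; j != a.cols; ++j)
            a(i, j) = (b(j, i) + c(j, i)) * d(i, j);
}

// Assumed lane count; a real implementation would take it from the target SIMD width.
constexpr std::size_t kLanes = 4;

// Same computation, restructured so the inner work happens on kLanes elements at a time.
void eval_chunked(Matrix& a, const Matrix& b, const Matrix& c, const Matrix& d) {
    assert(b.rows == a.cols && b.cols == a.rows);
    assert(c.rows == b.rows && c.cols == b.cols);
    assert(d.rows == a.rows && d.cols == a.cols);
    for (std::size_t i = 0; i != a.rows; ++i) {
        std::size_t j = 0;
        for (; j + kLanes <= a.cols; j += kLanes) {
            float t[kLanes];
            // trans(b + c): consecutive lanes come from different rows of b and c,
            // so this is a strided gather rather than one contiguous load.
            for (std::size_t l = 0; l != kLanes; ++l)
                t[l] = b(j + l, i) + c(j + l, i);
            // d and a are contiguous along j, so this part maps to a plain
            // vector load, multiply and store.
            for (std::size_t l = 0; l != kLanes; ++l)
                a(i, j + l) = t[l] * d(i, j + l);
        }
        // Scalar tail for columns that do not fill a whole chunk.
        for (; j != a.cols; ++j)
            a(i, j) = (b(j, i) + c(j, i)) * d(i, j);
    }
}
```

Both functions perform the same arithmetic in the same order per element, so they should produce identical results; the chunked version only changes how the iteration is organised.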
Should I write SIMD code for the algorithm? Or, since there is no such function in uBLAS, do you want me to develop CPU code (function)?
--
Aditya Avinash Atluri