On 29/05/13 12:46, Aditya Avinash wrote:
On Wed, May 29, 2013 at 4:10 PM, Mathias Gaunard <mathias.gaunard@ens-lyon.org> wrote:
Ok, in that case, you need to first study how uBlas works.
For example if you write something along the lines of
a = trans(b + c) * d;
AFAIK what uBlas does is something like
for (size_t i = 0; i != sz.height; ++i)
    for (size_t j = 0; j != sz.width; ++j)
        a[i][j] = (b[j][i] + c[j][i]) * d[i][j];
What you need to do is change the loop structure and modify the evaluation of all nodes involved to support SIMD.
Of course trans is going to be a problem. Thankfully uBlas doesn't have that many functions, so trans and herm are the only functions that exhibit that issue.
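To make the loop restructuring concrete, here is a minimal sketch in plain C++ for the same expression without the transpose, a = (b + c) * d, evaluated element-wise. The Matrix type, the eval_add_mul function and the lane count kLanes are hypothetical names invented for this illustration; none of them are uBlas APIs. The inner loop is split into a main part that handles a fixed number of contiguous elements per step (the part that would map to packed SIMD loads and stores) and a scalar tail.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major dense matrix, for illustration only (not a uBlas type).
struct Matrix {
    std::size_t height, width;
    std::vector<float> data;
    Matrix(std::size_t h, std::size_t w) : height(h), width(w), data(h * w, 0.0f) {}
    float& operator()(std::size_t i, std::size_t j) { return data[i * width + j]; }
    float operator()(std::size_t i, std::size_t j) const { return data[i * width + j]; }
};

// Assumed SIMD width: 4 floats, as in a 128-bit register.
constexpr std::size_t kLanes = 4;

// Evaluates a = (b + c) * d element-wise, with no transpose involved.
void eval_add_mul(Matrix& a, const Matrix& b, const Matrix& c, const Matrix& d) {
    assert(a.height == b.height && a.width == b.width);
    assert(a.height == c.height && a.width == c.width);
    assert(a.height == d.height && a.width == d.width);

    for (std::size_t i = 0; i != a.height; ++i) {
        // Each row is contiguous in memory for a row-major layout.
        float* pa = a.data.data() + i * a.width;
        const float* pb = b.data.data() + i * b.width;
        const float* pc = c.data.data() + i * c.width;
        const float* pd = d.data.data() + i * d.width;

        std::size_t j = 0;
        // Main loop: kLanes contiguous elements per step. With intrinsics,
        // each step would become one packed load/add/multiply/store.
        for (; j + kLanes <= a.width; j += kLanes)
            for (std::size_t k = 0; k != kLanes; ++k)
                pa[j + k] = (pb[j + k] + pc[j + k]) * pd[j + k];

        // Scalar tail for widths that are not a multiple of kLanes.
        for (; j != a.width; ++j)
            pa[j] = (pb[j] + pc[j]) * pd[j];
    }
}
```

With trans in the expression, b and c are read as b(j, i): in this row-major layout, consecutive values of j are then b.width floats apart, so the contiguous packed load used in the main loop no longer applies directly. That is why trans and herm need separate treatment.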
Should I write SIMD code for the algorithm? Or, as there is no such function in uBLAS, do you want me to develop CPU code (a function)?
There is no algorithm here. It's just the evaluation of a uBlas matrix expression template.
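For readers unfamiliar with the technique, the following is a minimal expression-template sketch in C++. It is not uBlas's actual implementation: the namespace sketch and the types Expr, Add, Mul, Trans and Mat are hypothetical, and operator* is element-wise here only to match the loop quoted above. The point it illustrates is that the operators merely build lightweight node objects, and the only loop runs inside the assignment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {  // hypothetical names; uBlas's real classes differ

// CRTP base that marks a type as an expression node.
template <class D>
struct Expr {
    const D& self() const { return static_cast<const D&>(*this); }
};

// Node for l + r: stores references, computes nothing until indexed.
template <class L, class R>
struct Add : Expr<Add<L, R>> {
    const L& l; const R& r;
    Add(const L& l_, const R& r_) : l(l_), r(r_) {}
    std::size_t height() const { return l.height(); }
    std::size_t width() const { return l.width(); }
    float operator()(std::size_t i, std::size_t j) const { return l(i, j) + r(i, j); }
};

// Node for the element-wise product l * r.
template <class L, class R>
struct Mul : Expr<Mul<L, R>> {
    const L& l; const R& r;
    Mul(const L& l_, const R& r_) : l(l_), r(r_) {}
    std::size_t height() const { return l.height(); }
    std::size_t width() const { return l.width(); }
    float operator()(std::size_t i, std::size_t j) const { return l(i, j) * r(i, j); }
};

// Node for trans(e): swaps the indices on access.
template <class E>
struct Trans : Expr<Trans<E>> {
    const E& e;
    explicit Trans(const E& e_) : e(e_) {}
    std::size_t height() const { return e.width(); }
    std::size_t width() const { return e.height(); }
    float operator()(std::size_t i, std::size_t j) const { return e(j, i); }
};

// Row-major dense matrix; also a leaf node of the expression tree.
struct Mat : Expr<Mat> {
    std::size_t h, w;
    std::vector<float> data;
    Mat(std::size_t h_, std::size_t w_) : h(h_), w(w_), data(h_ * w_, 0.0f) {}
    std::size_t height() const { return h; }
    std::size_t width() const { return w; }
    float& operator()(std::size_t i, std::size_t j) { return data[i * w + j]; }
    float operator()(std::size_t i, std::size_t j) const { return data[i * w + j]; }

    // Assigning an expression is the only place where a loop runs.
    // Aliasing (the target also appearing on the right) is not handled.
    template <class E>
    Mat& operator=(const Expr<E>& x) {
        const E& e = x.self();
        assert(e.height() == h && e.width() == w);
        for (std::size_t i = 0; i != h; ++i)
            for (std::size_t j = 0; j != w; ++j)
                (*this)(i, j) = e(i, j);
        return *this;
    }
};

template <class L, class R>
Add<L, R> operator+(const Expr<L>& l, const Expr<R>& r) { return Add<L, R>(l.self(), r.self()); }

template <class L, class R>
Mul<L, R> operator*(const Expr<L>& l, const Expr<R>& r) { return Mul<L, R>(l.self(), r.self()); }

template <class E>
Trans<E> trans(const Expr<E>& e) { return Trans<E>(e.self()); }

}  // namespace sketch
```

With these definitions, a statement such as a = trans(b + c) * d; builds a Mul<Trans<Add<Mat, Mat>>, Mat> object and evaluates it element by element in Mat::operator=. Adding SIMD support in a design like this would mean changing that assignment loop and giving each node a way to produce several elements at once, rather than writing a new standalone function.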