Hi Alexis, you wrote:
I am trying to use the boost::ublas library in my code to speed things up, but I was a bit suspicious about the results.
You've read the documentation's remarks about abstraction penalty, haven't you?
So I ran my own tests, and I found that ublas is not as fast as the C implementation but is actually about twice as slow (for my tests, which are very limited). Please find the source file below. I am not very familiar with expression templates. Am I doing something wrong?
No (as far as I see).
Are the results specific to this test?
In a certain sense. [snip parts of code]
typedef int value_type;
Interesting data type ;-)
#define N 14
#define K 100000000
[snip other parts of code]

I've been playing with these parameters and two compilers. For GCC 3.2.1 (compiler options -DNDEBUG -O3) I see the following results on my box:

n              k            C t     assign prod t
14             100000000    4.5     6.58
14*14          10000000     5.59    7.76
14*14*14       1000000      8.89    12.59
14*14*14*14    100000       70.88   70.11

For ICC 7.1 (again compiler options -DNDEBUG -O3):

n              k            C t     assign prod t
14             100000000    3.16    6.22
14*14          10000000     4.49    6.05
14*14*14       1000000      6.91    10.19
14*14*14*14    100000       70.37   70.78

It looks like the inner loops of the 'C' and C++ code are identical. I'm unsure whether the abstraction penalty for small sizes can be eliminated by the compiler writers, or whether it justifies a 'tiny vector/matrix' abstraction a la Blitz++ (resulting in a corresponding compile-time abstraction penalty ;-)

Best,
Joerg