That is one of the core purposes of the GSOC project: to provide fast
algorithms, especially for operations like matrix-matrix multiplication,
rather than to optimize the whole infrastructure.
Regarding the simple cases: do you mean that on your compiler uBLAS is
slower than, for example, Eigen on this piece of code?
#include <iostream>
#include <chrono>
#include <boost/numeric/ublas/matrix.hpp>
#include <Eigen/Dense>
using boost::numeric::ublas::noalias;
std::chrono::high_resolution_clock::time_point now() {
return std::chrono::high_resolution_clock::now();
}
double duration_since( const
std::chrono::high_resolution_clock::time_point &since) {
return std::chrono::duration_cast<std::chrono::microseconds>(now()
- since).count();
}
typedef double value_type;
typedef boost::numeric::ublas::matrix<value_type> ublas_matrix_type;
typedef Eigen::Matrix<value_type, Eigen::Dynamic, Eigen::Dynamic>
eigen_matrix_type;
#define SIZE 200
#define ITERATIONS 3000
int main() {
eigen_matrix_type EA(SIZE,SIZE), EB(SIZE,SIZE), EC(SIZE,SIZE),
ED(SIZE,SIZE);
ublas_matrix_type UA(SIZE,SIZE), UB(SIZE,SIZE), UC(SIZE,SIZE),
UD(SIZE,SIZE);
for( auto i=0; i!=SIZE; i++) for( auto j=0; j!=SIZE; j++){
EA(i,j)=0; UA(i,j)=0; // zero the accumulators so += starts from a known state
EB(i,j)=i+3*j; EC(i,j)=i+5*j+2; ED(i,j)=2*i+3*j;
UB(i,j)=i+3*j; UC(i,j)=i+5*j+2; UD(i,j)=2*i+3*j;
}
auto start = now();
for (auto i=0; i!=ITERATIONS; i++) EA.noalias() += 2*EB+3*(EC+ED);
auto dur = (double)duration_since(start)/1000;
std::cout << EA(SIZE-1,SIZE-1) << " Duration EIGEN: " << dur << " msec\n";
start = now();
for (auto i=0; i!=ITERATIONS; i++) noalias(UA) += 2*UB+3*(UC+UD);
dur = (double)duration_since(start)/1000;
std::cout << UA(SIZE-1,SIZE-1) << " Duration uBLAS: " << dur << " msec\n";
return 0;
}
$ g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.7/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro
4.7.2-2ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-4.7/README.Bugs
--enable-languages=c,c++,go,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-4.7 --enable-shared --enable-linker-build-id
--with-system-zlib --libexecdir=/usr/lib --without-included-gettext
--enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.7
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--enable-gnu-unique-object --enable-plugin --enable-objc-gc
--disable-werror --with-arch-32=i686 --with-tune=generic
--enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
$ g++ -DNDEBUG -O3 -std=c++0x main.cpp -o benchmarks
$ ./benchmarks
2.4495e+07 Duration EIGEN: 160.901 msec
2.4495e+07 Duration uBLAS: 160.86 msec
$ ./benchmarks
2.4495e+07 Duration EIGEN: 165.348 msec
2.4495e+07 Duration uBLAS: 168.003 msec
$ ./benchmarks
2.4495e+07 Duration EIGEN: 161.826 msec
2.4495e+07 Duration uBLAS: 160.674 msec
Best regards,
Nasos
On 05/29/2013 09:59 AM, Mathias Gaunard wrote:
On 29/05/13 15:00, Nasos Iliopoulos wrote:
We are also seeking ways of making the uBLAS expression templates more
transparent to the compiler so that auto-vectorization can kick in -
which it does in certain cases and provides a very nice performance
boost on par with explicitly vectorized libraries.
As a matter of fact, I am surprised by the progress of compilers'
auto-vectorization facilities over the last few years, which makes me
doubt the need for explicit vectorization any more. The GSOC project will
make this clear for us. An added benefit of relying on the compiler is that
future vector instruction sets come for free. A disadvantage is of course
that auto-vectorization is not guaranteed to kick in, but I find that to
rarely be the case.
Yet according to a variety of benchmarks, performance of uBLAS is very
bad when compared to other similar libraries (Eigen, Armadillo,
Blitz++, Blaze, or even our own library NT2) even for simple cases and
with aggressive optimization settings.