On 24/04/13 20:00, dag@cray.com wrote:
> All of the scalar and complex arithmetic using simple binary operators can be easily vectorized if the compiler has knowledge about dependencies. That is why I suggest standardizing keywords, attributes and/or pragmas rather than a specific parallel model provided by a library. The former is more general and gives the compiler more freedom during code generation.
>
> But see, that's exactly the problem. Look at the X1. It has multiple levels of parallelism. So do Intel MIC and GPUs. The compiler has to balance multiple parallel models simultaneously. When you hard-code vector loops you remove some of the compiler's freedom to transform loops and improve parallelism.
Automatic parallelization will never beat code optimized by experts. Experts program each type of parallelism by taking its specificities into account. A one-size-fits-all model for all kinds of parallelism is nice, but limited; using a dedicated tool for each type of parallelism is the right approach for maximum performance. While it could be argued that experts should use the lowest-level API to reach their goals, such libraries can still make experts much more productive. Memory layout is another interesting point in favor of a library: a C++ compiler cannot change the memory layout on its own to make it more vectorization-friendly. By providing the right types and primitives, a library makes users aware of the issues at hand and lets them state explicitly how a given algorithm is to be vectorized.
> For specialized operations like horizontal add, saturating arithmetic, etc. we will need intrinsics or functions that will be necessarily target-dependent.
The proposal suggests providing vectorized variants of all mathematical functions in the C++ standard (the Boost.SIMD library covers C99, TR1 and more). That's quite a lot of functions. Should all of them be made compiler built-ins? That doesn't sound like a very scalable or extensible approach. You'll probably want different algorithms for the SIMD variants of these functions, so having the compiler auto-vectorize the scalar variants doesn't sound like a good idea either.
> Vector masks fundamentally change the model. They drastically affect control flow.
Some processors have had predication at the scalar level for quite some time, and it hasn't drastically changed the way people program. Predication is similar to doing two instructions in one (any instruction can also do a blend for free), and fusing those two instructions into one is something a compiler should be able to do pretty well. It doesn't sound very different from what a compiler must do for VLIW codegen, though I have little knowledge of compilers. The fact that this is a library doesn't mean the compiler shouldn't perform the same optimizations on vector types that it does on scalar ones. While I can see the benefit of this feature for a compiler that wants to generate SIMD from arbitrary code, dedicated SIMD code does not depend on it so heavily that it cannot be covered by a couple of additional functions.
> Longer vectors can also dramatically change the generated code. It is *not* simply a matter of using larger strips for stripmined loops. One often will want to vectorize different loops in a nest based on the hardware's maximum vector length.
I don't see what the problem is here. This is C++: you can write generic code for arbitrary vector lengths. It is up to users to use generative programming techniques to make their code depend on this parameter and stay portable, and the library tries to make that as easy as possible.
> A library-based short vector model like the SIMD library is very non-portable from a performance perspective.
From my experience, performance is still fairly consistent across targets. There are differences, but they're mostly due to how well each piece of hardware handles a particular application domain.