Mathias Gaunard
Automatic parallelization will never beat code optimized by experts. Experts program each type of parallelism by taking into account its specificities.
That is hyperbole. "Never" is a strong word.
An interesting point in favor of a library is also memory layout. A C++ compiler cannot change the memory layout on its own to make it friendlier to vectorization. Providing the right types and primitives makes the user aware of the issues at hand and empowers them to state explicitly how a given algorithm is to be vectorized.
I agree that libraries to make data shaping easier are useful!
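To make the layout point concrete, here is a minimal sketch of the reshaping in question, from array-of-structures to structure-of-arrays. The names PointAoS, PointsSoA, to_soa and scale_x are made up for illustration and are not taken from Boost.SIMD or any other library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structures: the x, y, z of one point sit next to each other,
// so a loop that only touches x walks memory with a stride of three floats.
struct PointAoS { float x, y, z; };

// Structure-of-arrays: all x values are contiguous, which maps directly
// onto unit-stride vector loads and stores.
struct PointsSoA {
    std::vector<float> x, y, z;
};

// Hypothetical helper: reshape AoS data into SoA. A compiler will not do
// this for you, since it changes the program's data layout.
PointsSoA to_soa(const std::vector<PointAoS>& pts) {
    PointsSoA out;
    out.x.reserve(pts.size());
    out.y.reserve(pts.size());
    out.z.reserve(pts.size());
    for (const PointAoS& p : pts) {
        out.x.push_back(p.x);
        out.y.push_back(p.y);
        out.z.push_back(p.z);
    }
    return out;
}

// Scale every x coordinate in place. The loop is unit-stride, so it is an
// easy target for either an auto-vectorizer or a SIMD library.
void scale_x(PointsSoA& pts, float s) {
    assert(pts.x.size() == pts.y.size() && pts.y.size() == pts.z.size());
    for (std::size_t i = 0; i < pts.x.size(); ++i) {
        pts.x[i] *= s;
    }
}
```

Whether the conversion pays off depends on how often the data is traversed field by field, so treat this as an illustration of the idea rather than a recommendation.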
For specialized operations like horizontal add, saturating arithmetic, etc. we will need intrinsics or functions that will be necessarily target-dependent.
The proposal suggests providing vectorized variants of all mathematical functions in the C++ standard (the Boost.SIMD library covers C99, TR1 and more). That's quite a lot of functions.
But not the special ones I mentioned.
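For reference, this is what two of those special operations compute, written as a portable scalar sketch. The function names sat_add_u8 and horizontal_add are invented here. Real targets expose these operations through their own intrinsics, which is exactly the target-dependent part.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Saturating unsigned 8-bit add: the result clamps at 255 instead of
// wrapping around as ordinary unsigned arithmetic would.
std::uint8_t sat_add_u8(std::uint8_t a, std::uint8_t b) {
    const unsigned sum = static_cast<unsigned>(a) + static_cast<unsigned>(b);
    return static_cast<std::uint8_t>(sum > 255u ? 255u : sum);
}

// Horizontal add: sum the lanes of a single vector into one scalar,
// as opposed to the usual lane-by-lane ("vertical") add of two vectors.
float horizontal_add(const float* lanes, std::size_t n) {
    assert(lanes != nullptr || n == 0);
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += lanes[i];
    }
    return sum;
}
```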
Should all these functions be made compiler built-ins? That doesn't sound like a very scalable and extensible approach.
I dunno, we do a lot of that here.
Vector masks fundamentally change the model. They drastically affect control flow.
Some processors have had predication at the scalar level for quite some time. It hasn't drastically changed the way people program.
Scalar predication hasn't changed the way people program because compilers do the if-conversion. As it should be with vectors.
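A small sketch of what I mean by if-conversion, using a made-up clamping loop. The first function is what the programmer writes. The second is roughly the branch-free form a compiler can derive from it, where the condition becomes a mask and the assignment becomes a select. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Branchy form, as a programmer would naturally write it.
void clamp_negative_branch(float* a, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] < 0.0f) {
            a[i] = 0.0f;
        }
    }
}

// If-converted form: the control dependence is turned into a data
// dependence. Each iteration computes a mask and selects between the two
// possible values, which is the shape a masked or blended vector
// instruction expects.
void clamp_negative_select(float* a, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        const bool mask = a[i] < 0.0f;
        a[i] = mask ? 0.0f : a[i];
    }
}
```

Both loops compute the same result. The point is that the user only has to write the first one.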
It is similar to doing two instructions in one (any instruction can also do a blend for free), and optimizing those instructions done separately into one is something that a compiler should be able to do pretty well. It doesn't sound very unlike what a compiler must do for VLIW codegen to me, but then I have little knowledge of compilers.
I have trouble seeing how one would use the SIMD library to make it easier to write predicated vector code. Can you sketch it out?
The fact that it is a library doesn't mean that the compiler shouldn't perform on vector types the same optimizations that it does on scalar ones.
Of course it will. But the library user has already made the choice of what to vectorize. Many times it will be the right choice, but not always.
While I can see the benefit of this feature for a compiler that wants to generate SIMD for arbitrary code, dedicated SIMD code will not depend on it so heavily that it cannot be covered by a couple of additional functions.
Predication allows much more efficient vectorization of many common idioms. A SIMD library without support for it will miss those idioms and the compiler auto-vectorizer will get better performance.
Longer vectors can also dramatically change the generated code. It is *not* simply a matter of using larger strips for stripmined loops. One often will want to vectorize different loops in a nest based on the hardware's maximum vector length.
I don't see what the problem is here. This is C++. You can write generic code for arbitrary vector lengths. It is up to the user to use generative programming techniques to make his code depend on this parameter and be portable. The library tries to make this as easy as possible.
So the user has to write multiple versions of loop nests, potentially one for each target architecture? I don't see the advantage of this approach.
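For concreteness, the kind of width-generic code being described might look like the sketch below. The pack type and the load, store and add_arrays functions are hypothetical stand-ins written for this message, not the Boost.SIMD interface, and the lanes are plain scalar loops rather than real vector instructions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-width pack of N lanes. A real SIMD library would map
// this onto a hardware register; here it is just an array.
template <typename T, std::size_t N>
struct pack {
    T lane[N];
};

// Load N contiguous elements into a pack.
template <typename T, std::size_t N>
pack<T, N> load(const T* p) {
    pack<T, N> r;
    for (std::size_t i = 0; i < N; ++i) r.lane[i] = p[i];
    return r;
}

// Store the N lanes of a pack to contiguous memory.
template <typename T, std::size_t N>
void store(T* p, const pack<T, N>& v) {
    for (std::size_t i = 0; i < N; ++i) p[i] = v.lane[i];
}

// Lane-wise addition of two packs.
template <typename T, std::size_t N>
pack<T, N> operator+(const pack<T, N>& a, const pack<T, N>& b) {
    pack<T, N> r;
    for (std::size_t i = 0; i < N; ++i) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

// Strip-mined c = a + b with the vector width N as a template parameter.
template <std::size_t N, typename T>
void add_arrays(const T* a, const T* b, T* c, std::size_t n) {
    std::size_t i = 0;
    // Main loop: full strips of N elements.
    for (; i + N <= n; i += N) {
        store(c + i, load<T, N>(a + i) + load<T, N>(b + i));
    }
    // Scalar remainder for the last n % N elements.
    for (; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
}
```

Note that this only parameterizes the strip size of one loop. Choosing which loop of a nest to vectorize for a given maximum vector length is a different decision, and nothing in the template makes it for you.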
A library-based short vector model like the SIMD library is very non-portable from a performance perspective.
In my experience, it is still fairly reliable. There are differences in performance, but they're mostly due to differences in how well the hardware handles a particular application domain.
Well yes, that's one of the main issues. -David