On Sunday 21 April 2013 11:34:14 Mathias Gaunard wrote:
On 19/04/13 06:55, Andrey Semashev wrote:
In my experience, compilers are reluctant to pattern-match intrinsics and replace them with other intrinsics (which is a good thing). So if the user's code a*b+c*d is equivalent to two _mm_mullo_epi16/_mm_mulhi_epi16 and an _mm_add_epi32, then that's what you'll get in the output instead of a single _mm_madd_epi16. Note also that _mm_madd_epi16 requires a special layout of its operands in the xmm register elements, which is another blocker for that compiler optimization.
_mm_madd_epi16 is not a vertical operation, so it's a fairly special function, and you can't expect the compiler to recognize cases where it can use it.
That's my point. Nonetheless, this operation is very useful in some cases and I would like to be able to use it with Boost.SIMD. The same goes for many other special operations.
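To make the layout issue concrete, here is a rough sketch in plain SSE2 (no Boost.SIMD involved). The function names dot_pairs_madd, dot_pairs_manual and to_array are made up for this mail; only the _mm_* intrinsics are real. The "manual" version is roughly what you end up with when the code is composed from vertical multiplies and adds plus the shuffles needed to bring adjacent products together.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <array>
#include <cstdint>

// One instruction (pmaddwd): r[i] = a[2i]*b[2i] + a[2i+1]*b[2i+1],
// with each 16-bit product widened to 32 bits before the add.
inline __m128i dot_pairs_madd(__m128i a, __m128i b) {
    return _mm_madd_epi16(a, b);
}

// The same result assembled from vertical operations and shuffles.
// Illustrative only: a hypothetical expansion, not what any library emits.
inline __m128i dot_pairs_manual(__m128i a, __m128i b) {
    __m128i lo = _mm_mullo_epi16(a, b);  // low 16 bits of each signed product
    __m128i hi = _mm_mulhi_epi16(a, b);  // high 16 bits of each signed product

    // Interleave low/high halves into full 32-bit products p0..p3 and p4..p7.
    __m128 p0123 = _mm_castsi128_ps(_mm_unpacklo_epi16(lo, hi));
    __m128 p4567 = _mm_castsi128_ps(_mm_unpackhi_epi16(lo, hi));

    // Gather even products (p0,p2,p4,p6) and odd products (p1,p3,p5,p7).
    __m128i even = _mm_castps_si128(_mm_shuffle_ps(p0123, p4567, _MM_SHUFFLE(2, 0, 2, 0)));
    __m128i odd  = _mm_castps_si128(_mm_shuffle_ps(p0123, p4567, _MM_SHUFFLE(3, 1, 3, 1)));

    return _mm_add_epi32(even, odd);     // horizontal step done as a vertical add
}

// Helper to inspect the four 32-bit lanes.
inline std::array<std::int32_t, 4> to_array(__m128i v) {
    std::array<std::int32_t, 4> out;
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out.data()), v);
    return out;
}
```

Both functions should return the same lanes for the same inputs; the point is only that nothing in the second form tells the compiler it may collapse it into the first.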
I think special operations like FMA, madd, hadd/hsub, avg, min/max should be provided as functions. Also, it might be helpful to be able to convert packs to the compiler-specific types, like __m128i, and back, so that one can use other, more specialized intrinsics that are not available as functions, or interoperate with inline assembler.
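Something along these lines is what I have in mind. This is only a sketch: pack_i16x8 is a hypothetical stand-in type written for this mail, not the Boost.SIMD pack, and I am not claiming the library exposes the conversion this way.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

namespace sketch {

// Hypothetical pack of eight 16-bit signed integers (NOT Boost.SIMD's type).
struct pack_i16x8 {
    __m128i raw;

    pack_i16x8(__m128i v) : raw(v) {}            // native -> pack
    operator __m128i() const { return raw; }     // pack -> native

    // Broadcast one scalar to all eight lanes.
    static pack_i16x8 splat(std::int16_t s) { return pack_i16x8(_mm_set1_epi16(s)); }

    // Read a single lane by storing the register to memory.
    std::int16_t operator[](int i) const {
        std::int16_t tmp[8];
        _mm_storeu_si128(reinterpret_cast<__m128i*>(tmp), raw);
        return tmp[i];
    }
};

// Ordinary vertical operator.
inline pack_i16x8 operator+(pack_i16x8 a, pack_i16x8 b) {
    return pack_i16x8(_mm_add_epi16(a.raw, b.raw));
}

// "Special" operations exposed as named functions.
inline pack_i16x8 min(pack_i16x8 a, pack_i16x8 b) {
    return pack_i16x8(_mm_min_epi16(a.raw, b.raw));
}
inline pack_i16x8 max(pack_i16x8 a, pack_i16x8 b) {
    return pack_i16x8(_mm_max_epi16(a.raw, b.raw));
}

}  // namespace sketch
```

With the two conversions in place, an intrinsic that has no wrapper can be called directly on packs, e.g. `sketch::pack_i16x8 r = _mm_madd_epi16(a, b);`, and the result goes back into pack-based code.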
What I also forgot to ask is how the paper and Boost.SIMD handle overflowing and saturating integer arithmetic. I assume the operators on packs implement overflowing operations, since that's how scalar operations work. Is it possible to do saturating operations then?
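For reference, this is the distinction at the intrinsic level, again as a small SSE2-only sketch with made-up helper names (add_wrapping, add_saturating, first_lane). _mm_add_epi16 wraps around on overflow, while _mm_adds_epi16 clamps to the signed 16-bit range. (Strictly speaking, signed overflow on C++ scalars is undefined; the wrapping here is the behaviour of the SSE2 instruction.)

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Extract lane 0 as a signed 16-bit value.
inline std::int16_t first_lane(__m128i v) {
    return static_cast<std::int16_t>(_mm_extract_epi16(v, 0));
}

// Wrapping (modular) 16-bit add: what a plain operator+ on packs would
// presumably map to. That mapping is my assumption, not a documented fact.
inline std::int16_t add_wrapping(std::int16_t x, std::int16_t y) {
    return first_lane(_mm_add_epi16(_mm_set1_epi16(x), _mm_set1_epi16(y)));
}

// Saturating signed 16-bit add: results clamp to [-32768, 32767].
inline std::int16_t add_saturating(std::int16_t x, std::int16_t y) {
    return first_lane(_mm_adds_epi16(_mm_set1_epi16(x), _mm_set1_epi16(y)));
}
```

So 32767 + 1 gives -32768 with the first function and stays at 32767 with the second. The two are different instructions, so a pack interface would need a separate named function (or a separate type) to reach the saturating one.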
The standard proposal tried to keep things simple; the library itself has quite a few more things.
So, is it possible to convert pack to __m128i & co. and back in Boost.SIMD?