On 05/06/2015 05:48 AM, Niall Douglas wrote:
> One x86 specific trick is to reinterpret SSE2 registers as integers for
> the bit checks, that way you don't force FP values back into memory and
> reload into GP registers every single operation. Performance might
> actually be tolerable. I would suspect you'll need to drop into
> assembler for that though, and MSVC doesn't permit inline assembler in
> x64. I'd also loop David Bellot into this and ask what he thinks.
>
> Niall
You should be able to implement all of the SSE operations you need using intrinsics, which are well-supported on all recent x86 compilers. Granted, you don't get direct control over whether values get spilled from registers back to memory (as the compiler still maintains control over that), but it's a lot easier to implement than inline assembly (especially with MSVC as a requirement).

Intel has a great online reference of all intrinsics here:

https://software.intel.com/sites/landingpage/IntrinsicsGuide/

While it says that the list is for Intel C++, in practice, gcc/clang/MSVC are almost fully compatible with Intel's set of SSE/AVX/AVX2 intrinsics (and probably AVX-512, which is coming soon to real hardware).

Jason