[dynamic_bitset] Compiler intrinsics to accelerate. Who agrees?
I noticed that dynamic_bitset<> does not use the long-established BSF instruction, which could accelerate the find_first() and find_next() methods. I checked the assembly generated by the current table-based implementation.

BSF and BSR have been available on x86 and x64 for a long time. Population count can be implemented with POPCNT in performance-oriented builds.

Core i7 processors with AVX2 enable new wide-block bit manipulation, but using it requires compiler-specific intrinsics for GNU g++ and MSVC.

Does anyone agree?

Jairo.
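To make the proposal concrete, here is a minimal sketch of what intrinsic-based scanning could look like. It is not the dynamic_bitset<> implementation: the names lowest_bit, count_bits, find_from, sketch_find_first, sketch_find_next and sketch_npos are hypothetical, and the block type is assumed to be a 64-bit unsigned integer stored in a std::vector. On GCC and Clang, __builtin_ctzll typically compiles to BSF or TZCNT; __builtin_popcountll only becomes a single POPCNT instruction when the target allows it (for example with -mpopcnt). The MSVC __popcnt64 intrinsic assumes the CPU supports POPCNT.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Hypothetical sentinel, mirroring the role of dynamic_bitset<>::npos.
const std::size_t sketch_npos = static_cast<std::size_t>(-1);

// Index of the lowest set bit. The caller must guarantee x != 0.
inline unsigned lowest_bit(std::uint64_t x) {
#if defined(_MSC_VER) && defined(_M_X64)
    unsigned long idx;
    _BitScanForward64(&idx, x);            // bit scan forward
    return static_cast<unsigned>(idx);
#elif defined(__GNUC__)
    return static_cast<unsigned>(__builtin_ctzll(x));
#else
    unsigned idx = 0;                       // portable fallback loop
    while ((x & 1u) == 0) { x >>= 1; ++idx; }
    return idx;
#endif
}

// Number of set bits in one block.
inline unsigned count_bits(std::uint64_t x) {
#if defined(_MSC_VER) && defined(_M_X64)
    return static_cast<unsigned>(__popcnt64(x)); // assumes POPCNT support
#elif defined(__GNUC__)
    return static_cast<unsigned>(__builtin_popcountll(x));
#else
    unsigned n = 0;                         // clear the lowest set bit each pass
    while (x) { x &= x - 1; ++n; }
    return n;
#endif
}

// Position of the first set bit at index >= from, or sketch_npos.
inline std::size_t find_from(const std::vector<std::uint64_t>& blocks,
                             std::size_t from) {
    std::size_t i = from / 64;
    if (i >= blocks.size()) return sketch_npos;
    // Mask off the bits below 'from' in the starting block.
    std::uint64_t w = blocks[i] & (~std::uint64_t(0) << (from % 64));
    while (w == 0) {
        if (++i == blocks.size()) return sketch_npos;
        w = blocks[i];
    }
    return i * 64 + lowest_bit(w);
}

inline std::size_t sketch_find_first(const std::vector<std::uint64_t>& blocks) {
    return find_from(blocks, 0);
}

inline std::size_t sketch_find_next(const std::vector<std::uint64_t>& blocks,
                                    std::size_t pos) {
    return pos == sketch_npos ? sketch_npos : find_from(blocks, pos + 1);
}
```

Each non-zero block is resolved with one bit-scan instead of a table lookup per byte; whether that is faster in practice would need to be measured against the existing table-based code.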
On Wed, Jul 24, 2013 at 9:17 PM, Jairo Andres Velasco Romero wrote:
> I noticed that dynamic_bitset<> does not use the long-established BSF instruction, which could accelerate the find_first() and find_next() methods. I checked the assembly generated by the current table-based implementation.
> BSF and BSR have been available on x86 and x64 for a long time. Population count can be implemented with POPCNT in performance-oriented builds.
> Core i7 processors with AVX2 enable new wide-block bit manipulation, but using it requires compiler-specific intrinsics for GNU g++ and MSVC. Does anyone agree?
Optimization is always welcome. If you have a patch for these optimizations, you can create a Trac ticket and attach it there (and post a reference to the ticket in this list so that it gets attention). Library maintainers will have a look at it eventually.
participants (2)
- Andrey Semashev
- Jairo Andres Velasco Romero