On 23/08/2018 09:16, Andrey Semashev wrote:
> I think such an optimization would be useful. Note that MSVC also has intrinsics for popcount[1], although I don't think those are supported when the target CPU doesn't implement the corresponding instructions. You would have to check at compile time whether the target CPU supports it (e.g. by checking if __AVX__ is defined).
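A rough sketch of the compile-time route, assuming (as suggested above) that `__AVX__` is used as the MSVC proxy for POPCNT (in practice every AVX-capable x86 CPU also implements POPCNT, and `__popcnt64` exists only on x64), and that GCC/Clang are checked via `__POPCNT__`, which they define when POPCNT is enabled (e.g. `-mpopcnt` or a suitable `-march`). `popcount64` and `POPCOUNT_USES_INTRINSIC` are hypothetical names; when neither check passes, the sketch falls back to portable bit twiddling.

```cpp
#include <cstdint>

// Hypothetical configuration macro: 1 when the build target is known to
// have POPCNT, so the intrinsic can be used (and inlined) unconditionally.
#if defined(_MSC_VER) && defined(_M_X64) && defined(__AVX__)
#  include <intrin.h>   // __popcnt64
#  define POPCOUNT_USES_INTRINSIC 1
#elif defined(__GNUC__) && defined(__POPCNT__)
#  define POPCOUNT_USES_INTRINSIC 1
#else
#  define POPCOUNT_USES_INTRINSIC 0
#endif

// Hypothetical name: 64-bit population count chosen entirely at compile time.
inline int popcount64(std::uint64_t v)
{
#if POPCOUNT_USES_INTRINSIC && defined(_MSC_VER)
    return static_cast<int>(__popcnt64(v));   // MSVC x64 built with /arch:AVX or higher
#elif POPCOUNT_USES_INTRINSIC
    return __builtin_popcountll(v);           // GCC/Clang with POPCNT enabled
#else
    // No POPCNT guaranteed at compile time: portable SWAR bit counting.
    v = v - ((v >> 1) & 0x5555555555555555ULL);
    v = (v & 0x3333333333333333ULL) + ((v >> 2) & 0x3333333333333333ULL);
    v = (v + (v >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return static_cast<int>((v * 0x0101010101010101ULL) >> 56);
#endif
}
```

Because the choice is made by the preprocessor, there is no dispatch overhead at all; the cost is that a binary built without the flag never uses the instruction, even on CPUs that have it.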
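Compile-time detection is better if you can do it, because it lets the call be completely inlined. If the compile-time check doesn't tell you the instruction is available, though, you can still fall back to runtime detection, e.g. by defining something like:

```cpp
// header file
#include <cstdint>
extern int (*popcnt64)(std::uint64_t);

// source file (includes the header above; x86-64 only)
#if defined(_MSC_VER)
#include <intrin.h>     // __cpuid
#include <nmmintrin.h>  // _mm_popcnt_u64
#else
#include <cpuid.h>      // __get_cpuid (GCC/Clang)
#endif

static bool is_popcnt_supported()
{
    // CPUID leaf 1: ECX bit 23 (0x00800000) reports POPCNT support.
#if defined(_MSC_VER)
    int info[4] = { 0 };
    __cpuid(info, 1);
    return (info[2] & 0x00800000) != 0;
#else
    unsigned int eax, ebx, ecx, edx;
    return __get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & 0x00800000) != 0;
#endif
}

#if !defined(_MSC_VER)
__attribute__((target("popcnt")))  // emit POPCNT here even without -mpopcnt
#endif
static int popcnt64_intrinsic(std::uint64_t v)
{
#if defined(_MSC_VER)
    return static_cast<int>(_mm_popcnt_u64(v));
#else
    return __builtin_popcountll(v);
#endif
}

static int popcnt64_emulation(std::uint64_t v)
{
    // Bit twiddling (SWAR): count bits in 2-, 4-, then 8-bit fields,
    // then sum the eight byte counts with a multiply.
    v = v - ((v >> 1) & 0x5555555555555555ULL);
    v = (v & 0x3333333333333333ULL) + ((v >> 2) & 0x3333333333333333ULL);
    v = (v + (v >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return static_cast<int>((v * 0x0101010101010101ULL) >> 56);
}

static int popcnt64_auto(std::uint64_t v)
{
    // First call picks an implementation and patches the pointer, so later
    // calls go straight to it. Concurrent first calls store the same value.
    popcnt64 = is_popcnt_supported() ? &popcnt64_intrinsic : &popcnt64_emulation;
    return popcnt64(v);
}

int (*popcnt64)(std::uint64_t) = &popcnt64_auto;
```

Repeat for other argument sizes as needed. You could probably do something fancier with C++11's guaranteed thread-safe initialisation of function-local statics, but this pattern works on all compilers, not just C++11 ones.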
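The C++11 alternative might look roughly like this: a function-local static holds the chosen implementation, relying on the language guarantee that its initialiser runs exactly once, even when several threads make the first call concurrently. This is a sketch assuming x86-64 with MSVC, GCC or Clang; `popcnt64_fn`, `select_popcnt64` and `POPCNT_CAN_DISPATCH` are hypothetical names, and other targets simply get the portable fallback.

```cpp
#include <cstdint>

// Hypothetical macro: 1 where we know how to query CPUID and call POPCNT.
#if defined(_MSC_VER) && defined(_M_X64)
#  include <intrin.h>        // __cpuid, __popcnt64
#  define POPCNT_CAN_DISPATCH 1
#elif defined(__GNUC__) && defined(__x86_64__)
#  include <cpuid.h>         // __get_cpuid
#  define POPCNT_CAN_DISPATCH 1
#else
#  define POPCNT_CAN_DISPATCH 0
#endif

using popcnt64_fn = int (*)(std::uint64_t);

inline int popcnt64_emulation(std::uint64_t v)
{
    // Portable SWAR bit counting, used when POPCNT is unavailable.
    v = v - ((v >> 1) & 0x5555555555555555ULL);
    v = (v & 0x3333333333333333ULL) + ((v >> 2) & 0x3333333333333333ULL);
    v = (v + (v >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return static_cast<int>((v * 0x0101010101010101ULL) >> 56);
}

#if POPCNT_CAN_DISPATCH
inline bool is_popcnt_supported()
{
    // CPUID leaf 1: ECX bit 23 reports POPCNT support.
#  if defined(_MSC_VER)
    int info[4] = { 0 };
    __cpuid(info, 1);
    return (info[2] & 0x00800000) != 0;
#  else
    unsigned int eax, ebx, ecx, edx;
    return __get_cpuid(1, &eax, &ebx, &ecx, &edx) && (ecx & 0x00800000) != 0;
#  endif
}

#  if defined(_MSC_VER)
inline int popcnt64_intrinsic(std::uint64_t v)
{
    return static_cast<int>(__popcnt64(v));
}
#  else
__attribute__((target("popcnt")))   // allow POPCNT without -mpopcnt
inline int popcnt64_intrinsic(std::uint64_t v)
{
    return __builtin_popcountll(v);
}
#  endif
#endif

// Runs the CPUID check and returns the best available implementation.
inline popcnt64_fn select_popcnt64()
{
#if POPCNT_CAN_DISPATCH
    if (is_popcnt_supported())
        return &popcnt64_intrinsic;
#endif
    return &popcnt64_emulation;
}

inline int popcnt64(std::uint64_t v)
{
    // C++11 guarantees this local static is initialised exactly once, even
    // when several threads make the first call at the same time, so the
    // CPUID check runs once and needs no explicit locking.
    static const popcnt64_fn impl = select_popcnt64();
    return impl(v);
}
```

Compared with patching a global function pointer, every call pays for the guard check on the static (normally just a load and a branch once it is initialised), in exchange for not relying on unsynchronised writes to a shared pointer.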