On Nov 19, 2017 14:54, "John Maddock via Boost"
I have been investigating a 15% performance regression in my C++ primesum program (https://github.com/kimwalisch/primesum/tree/256-bit) over the last 2 days.
Through a lot of benchmarking I was able to identify the boost multiprecision library in combination with -std=c++11 (or -std=gnu++11) as the culprit: I also have a version of the primesum program which does not use the boost multiprecision library, and that version shows no performance regression when compiled with -std=c++11.
I have tested using multiple versions of the boost multiprecision library including the latest 1.65.1. The slowdown happens on both GCC (versions: 5.4, 6.4, 7.2) and Clang (version 3.8) on x86_64 Linux. I am only using the int256_t and uint256_t types (hence cpp_int backend) and I am doing only simple integer arithmetic: +, - and *.
Is this a known issue, and is there a known workaround, e.g. a special compiler flag? I could revert to C++98 but I really don't want to do that...
No, not known, and if anything C++11 should speed things up by enabling rvalue references etc. If you can narrow it down some more I'll certainly investigate.

Thanks for the heads up, John.

OK, I'll try to narrow it down. The simplest algorithm using the int256_t type in primesum is S2_trivial. You can have a look at the algorithm here:

https://github.com/kimwalisch/primesum/blob/256-bit/src/deleglise-rivat/S2_trivial.cpp#L59

There are only 2 lines of code (62-63) using the int256_t type in this algorithm:

    maxint_t diff = prime_sums[pi[y]] - prime_sums[pi[xn]];
    s2_trivial += prime * diff;

Note that maxint_t is a typedef for int256_t. The first line does an __int128_t subtraction and converts the result (implicitly) to int256_t. The second line does an int256_t multiplication and adds the result to the int256_t s2_trivial variable. As soon as I add -std=c++11 to the compiler flags the algorithm runs 15% slower (using Clang and GCC on Linux x86_64).

Funnily, if I change the code lines to:

    maxint_t diff = prime_sums[pi[y]] - prime_sums[pi[xn]];
    maxint_t prime2 = prime;
    diff *= prime2;
    s2_trivial += diff;

With -std=c++11 this code already runs 11% faster, even though it does exactly the same thing (and it is only 4% slower than without -std=c++11). Without -std=c++11 this code does not run faster.

My code mixes __int128_t with int256_t a lot, and one of my guesses about what causes the slowdown is that the __int128_t to int256_t conversion has become much slower (in some cases) with -std=c++11.

Kim
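To narrow this down outside of primesum, the two formulations above could be isolated in a small standalone harness along the following lines. This is only a sketch, not code from primesum: the names i128, sum_fused, sum_split and cpu_seconds are made up for illustration, the element type of the primes array (int64_t) is an assumption, and the wide type is left as a template parameter so that the harness itself has no dependencies. It deliberately sticks to C++98 constructs so that the same file builds both with and without -std=c++11.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <stdint.h>
#include <vector>

// GCC/Clang extension type, as used in the thread. The alias name is hypothetical.
typedef __int128_t i128;

// Variant A: mirrors the original two lines (128-bit subtraction converted
// implicitly to Wide, then a mixed multiplication added to the running sum).
template <typename Wide>
Wide sum_fused(const std::vector<i128>& hi,
               const std::vector<i128>& lo,
               const std::vector<int64_t>& primes)
{
    assert(hi.size() == primes.size() && lo.size() == primes.size());
    Wide sum = 0;
    for (std::size_t i = 0; i < primes.size(); i++)
    {
        Wide diff = hi[i] - lo[i];
        sum += primes[i] * diff;
    }
    return sum;
}

// Variant B: mirrors the rewritten four lines (explicit conversion of the
// prime to Wide, in-place multiplication, then addition).
template <typename Wide>
Wide sum_split(const std::vector<i128>& hi,
               const std::vector<i128>& lo,
               const std::vector<int64_t>& primes)
{
    assert(hi.size() == primes.size() && lo.size() == primes.size());
    Wide sum = 0;
    for (std::size_t i = 0; i < primes.size(); i++)
    {
        Wide diff = hi[i] - lo[i];
        Wide prime2 = primes[i];
        diff *= prime2;
        sum += diff;
    }
    return sum;
}

// Runs one variant `repeat` times and returns the elapsed CPU time in seconds.
// The results are accumulated into `checksum` so the work cannot be optimized away.
template <typename Wide>
double cpu_seconds(Wide (*variant)(const std::vector<i128>&,
                                   const std::vector<i128>&,
                                   const std::vector<int64_t>&),
                   const std::vector<i128>& hi,
                   const std::vector<i128>& lo,
                   const std::vector<int64_t>& primes,
                   int repeat,
                   Wide& checksum)
{
    std::clock_t start = std::clock();
    for (int r = 0; r < repeat; r++)
        checksum += variant(hi, lo, primes);
    return double(std::clock() - start) / CLOCKS_PER_SEC;
}
```

To exercise the cpp_int backend, one would include <boost/multiprecision/cpp_int.hpp>, instantiate both variants with boost::multiprecision::int256_t, fill the vectors with a few million values, and compare the timings of two builds, one with and one without -std=c++11. If variant A slows down under -std=c++11 while variant B does not, that would point at the mixed int64_t * int256_t multiplication rather than at the __int128_t conversion, which both variants share.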