On Fri, Jan 13, 2017, 4:05 PM Marc Glisse
On Fri, 13 Jan 2017, Steven Ross wrote:
I've been reviewing a proposed addition to Boost.Sort called pdqsort. When using a general-case partitioning algorithm, it performs comparably to std::sort in my testing (Windows and Linux). When used as a replacement for the std::sort fallback in spreadsort, using a branch-reducing optimization it is ~20% faster on ints and floats, and ~40% slower on strings. The concerning part is that any comparison operator that involves a branch may have a comparable slowdown to the string case. pdqsort is also significantly faster than std::sort on mostly-sorted data and some other (relatively common) special cases. This makes little difference when used as a fallback by spreadsort, but does make a difference when used on its own.
I see these possible things to do with pdqsort: 1) Reject it completely as too similar to std::sort, which is already highly optimized. 2) Add it as another optional library in the Boost.Sort library. 3) Add it to Boost.Sort, and only use it as a fallback for ints and floats. 4) Add it to Boost.Sort, use it as a fallback for all of spreadsort, and have pdqsort itself only use its branch-reduction optimization on ints and floats. 5) #4, except eliminate the branch-reduction optimization completely from pdqsort for simplicity.
Does anyone have a strong preference, advice, or comments on how they use Boost.Sort?
Should I run a mini-review for pdqsort?
I assume this is the same pdqsort (https://github.com/orlp/pdqsort) that was proposed for both libc++ and libstdc++ as an implementation of std::sort. For libstdc++, sadly, it was never reviewed because the reviewer turned out to be busier than he expected. I didn't follow what happened for libc++.
From my notes (https://gcc.gnu.org/ml/libstdc++/2015-10/msg00040.html), on my application where copy is cheap (one pointer) but the comparison can be a bit expensive (reading memory through several pointers), libstdc++ sort was taking 3.46s, stable_sort 3.08s, and pdqsort 2.85s.
I am surprised you found it so much slower for strings. Was it C++03 or C++11? COW strings or not? Short or long strings? Did the author have any comment on this benchmark? Now the code may have changed since I tried it in 2015, maybe it was re-tuned more for int/float (the branch-reducing stuff?) and less for more expensive types since then...
(A naive question: do people ever sort a large, plain vector<int> or vector<double>? I don't think I've ever used sort with basic types, or at least not on large vectors where performance was critical. The closest would have the int/double as one element of a pair.)
I'd rather see the various implementations of std::sort improved to the point where integrating pdqsort in boost would be useless, but I am not sure that's going to happen.
I am not sure what your benchmark involves exactly. If pdqsort is used only as a fallback for spreadsort, that's a special kind of input that might not represent what users would feed it if they called it directly. However, a 40% slowdown is a serious cause for concern... You don't say if disabling the branch-reduction option improves performance for strings, which your #4 hints at.
It does; the speed is roughly the same as std::sort as a fallback for spreadsort for both ints and strings when I revert the branch reduction optimization. The standalone (outside spreadsort) improvement for mostly-sorted data is substantial both with and without the branch optimization. I have sorted basic data types at work (it's especially useful for compression), but pairs are more common.
-- Marc Glisse
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost