On Fri, Jun 16, 2017, 6:08 PM Soul Studios via Boost
Time complexity is not the only consideration when reviewing sorting techniques. Smoothsort is purported to have much better time complexity across a range of scenarios, but in reality it's cache performance is rubbish. That's where quicksort variants and timsort work well. I haven't investigated spreadsort personally.
Agreed, that's why spreadsort has to limit itself to dividing into a number of buckets that will fit in L1 cache (its theoretical time complexity would be better otherwise). It works out to between slightly faster than std::sort to about 2X as fast on most data. With string or other multiple piece keys the spreadsort speedup can be spectacular, as it will traverse the full length of the key at most once during the radix portion of the sort, where comparison sorts can do so for each comparison. When I modified bzip to use spreadsort the overall compression runtime (more than just sorting) dropped 40%. Smoothsort has the same nlogn worst case as other comparison sorts, and it has the cache issues of heapsort, which makes for poor performance on average.
On 13/06/2017 11:09 p.m., Steven Ross via Boost wrote:
After reviewing the responses, I have rejected Timsort. Here are the reasons:
1. No author/maintainer responses to the questions on boost (except for forwarding some documentation links). We need a maintainer for all our source, and I'm not volunteering to maintain new functions as complex as Timsort. 2. There was no response on the Timsort bug reported during the review, nor any tests for it. 3. We need better tests. 4. Francisco is including a more efficient stable sort as part of his additions to the library, including handling special cases like Timsort handles better than stable_sort does. Francisco will maintain his code.
I'd like to note that many people THINK they are experts on sorting because they covered it in class or may have read about it in a textbook, but most do not understand hybrid sorts well. As a specific example, I saw O(N*log(N)) mentioned as the worst-case speed limit of general-case sorts, when in fact Spreadsort beats that as described in its docs:
http://www.boost.org/doc/libs/1_61_0/libs/sort/doc/html/index.html#sort.over...
Many people also don't carefully analyze for worst-case performance,
can be tricky.
With regards to future sorting library reviews, I'll get at least a one-page doc describing the algorithm for the review in advance, but the boost documentation formatting is an unreasonable request for a simple sorting function, and I don't see the value it adds for these simple reviews. It is useful for a full library, but we'll be integrating the documentation into our full library after a new function is accepted.
It isn't accurate to call it a mini-review (as described here: http://www.boost.org/community/reviews.html#Fast-Track) because it's reviewing completely new source we're talking about adding to the
which library.
That said, do you still want me to a call the upcoming one for pdqsort a "mini-review"?
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost