Re: [boost] [sort] Timsort review result

16 Jun 2017


      On Fri, Jun 16, 2017, 6:08 PM Soul Studios via Boost <boost@lists.boost.org>
wrote:
...
Time complexity is not the only consideration when reviewing sorting
techniques. Smoothsort is purported to have much better time complexity
across a range of scenarios, but in reality it's cache performance is
rubbish. That's where quicksort variants and timsort work well. I
haven't investigated spreadsort personally.
Agreed, that's why spreadsort has to limit itself to dividing into a number
of buckets that will fit in L1 cache (its theoretical time complexity would
be better otherwise). It works out to between slightly faster than
std::sort to about 2X as fast on most data.  With string or other multiple
piece keys the spreadsort speedup can be spectacular, as it will traverse
the full length of the key at most once during the radix portion of the
sort, where comparison sorts can do so for each comparison.  When I
modified bzip to use spreadsort the overall compression runtime (more than
just sorting) dropped 40%.

Smoothsort has the same nlogn worst case as other comparison sorts, and it
has the cache issues of heapsort, which makes for poor performance on
average.
...
On 13/06/2017 11:09 p.m., Steven Ross via Boost wrote:
...
After reviewing the responses, I have rejected Timsort.  Here are the
reasons:
1. No author/maintainer responses to the questions on boost (except
for
    forwarding some documentation links).  We need a maintainer for all
our
    source, and I'm not volunteering to maintain new functions as
complex as
    Timsort.
    2. There was no response on the Timsort bug reported during the
review,
    nor any tests for it.
    3. We need better tests.
    4. Francisco is including a more efficient stable sort as part of his
    additions to the library, including handling special cases like
Timsort
    handles better than stable_sort does.  Francisco will maintain his
code.
I'd like to note that many people THINK they are experts on sorting
because
they covered it in class or may have read about it in a textbook, but
most
do not understand hybrid sorts well.  As a specific example, I saw
O(N*log(N)) mentioned as the worst-case speed limit of general-case
sorts,
when in fact Spreadsort beats that as described in its docs:
http://www.boost.org/doc/libs/1_61_0/libs/sort/doc/html/index.html#sort.over...
...
Many people also don't carefully analyze for worst-case performance,
...
can be tricky.
With regards to future sorting library reviews, I'll get at least a
one-page doc describing the algorithm for the review in advance, but the
boost documentation formatting is an unreasonable request for a simple
sorting function, and I don't see the value it adds for these simple
reviews.  It is useful for a full library, but we'll be integrating the
documentation into our full library after a new function is accepted.
It isn't accurate to call it a mini-review (as described here:
http://www.boost.org/community/reviews.html#Fast-Track)  because it's
reviewing completely new source we're talking about adding to the
which
library.
...
That said, do you still want me to a call the upcoming one for pdqsort a
"mini-review"?
_______________________________________________
Unsubscribe & other changes:
http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________
Unsubscribe & other changes:
http://lists.boost.org/mailman/listinfo.cgi/boost