[boost] Adding my sorting algorithm, ska_sort to boost

13 Oct 2019

      Hi,
a couple of years ago I wrote a generalized version of radix sort, called ska_sort. I gave a talk about it at CppNow, available here:
https://www.youtube.com/watch?v=zqs87a_7zxw

I have a new version of that algorithm that is both faster and more general. I can now sort almost anything. For example when I gave the talk I couldn't sort a std::vector<std::list<int>> because std::list doesn't have a random access interface, but now that works. (and it's faster than sorting the same thing with std::sort)

I've also made it much easier to extend the algorithm with your own types. It was always easy if your type isn't too complicated, but now even weird types like std::variant are pretty easy to add. In my old version I had to go deep into the guts to add support for std::variant, now it's easy enough to do that that I'll add it to the documentation.

I'm thinking of releasing the new version in boost, mainly so that it will be used more widely and so that it can be kept working in the long term.

Since this is a long weekend in the US, I will try to as much of the work as possible this weekend. I'm currently trying to figure out how the boost documentation works (I saw that there are quickbook files in Boost.Sort, but how do I run the build for it?) and I've already made good progress on cleaning up my code and making it use the boost::sort namespace.

Before I go any further I wanted to ask though:

Does boost want my sorting algorithm? This would essentially deprecate spreadsort since this one is faster and more general and easier to use. To convince you that this is worth adding, I ran the benchmark that is included in the Boost.Sort source code.
On random numbers here is how long all the algorithms took:
std::sort: 9.04
pdqsort: 4.65
std::stable_sort: 11.57
spinsort: 5.42
flat_stable_sort: 7.26
spreadsort: 4.41
ska_sort: 2.00

There are a lot more bencmarks in the Boost.Sort code, and in general the results are this: If the data is already sorted or mostly sorted, other algorithms are faster, but if there are no easy patterns, ska_sort is always fastest. (I was going to include the entire table in this email, but the formatting breaks. I can attach all the results separately if anyone is interested)

The Boost.Sort code also has benchmarks for sorting "objects" and "strings" and the results are similar there: On random data ska_sort is always fastest.

I think this new version would be great in boost. It's not every day that a new sorting algorithm comes along that's literally twice as fast as what we had before. If we want this in boost, what would the next steps be? From the "submission process" document it sounds like there will be several stages of "Refinement" for the code. (or should this go through the Boost Library Incubator?) After I figure out how the documentation works I can upload a first version of this to github. (I want to get the documentation working so that we can talk about the interface)

Let me know what you think,

Malte

-- 
  Malte Skarupke
  malteskarupke@fastmail.fm