Adding my sorting algorithm, ska_sort, to boost, attempt 2

Malte Skarupke

22 Jan 2020 22 Jan '20

5:23 p.m.

Hi everyone, I started a discussion about this in October, but at the time the maintainer of the sort library was busy, then I was busy, so now it's January... I wrote a generalized version of radix sort a couple years ago and gave a talk about it here: https://www.youtube.com/watch?v=zqs87a_7zxw I would like to get a new version of that sorting algorithm into boost. It's faster than what I showed several years ago. It's also even more general and has a new interface that should be easier to extend. I uploaded the code here: https://github.com/skarupke/sort/tree/master The documentation is also already updated with new benchmark results. Let me know what's required to get this into boost. Would you like this in the form of a pull request? One of the big things that probably needs review is the interface, since radix sorting works differently than comparison based sorting. For most normal uses you just need to provide a sort key, but more complicated cases can get more complicated. I tried very hard to make everything easy, but I'd also be curious if others can come up with better ways of doing the interface. Thanks, Malte -- Malte Skarupke malteskarupke@fastmail.fm

Show replies by date

Glen Fernandes

22 Jan 22 Jan

5:44 p.m.

On Wednesday, January 22, 2020, Malte Skarupke wrote:

...

I started a discussion about this in October, but at the time the maintainer of the sort library was busy, then I was busy, so now it's January...

I wrote a generalized version of radix sort a couple years ago and gave a talk about it here: https://www.youtube.com/watch?v=zqs87a_7zxw

I would like to get a new version of that sorting algorithm into boost.

Let me know what's required to get this into boost.

Try e-mailing the maintainer of Boost.Sort again. If he does not approve, you don't want to spend your time working on a pull request. Glen

degski

23 Jan 23 Jan

1:13 p.m.

On Wed, 22 Jan 2020 at 11:44, Glen Fernandes via Boost < boost@lists.boost.org> wrote:

...

If he does not approve, you don't want to spend your time working on a pull request.

Then some other way has to be found. I did not try V2 (faster, says author), but V1, what I do use, gives any sort a run for it's money, it's a no-brainer. On the other hand, personally I'd rather see the library as stand alone (V2). V1 is/was a 1 file header, and so easy to integrate. degski -- @realdegski https://brave.com/google-gdpr-workaround/ "We value your privacy, click here!" Sod off! - degski "Anyone who believes that exponential growth can go on forever in a finite world is either a madman or an economist" - Kenneth E. Boulding "Growth for the sake of growth is the ideology of the cancer cell" - Edward P. Abbey

Glen Fernandes

1:24 p.m.

On Thu, Jan 23, 2020 at 8:13 AM degski wrote:

...

Then some other way has to be found.

There are other ways that don't involve Boost.Sort. One way is: https://www.boost.org/community/reviews.html

...

On the other hand, personally I'd rather see the library as stand alone (V2). V1 is/was a 1 file header, and so easy to integrate.

You could ask Matle to provide (on his GitHub) a single header ska_sort.hpp (that has no dependencies besides the C++ standard library). Glen

Hans Dembinski

1:58 p.m.

...

On 22. Jan 2020, at 18:23, Malte Skarupke via Boost <boost@lists.boost.org> wrote:

Let me know what's required to get this into boost. Would you like this in the form of a pull request?

One of the big things that probably needs review is the interface, since radix sorting works differently than comparison based sorting. For most normal uses you just need to provide a sort key, but more complicated cases can get more complicated. I tried very hard to make everything easy, but I'd also be curious if others can come up with better ways of doing the interface.

I am not an expert on Boost.Sort and only had a quick look at your changes, but: If I understood correctly, your new algorithm increases the requirement for Boost.Sort to C++17 from C++11. I don't see a good reason for doing that. You can replace if constexpr and the generic lambdas in your implementation with other code constructs that work on C++11. Regarding the interface, shouldn't it follow the example of spreadsort? Regarding having an extra single-header version: that seems overkill to me. Boost.Sort already claims to have no dependencies on other Boost libraries and is header-only, which should make it extremely easy to include in any project. "These algorithms do not use any other library or utility. The parallel algorithms need a C++11 compliant compiler."

Mike

2:18 p.m.

...

Regarding having an extra single-header version: that seems overkill to me. Boost.Sort already claims to have no dependencies on other Boost libraries and is header-only, which should make it extremely easy to include in any project.

"These algorithms do not use any other library or utility. The parallel algorithms need a C++11 compliant compiler."

FYI: Apparently that statement seems to be not quite up to date: According to boostdep, sort depends on config, core, range, serialization and static_assert. However, that seems to be restricted to the spreadsort headers and transitively to the <boost/sort/sort.hpp> header (pdqsort also needs type traits) Best Mike

Malte Skarupke

6:34 p.m.

...

If I understood correctly, your new algorithm increases the requirement for Boost.Sort to C++17 from C++11. I don't see a good reason for doing that. You can replace if constexpr and the generic lambdas in your implementation with other code constructs that work on C++11.

It only requires C++17 if you want to use ska_sort. The other algorithms remain at C++11. I agree that this is a little unfortunate, but when I briefly tried to get rid of "if constexpr" in the code, it turned out to be a giant pain. It would also probably make the code compile much more slowly. If you are trying to sort nested types, like vector<vector<pair<string, int>>>, compile time can explode without some of the short circuiting that I'm doing using if constexpr. (I can explain why it explodes if you're interested, but for now I'll just say that it works different from comparison based sorting and you have to instantiate new code at every level of depth) I know I can do the same short circuiting without if constexpr if I put in enough work, but I think the code quality would suffer a lot, making the code much harder to read and maintain. (not that it's simple as is, but it would go from "hard to read" to "impossible to read")

...

Regarding the interface, shouldn't it follow the example of spreadsort?

The main benefit of this over spreadsort is that it's more generic. The spreadsort interface is hard to generalize to other types. For example it allows you to provide your own shift operator. But how do you provide a shift operator when trying to sort a pair<string, int>? When you get to the end of the string, you'd have to start providing bits from the int, but that's going to be annoyingly complicated to write. (especially since I'm not sure if spreadsort works on byte boundaries) It gets even more complicated when trying to sort a variant<string, int>. Because now the bits depend on which value of the variant is active, but you don't want to have to check that over and over again. The interface was very important for making this more general. When it comes to this kind of decisions, I have to make as many of them invisible for the user as possible, because only the internals of the sorting algorithm can know which bits to sort on next. About providing a single header: That is definitely what I would do as a standalone library, but I think in boost it's OK to split this into multiple headers. Users can include boost/sort/ska_sort/ska_sort.hpp which includes all necessary headers. So usually they don't care that it's multiple headers internally. And if you want slightly faster compile time, you can just include ska_sort_base.hpp without the headers that provide tuple or variant implementations. (this is currently not part of the documentation because I'm not sure if I want to provide that forever) So why make this part of boost instead of a standalone header? Basically I like having the assurance that it'll be kept working for a long time. I'm willing to do that and to put future work into this, but having it in boost ensures that that continues to happen even after I can no longer provide support. -- Malte Skarupke malteskarupke@fastmail.fm

Francisco José Tapia

28 Jan 28 Jan

5:32 p.m.

Hello, The first thing is to apologize for the delay. I have medical problems in my family, and all my free time is gone. I didn't remember that in February 2019 we evaluated this same algorithm. In the message that Steven wrote, he mentioned a list of major deficiencies in the algorithm. *It looks like ska_sort is a radix sort designed mostly for the best-case, and that if you feed it the alrbreaker or binaryalrbreaker distributions it will perform poorly (though not as poorly as radix implementations with no fallback or that fallback to insertionsort).* *There might be some counting and swapping optimizations here to copy into spreadsort (I see block operations), though those might not work if the items being sorted are references or pointers.* *Mostly-sorted data is a common case to take seriously, and it took only modest tweaking of spreadsort to do well there.* *Based on Francisco's numbers, I should retune spreadsort to take advantage of the higher speed of pdqsort vs std::sort, at least if I want to optimize for 64-bit integer sorting on modern processors.* *https://www.boost.org/doc/libssort/spreadsort/* <https://www.boost.org/doc/libs/1_67_0/boost/sort/spreadsort/detail/constants.hpp> *A higher log_min_split_count (especially) and log_finishing_count is probably appropriate.* Every so often we get requests for evaluation of sorting algorithms, many of which are "world's fastest algorithm". We try to evaluate such requests. I think everyone deserves respect for their work. But so do we who spend our time doing the evaluation. I did the evaluation of the algorithm, and I was surprised, because the recommendations we made in February, have not been followed. So the tests I've done, have been to corroborate what we already said in February. In other words, repeat work. The comments I have to make to the algorithm are as follows: 1- The algorithm needs C++17. This is a big problem because the library is made to be compatible with C++11. In many companies and workplaces, the compilers that are being used are old, at most compatible with C++11. In the same way that for production you don't usually use the latest version of the Operating System and SW, you don't usually use the latest version of the compiler either. It is not possible to pass a part of the library to C++17 and another part to remain with C++11. Passing it all to C++17 means leaving out many people and many companies, and that goes against the spirit of the library which is to reach as many users as possible. 2.- The algorithm has not implemented worst-case control, nor when all elements are the same or are ordered, as it was indicated in February. 3.- The algorithm is incomplete. It is to be expected from the algorithm that it orders, not only integers but also signed and unsigned integers, 32 64 and 128-bit floating-point numbers and strings. 4.- The only argument of the algorithm is its speed. About this, I have several comments a) When the implementation incorporates worst-case control of repeated and equal elements, the speed may change b) The speed of an algorithm is a factor to be taken into account, but it is not the only one. For example, the Timsort algorithm is slower than any other algorithm with random data. But it is extremely fast when the data are almost ordered, a situation that occurs with much more frequency than we imagine. This algorithm is highly valued by programmers and companies, and even the Java language uses it for its sort function. Ska_sort is very fast with the random data, but it is quite mediocre when the data are almost ordered. 5) Having a hybrid algorithm like Spreadsort, having another hybrid algorithm is not very interesting, because with small data sets, the time differences, if any, will be small, and with large data sets, parallel algorithms are used, which provide much more speed than any single-stranded algorithm In addition to all this, I would like to express my discomfort at having to waste my time, which, as I said before, is very scarce, in repeating a test, which does not contribute anything, because it has not followed any of our recommendations. Francisco Tapia El jue., 23 ene. 2020 a las 19:36, Malte Skarupke via Boost (< boost@lists.boost.org>) escribió:

...

...
If I understood correctly, your new algorithm increases the requirement for Boost.Sort to C++17 from C++11. I don't see a good reason for doing that. You can replace if constexpr and the generic lambdas in your implementation with other code constructs that work on C++11.

It only requires C++17 if you want to use ska_sort. The other algorithms remain at C++11. I agree that this is a little unfortunate, but when I briefly tried to get rid of "if constexpr" in the code, it turned out to be a giant pain. It would also probably make the code compile much more slowly. If you are trying to sort nested types, like vector<vector<pair<string, int>>>, compile time can explode without some of the short circuiting that I'm doing using if constexpr. (I can explain why it explodes if you're interested, but for now I'll just say that it works different from comparison based sorting and you have to instantiate new code at every level of depth) I know I can do the same short circuiting without if constexpr if I put in enough work, but I think the code quality would suffer a lot, making the code much harder to read and maintain. (not that it's simple as is, but it would go from "hard to read" to "impossible to read")

...
Regarding the interface, shouldn't it follow the example of spreadsort?

The main benefit of this over spreadsort is that it's more generic. The spreadsort interface is hard to generalize to other types. For example it allows you to provide your own shift operator. But how do you provide a shift operator when trying to sort a pair<string, int>? When you get to the end of the string, you'd have to start providing bits from the int, but that's going to be annoyingly complicated to write. (especially since I'm not sure if spreadsort works on byte boundaries)

It gets even more complicated when trying to sort a variant<string, int>. Because now the bits depend on which value of the variant is active, but you don't want to have to check that over and over again.

The interface was very important for making this more general. When it comes to this kind of decisions, I have to make as many of them invisible for the user as possible, because only the internals of the sorting algorithm can know which bits to sort on next.

About providing a single header: That is definitely what I would do as a standalone library, but I think in boost it's OK to split this into multiple headers. Users can include boost/sort/ska_sort/ska_sort.hpp which includes all necessary headers. So usually they don't care that it's multiple headers internally. And if you want slightly faster compile time, you can just include ska_sort_base.hpp without the headers that provide tuple or variant implementations. (this is currently not part of the documentation because I'm not sure if I want to provide that forever)

So why make this part of boost instead of a standalone header? Basically I like having the assurance that it'll be kept working for a long time. I'm willing to do that and to put future work into this, but having it in boost ensures that that continues to happen even after I can no longer provide support.

-- Malte Skarupke malteskarupke@fastmail.fm

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Mathias Gaunard

29 Jan 29 Jan

6:54 a.m.

On Thu, 23 Jan 2020 at 18:34, Malte Skarupke via Boost <boost@lists.boost.org> wrote:

...

...
If I understood correctly, your new algorithm increases the requirement for Boost.Sort to C++17 from C++11. I don't see a good reason for doing that. You can replace if constexpr and the generic lambdas in your implementation with other code constructs that work on C++11.

It only requires C++17 if you want to use ska_sort. The other algorithms remain at C++11. I agree that this is a little unfortunate, but when I briefly tried to get rid of "if constexpr" in the code, it turned out to be a giant pain. It would also probably make the code compile much more slowly. If you are trying to sort nested types, like vector<vector<pair<string, int>>>, compile time can explode without some of the short circuiting that I'm doing using if constexpr. (I can explain why it explodes if you're interested, but for now I'll just say that it works different from comparison based sorting and you have to instantiate new code at every level of depth) I know I can do the same short circuiting without if constexpr if I put in enough work, but I think the code quality would suffer a lot, making the code much harder to read and maintain. (not that it's simple as is, but it would go from "hard to read" to "impossible to read")

My experience with looking at C++17 codebases (which are still quite limited) suggests that generalized use of "if constexpr" is a code smell that suggests bad design, making code difficult to maintain and extend and also causing unnecessarily long compilation times. Designing generic components and combining them in a more traditional fashion leads to more sharing of template instantiations as special-casing happen at the declaration boundary rather than in the middle of a definition. If constexpr also encourages placing local specializations at random points in the generic code instead of looking at the big picture, rationalizing requirements and deferring any special action to a policy that models a more or less refined concept, which definition is in a single place within the code. Local changes will eventually pile up, and since they probably have some relation to each other but are not together in the code, they end up being fragile. I haven't looked at your code in particular, but that might be something to consider.

Malte Skarupke

30 Jan 30 Jan

2:14 a.m.

...

Francisco Tapia, fjtapia@gmail.com wrote: In addition to all this, I would like to express my discomfort at having to waste my time, which, as I said before, is very scarce, in repeating a test, which does not contribute anything, because it has not followed any of our recommendations.

I'm sorry as well that you had to waste your time, but I also don't know what I could have done here... I wasn't on the mailing list at the time and didn't see those recommendations until you just pointed them out to me. But a simple "hey, I think we already looked at this a year ago and here is a link to that discussion" would have been perfectly fine. Alternatively, if you are having a discussion about one of my algorithms, include me in the discussion. I would have loved to have known about the discussion when it first happened. So, having now finally heard your recommendations, here are my responses:

...

It looks like ska_sort is a radix sort designed mostly for the best-case, and that if you feed it the alrbreaker or binaryalrbreaker distributions it will perform poorly (though not as poorly as radix implementations with no fallback or that fallback to insertionsort).

I can try to optimize more for these cases. I did of course handle those cases without degrading performance too much (and when sorting containers I skip shared prefixes to make this less likely) but I didn't try at all to optimize this. I didn't think that these were common cases, so I figured "just make sure performance doesn't degrade too much" would be enough. If you happen to remember, do you know what spreadsort does for cases that are traditionally really bad for radix sort? Maybe I can use the same optimization.

...

It is not possible to pass a part of the library to C++17 and another part to remain with C++11. Passing it all to C++17 means leaving out many people and many companies, and that goes against the spirit of the library which is to reach as many users as possible.

Can you elaborate on that? Why is this not possible? Is there anything I can do to help? For example I could remove the ska_sort.hpp include from the common boost/sort/sort.hpp header. That way anyone who includes that header will continue to compile with C++11, and to get ska_sort you would have to directly include boost/sort/ska_sort/ska_sort.hpp.

...

2.- The algorithm has not implemented worst-case control, nor when all elements are the same or are ordered, as it was indicated in February.

It looks like spreadsort handles both of these by essentially doing std::is_sorted() on the input before doing any other work. I wouldn't want to do that in ska_sort because it slows down all the partially sorted cases to first have to do this check. But if you want, I can add an overload: template<typename It> void ska_sort_check_sorted(It begin, It end) { if (!std::is_sorted(begin, end)) ska_sort(begin, end); } With that overload it would be up to the user to decide if they want this check or not. They could either call this overload or ska_sort directly. But I still would prefer to not do that. My philosophy on this follows iammilind's response on this stack overflow question: https://stackoverflow.com/questions/6567326/does-stdsort-check-if-a-vector-i...

...

3.- The algorithm is incomplete. It is to be expected from the algorithm that it orders, not only integers but also signed and unsigned integers, 32 64 and 128-bit floating-point numbers and strings.

It already sorts all kinds of integers as well as float, double and long double, strings, tuples, vectors, variants, optionals, lists, maps and other types that I'm forgetting. I can add support for 128 bit floating point numbers (it's exactly the same logic as for long doubles) but it doesn't seem that widely used. Microsoft doesn't even support it in their compiler. Also spreadsort does not support it.

...

4.- The only argument of the algorithm is its speed. About this, I have several comments

a) When the implementation incorporates worst-case control of repeated and equal elements, the speed may change

See discussion about why I don't do this above. I expect it to slow down if I call std::is_sorted() first and if the user already knows that it's not sorted, they don't want that overhead.

...

b) The speed of an algorithm is a factor to be taken into account, but it is not the only one. For example, the Timsort algorithm is slower than any other algorithm with random data. But it is extremely fast when the data are almost ordered, a situation that occurs with much more frequency than we imagine. This algorithm is highly valued by programmers and companies, and even the Java language uses it for its sort function.

Ska_sort is very fast with the random data, but it is quite mediocre when the data are almost ordered.

According to your own benchmark results from a year ago, ska_sort does better on almost ordered data than spreadsort and pdqsort. But I confess that it's not a case that I optimized for, mainly because I don't know a good way to make radix sorts behave well on almost sorted data. But if this is a requirement for all new sorting algorithms, I would probably look into wrapping ska_sort in vergesort: https://github.com/Morwenn/vergesort However if I were to do that, it almost feels like it should be a separate algorithm. Just like pdqsort and spinsort are separate algorithms. Between those two, if users want something that works fast on almost sorted data, they can use spinsort, otherwise they can use pdqsort. And if we follow that thinking, and we say that the version that works well on almost sorted data is a separate algorithm, then what's the benefit of doing that over just telling people to use spinsort on almost sorted data? (which is what the documentation currently does)

...

5) Having a hybrid algorithm like Spreadsort, having another hybrid algorithm is not very interesting, because with small data sets, the time differences, if any, will be small, and with large data sets, parallel algorithms are used, which provide much more speed than any single-stranded algorithm

I am also thinking about providing a parallel algorithm, but it's a large amount of work. These algorithms take me months to write. So before I do that, this submission of ska_sort is kind of my testing ground to see if this is something I'm interested in doing again. If this goes well, I may work on a parallel version in the future. If you're worried about having to maintain two different sorting algorithms, I think that we might be able to deprecate spreadsort and to make it just call ska_sort internally. Since the ska_sort interface can sort more than the spreadsort interface, I think it should be possible to keep all old users working. (with the one complication that ska_sort uses C++17 of course...) As for whether people care about single threaded performance: In my environment almost all sorts are single threaded. Most cores already have work allocated on them, so if you want to "go wide" for a piece of code, you need to know that there are gaps on other threads running in parallel. If you're not sure about that, just use a single threaded algorithm. Because if the other cores are busy, your parallel tasks get put at the end of the task queue and you've just increased your tail latency a lot. And the performance does really matter. We have used ska_sort to speed things up to run in 1.8 milliseconds instead of 3.3 milliseconds, and that was a big deal. In video games you only have 16ms to do all your work to get something on the screen, so a speedup by a whole millisecond is rare. But there is a bigger point for why we should have a new algorithm: I believe that in the future everyone will use radix sort. There is a unique opportunity here for boost to define the interface that those sorting algorithms will use in the future. The spreadsort interface is certainly not the right interface for a generic radix sort, so we should use this opportunity to come up with one. -- Malte Skarupke malteskarupke@fastmail.fm

Steven Ross

24 Nov 24 Nov

3:05 a.m.

On Wed, Jan 29, 2020 at 9:21 PM Malte Skarupke via Boost <boost@lists.boost.org> wrote:

...

...
Francisco Tapia, fjtapia@gmail.com wrote: In addition to all this, I would like to express my discomfort at having to waste my time, which, as I said before, is very scarce, in repeating a test, which does not contribute anything, because it has not followed any of our recommendations.

I'm sorry as well that you had to waste your time, but I also don't know what I could have done here... I wasn't on the mailing list at the time and didn't see those recommendations until you just pointed them out to me. But a simple "hey, I think we already looked at this a year ago and here is a link to that discussion" would have been perfectly fine.

Alternatively, if you are having a discussion about one of my algorithms, include me in the discussion. I would have loved to have known about the discussion when it first happened.

So, having now finally heard your recommendations, here are my responses:

...
It looks like ska_sort is a radix sort designed mostly for the best-case, and that if you feed it the alrbreaker or binaryalrbreaker distributions it will perform poorly (though not as poorly as radix implementations with no fallback or that fallback to insertionsort).

I can try to optimize more for these cases. I did of course handle those cases without degrading performance too much (and when sorting containers I skip shared prefixes to make this less likely) but I didn't try at all to optimize this. I didn't think that these were common cases, so I figured "just make sure performance doesn't degrade too much" would be enough. If you happen to remember, do you know what spreadsort does for cases that are traditionally really bad for radix sort? Maybe I can use the same optimization.

Sorry, this whole email chain was filtered by gmail for some reason, until I made a related search. What you're looking for is the get_min_count function in spreadsort, the main part that makes it unique. The source is in here: https://github.com/boostorg/sort/blob/develop/include/boost/sort/spreadsort/... If a bucket has less than this size, it will use comparison-based sorting (or skip it if size < 2), using std::sort or pdqsort, which are very fast with N < 1000, and provide O(Nlog(N)) worst-case performance. If it has more elements than this size, it will recurse again with radix sorting. There is a special case if it can complete with 1 final radix iteration. The logic is explained here: https://www.boost.org/doc/libs/1_62_0/libs/sort/doc/html/index.html#sort.ove... It is really easy to beat spreadsort's performance on evenly distributed random data if you simplify this step, but that sets your code up for doing much worse than comparison-based sorting on certain distributions. In fact, you can do this with spreadsort by tweaking its constants: https://github.com/boostorg/sort/blob/develop/include/boost/sort/spreadsort/... Some particularly nasty distributions you might want to try: https://github.com/boostorg/sort/blob/develop/example/alrbreaker.cpp https://github.com/boostorg/sort/blob/develop/example/binaryalrbreaker.cpp I'd be happy to take enhancements to spreadsort itself if you can demonstrate they improve the overall performance.

...

...
It is not possible to pass a part of the library to C++17 and another part to remain with C++11. Passing it all to C++17 means leaving out many people and many companies, and that goes against the spirit of the library which is to reach as many users as possible.

Can you elaborate on that? Why is this not possible? Is there anything I can do to help? For example I could remove the ska_sort.hpp include from the common boost/sort/sort.hpp header. That way anyone who includes that header will continue to compile with C++11, and to get ska_sort you would have to directly include boost/sort/ska_sort/ska_sort.hpp.

...
2.- The algorithm has not implemented worst-case control, nor when all elements are the same or are ordered, as it was indicated in February.

It looks like spreadsort handles both of these by essentially doing std::is_sorted() on the input before doing any other work. I wouldn't want to do that in ska_sort because it slows down all the partially sorted cases to first have to do this check. But if you want, I can add an overload:

template<typename It> void ska_sort_check_sorted(It begin, It end) { if (!std::is_sorted(begin, end)) ska_sort(begin, end); }

With that overload it would be up to the user to decide if they want this check or not. They could either call this overload or ska_sort directly. But I still would prefer to not do that. My philosophy on this follows iammilind's response on this stack overflow question: https://stackoverflow.com/questions/6567326/does-stdsort-check-if-a-vector-i...

It also uses that loop to find the min and max; which isn't much faster if you skip the sorting check. Finding the min and max help substantially to speed up radix sorting on distributions that aren't random, and make it much harder to construct the worst-case distribution (I have code to do generate it, but it's a pretty weird distribution). Also, sorted (or partially sorted) input to a sort is a common case I've seen repeatedly in industry.

...

...
4.- The only argument of the algorithm is its speed. About this, I have several comments

a) When the implementation incorporates worst-case control of repeated and equal elements, the speed may change

See discussion about why I don't do this above. I expect it to slow down if I call std::is_sorted() first and if the user already knows that it's not sorted, they don't want that overhead.

...
b) The speed of an algorithm is a factor to be taken into account, but it is not the only one. For example, the Timsort algorithm is slower than any other algorithm with random data. But it is extremely fast when the data are almost ordered, a situation that occurs with much more frequency than we imagine. This algorithm is highly valued by programmers and companies, and even the Java language uses it for its sort function.

Ska_sort is very fast with the random data, but it is quite mediocre when the data are almost ordered.

According to your own benchmark results from a year ago, ska_sort does better on almost ordered data than spreadsort and pdqsort. But I confess that it's not a case that I optimized for, mainly because I don't know a good way to make radix sorts behave well on almost sorted data. But if this is a requirement for all new sorting algorithms, I would probably look into wrapping ska_sort in vergesort:

https://github.com/Morwenn/vergesort

However if I were to do that, it almost feels like it should be a separate algorithm. Just like pdqsort and spinsort are separate algorithms. Between those two, if users want something that works fast on almost sorted data, they can use spinsort, otherwise they can use pdqsort.

And if we follow that thinking, and we say that the version that works well on almost sorted data is a separate algorithm, then what's the benefit of doing that over just telling people to use spinsort on almost sorted data? (which is what the documentation currently does)

...
5) Having a hybrid algorithm like Spreadsort, having another hybrid algorithm is not very interesting, because with small data sets, the time differences, if any, will be small, and with large data sets, parallel algorithms are used, which provide much more speed than any single-stranded algorithm

I am also thinking about providing a parallel algorithm, but it's a large amount of work. These algorithms take me months to write. So before I do that, this submission of ska_sort is kind of my testing ground to see if this is something I'm interested in doing again. If this goes well, I may work on a parallel version in the future.

If you're worried about having to maintain two different sorting algorithms, I think that we might be able to deprecate spreadsort and to make it just call ska_sort internally. Since the ska_sort interface can sort more than the spreadsort interface, I think it should be possible to keep all old users working. (with the one complication that ska_sort uses C++17 of course...)

C++17 is a serious issue; we try to be compatible with older versions of C++. Are you saying ska_sort can sort more without using functors (which seems possible though I'm curious if this is possible without C++17), or that it can sort more period, which seems doubtful as stringsort can sort anything with a strict weak ordering with functors.

...

As for whether people care about single threaded performance: In my environment almost all sorts are single threaded. Most cores already have work allocated on them, so if you want to "go wide" for a piece of code, you need to know that there are gaps on other threads running in parallel. If you're not sure about that, just use a single threaded algorithm. Because if the other cores are busy, your parallel tasks get put at the end of the task queue and you've just increased your tail latency a lot. And the performance does really matter. We have used ska_sort to speed things up to run in 1.8 milliseconds instead of 3.3 milliseconds, and that was a big deal. In video games you only have 16ms to do all your work to get something on the screen, so a speedup by a whole millisecond is rare.

But there is a bigger point for why we should have a new algorithm: I believe that in the future everyone will use radix sort. There is a unique opportunity here for boost to define the interface that those sorting algorithms will use in the future. The spreadsort interface is certainly not the right interface for a generic radix sort, so we should use this opportunity to come up with one.

-- Malte Skarupke malteskarupke@fastmail.fm

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

1315

Age (days ago)

1987

Last active (days ago)

List overview

Download

10 comments

8 participants

participants (8)

degski
Francisco José Tapia
Glen Fernandes
Hans Dembinski
Malte Skarupke
Mathias Gaunard
Mike
Steven Ross