Thank you for your feedback Julian. I'll improve the documentation.
On Sun Nov 16 2014 at 9:22:36 PM Julian Gonggrijp
- What is your evaluation of the design?
I’m not convinced that string_sort, float_sort and integer_sort need to be separate interfaces. I believe that the function to extract “digits” from the values for the MSD part of the algorithm can be captured in a fully generic way (that should be applicable to all radix-like sorting algorithms), coupled with two additional, optional parameters (pre- transform and post-transform) specifying functions for doing the temporary “treat floats as integers” trick.
string_sort is this fully generic approach. integer_sort and float_sort
can be seen as specializations that allow bit-level granularity of the radix operations and work well for keys that fit into a machine word. If you can think of some way to enable the sort-floats-as-ints trick as an option without harming performance for non-floats, I'd be very interested.
The rationale, which in my opinion is the most important part of the documentation and should be in the front, is too lengthy and misses its target. It does discuss the theoretical advantages of radix sort as well as the (technical) relative merits of MSD vs LSD and several hybrid sorting algorithms, to great length, but it does not address the question why somebody might want to use a hybrid radix/comparison sort in the first place. It seems to be a defense of design decisions rather than a justification of the very existence of the library.
The introduction severely and needlessly overstates the value of the algorithm, starting from the very first sentence:
“The Boost Sort library provides a generic implementation of high-speed sorting algorithms that outperform those in the C++ standard in both average and worst case performance.”
strongly implying that spreadsort is fit for replacing introsort (std::sort) in all possible situations, which it just isn’t.
Then the introduction mentions three conditions under which spreadsort is “fastest relative to std::sort”, but it should instead mention two conditions under which a user may want to consider using it at all:
1. Value type can be treated as a sequence of digits/characters.
2. Many keys to sort and/or very long keys.
I agree, if most of your sorting runtime is going to spent on lists of < 1000 elements, this library is of little use, though it should also do
Find me a type that you don't think this applies to, and I'll show you how it can. If std::sort can sort it, so can string_sort if you use the available functors properly. Maybe I should explain this in the documentation? I will agreed to the extent that if performance isn't very critical relative to the code complexity of creating an efficient radix key expansion equivalent to the comparison operation, then people probably are better off with std::sort. little harm due to the fallback to std::sort. I'll edit the docs in develop to make that more clear.
So to summarize this part of my review, I think that the documentation is insufficient as long as it doesn’t properly address the following points in the following order:
1. Why and under which conditions the library may be useful. 2. How to use the library. 3. What is going on inside the library (and why in that way).
The good news is that parts 2 and 3 have already been provided for and that part 1 should be very short. So I think relatively little work is needed to fix the documentation.
Will do.