Julian,
I updated the first paragraph in the introduction to make clearer what
this library is NOT good for, and added the change to the develop
branch. Here is the new first paragraph.
"The Boost Sort library provides a generic implementation of
high-speed sorting algorithms that outperform those in the C++
standard in both average and worst case performance when there are
over 1000 elements in the list to sort. They fall back to std::sort on
small data sets. These algorithms only work on random access
iterators. They are hybrids using both radix and comparison-based
sorting, specialized to sorting common data types, such as integers,
floats, and strings. These algorithms are encoded in a generic fashion
and accept functors, enabling them to sort any object that can be
processed like these basic data types. In the case of string_sort,
this includes anything with a defined strict weak ordering that
std::sort can sort, but writing efficient functors for some complex
key types may not be worth the additional effort relative to just
using std::sort, depending on how important speed is to your
application. Sample usages are available in the samples directory."
On Sun, Nov 16, 2014 at 9:22 PM, Julian Gonggrijp
Dear list, dear Edward, dear Steven,
First of all I would like to congratulate Steven for having his library finally reviewed. I have been aware of the library for years.
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
When I first learned about the existence of the library (as I said, several years ago), I read the code and the documentation closely.
Today, I have quickly scanned through the previous discussion, re-read the introduction and the rationale from the documentation and made a quick glance over the code to check for any important changes.
- Are you knowledgeable about the problem domain?
Yes, quite. I have strong formal education in algorithm design and analysis. I also have a passion for sorting algorithms and I have been in the process of writing my own C++ sorting library, though I have not had the chance to work on the latter since early 2012. My own library mostly contains well-known comparison sorts but it also contains a few interesting innovations of my own. I have done a lot of benchmarking and analysis of my own algorithm implementations so I’m well aware of the issues that are relevant in comparing sorting algorithms.
Please note that I do not consider my own library a competitor for Steven Ross’, though. The aims and contents of the libraries seem to be sufficiently different that there can be a place for both. Apart from that, Steven Ross’ library is in a much more “production-ready” state than mine.
- What is your evaluation of the design?
I’m not convinced that string_sort, float_sort and integer_sort need to be separate interfaces. I believe that the function to extract “digits” from the values for the MSD part of the algorithm can be captured in a fully generic way (that should be applicable to all radix-like sorting algorithms), coupled with two additional, optional parameters (pre- transform and post-transform) specifying functions for doing the temporary “treat floats as integers” trick.
Apart from that, I think the design is sensible.
- What is your evaluation of the implementation?
Clever.
- What is your evaluation of the documentation?
Poor.
The rationale, which in my opinion is the most important part of the documentation and should be in the front, is too lengthy and misses its target. It does discuss the theoretical advantages of radix sort as well as the (technical) relative merits of MSD vs LSD and several hybrid sorting algorithms, to great length, but it does not address the question why somebody might want to use a hybrid radix/comparison sort in the first place. It seems to be a defense of design decisions rather than a justification of the very existence of the library.
The introduction severely and needlessly overstates the value of the algorithm, starting from the very first sentence:
“The Boost Sort library provides a generic implementation of high-speed sorting algorithms that outperform those in the C++ standard in both average and worst case performance.”
strongly implying that spreadsort is fit for replacing introsort (std::sort) in all possible situations, which it just isn’t.
Then the introduction mentions three conditions under which spreadsort is “fastest relative to std::sort”, but it should instead mention two conditions under which a user may want to consider using it at all:
1. Value type can be treated as a sequence of digits/characters. 2. Many keys to sort and/or very long keys.
This would be more honest about the scope of application, hence more helpful to the potential user. I believe this does not invalidate the library. Under the two conditions mentioned above, spreadsort can in fact offer significant performance improvements. I think the latter has been sufficiently proven.
The title of the library is also too pretentious, as other reviewers have already pointed out. I think it should be called “Spreadsort” rather than “Sort”.
So to summarize this part of my review, I think that the documentation is insufficient as long as it doesn’t properly address the following points in the following order:
1. Why and under which conditions the library may be useful. 2. How to use the library. 3. What is going on inside the library (and why in that way).
The good news is that parts 2 and 3 have already been provided for and that part 1 should be very short. So I think relatively little work is needed to fix the documentation.
- What is your evaluation of the potential usefulness of the library?
As I explained above, I think the library has somewhat narrow applications, but it can be highly valuable when applicable. This is true of most other sorting algorithms as well. Spreadsort is definitely useful enough for inclusion in Boost, especially as long as no generic implementations of other radix/comparison hybrids are available.
- Did you try to use the library? With what compiler? Did you have any problems?
Not this year. I may have tested the library when I first learned about its existence, but I don’t have any clear memory of the results.
And finally, every review should attempt to answer this question:
- Do you think the library should be accepted as a Boost library?
Yes, sort of. I think spreadsort should be added to Boost.Algorithm (please!), but first the documentation should be improved as I discussed. While I think that the interface should be more general, that is not a condition for acceptance as far as I’m concerned.
I do not agree with Niall Douglas that the library should add fallbacks for ranges that don’t offer random access. In general, I think that if you want to sort a range then you should ensure from the outset that it is a random access range, with the exception of very short ranges, for which quadratic complexity does not hurt, or linked lists, which can be efficiently sorted but only with intrusive sorting methods. Neither situation is relevant for this implementation of spreadsort.
Kind regards, Julian Gonggrijp
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost