[review] [sort] Sort library review manager results

27 Nov 2014

      This is the results from the recent review of the Sort library of Steven 
Ross.

First I would like to thank all those who made comments during the 
review, whether or not they officially gave a final Yes or No vote to 
whether the Sort library should be accepted as a Boost library. This 
list includes:

Niall Douglas, Julian Gonggrijp, Phil Endecott, Vladimir Prus, Mathias 
Gaunard, Jeremy Murphy, Peter Dimov, Robert Ramey, Adam Walling, Anthony 
Polukhin, Phil Endecott, Paul Bristow, Thijs (M.A.) van den Berg
Dupuis Etienne, and Frank Gennari

If I have missed anyone I do apologize.

Secondly I would like to thank Steven Ross for patiently answering all 
of the review comments to the best of his ability.

My tally of Yes and No votes for acceptance are:

Yes votes (5) :

Niall Douglas ( conditional ), Julian Gonggrijp ( conditional ),
Frank Gennari, Phil Endecott, Paul Bristow.

No votes (3) :

Vladimir Prus, Adam Walling ( conditional ), Anthony Polukhin

I believe the condtional Yes vote from Julian Gonggrijp was completely 
met in the discussions of issues with Steven Ross and the conditional 
Yes vote from Niall Douglas was almost completely met in the discussion 
with Steven Ross about the issues mentioned, and enough to maintain his 
Yes vote.

I want to also mention that the conditional No vote by Adam Walling has 
an implied Yes vote to it if the Sort library were one among other sort 
implementations.

Since my final decision is not entirely based on the Yes and No votes I 
want to adumbrate some of the major issues brought up by the review 
without necessarily focussing on every one of the people who brought 
them up initially, as well as my own reactions to them as Review Manager.

1) The first major issue was whether a library whose basic merit lies in 
its algorithm and its speed/space constraints needs better theoretical 
backing. A number of reviewers discussed this after it was brought up. I 
tend to agree with reviewers that while the best  theoretical basis is 
always desirable, it is not necessary for a Boost library whose 
empirical evidence can be and has been measured by its implementor and 
can be measured by any user. Furthermore Steven Ross has provided an 
extensive discussion in his documentation, as well as his original 
paper, about the theoretical merits of his technique. While much of this 
discussion is probably beyond the understanding of any but sorting 
experts and afficionados enough of it adequately explains the basic 
ideas behind SpreadSort for those who understand and have knowledge of 
basic sorting ideas and popular sorting techniques.

2) An important issue is the need for the Sorting library to provide 
extensive timing charts/tables comparing SpreadSort to at least 
std::sort ( and possibly other popular mainstream sorts ) with both 
different numbers of sort keys and different initial unsorted 
distributions. Steven Ross has been providing these charts/tables in the 
'develop' branch of the library.

3) Connected to the previous issue is that a number of reviewers asked 
for better documentation on the conditions in which SpreadSort will show 
a marked improvement in speed compared to std::sort, as well as any 
condition(s) where SpreadSort performs worse than std::sort. While 
Steven Ross has explained that generally SpreadSort will fall back to 
using std::sort for some cases ( low N ), I too feel that the 
documentation needs to be more extensive for the average user in 
attempting to delineate the conditions where SpreadSort works best.

4) A really important issue discussed by a number of reviewers involves 
the creation of a synthetic radix, where most users of sorting algorithm 
are used to comparison based sorting functors. While the documentation 
provides information on the various individual sorts, as well as 
examples, I also feel that a discussion in the documentation of how to 
use SpreadSort when sorting structs/classes with multiple keys of 
different types would be invaluable. In other words, what and if are the 
common techniques for using SpreadSort with complicated keys which, when 
using comparison-based sorting, can be represented fairly easily be a 
functor.

5) Niall Douglas brought up the issue of exception guarantees when using 
SpreadSort, and I concur. The documentation should be stronger 
specifying what happens during the sorting algorithms if an exception 
occurs and/or does the sorting algorithm itself ever throw any 
exceptions and, if so, under what conditions. Steven Ross mentioned that 
he is adding exception guarantee information to his docs.

6) Many reviewers did not like the idea that the library should be 
called Sort when it is basically implementing a single sorting algorithm 
called SpreadSort. Alternate suggestions are:

a) The library should be called SpreadSort.
b) The library can be called Sort only if other sorting algorithms are 
added to it.
c) The library can be called Sort with the proviso that it serves as a 
general library for sorting algorithms potentially added to it in the 
future.
d) The library should be part of the Boost Algorithm library.

Steven Ross mentioned other potential sorting implementations as part of 
the Sort library, under that general name, but all providing a good deal 
of extra work. These included a copy-based stable sorting algorithm, an 
LSD radix sorting, and an OpenMP based parallel implementation. Other 
reviewers mentioned some of the same types of sorting implementations 
and a few reviewers remarked that they also have created sorting 
implementations of their own. While I will give my own view at the end, 
I would like to mention here that I strongly believe that each sorting 
implementation needs to be reviewed separately for acceptance into Boost 
and never automatically added to a general library, whether it be called 
Sort or Algorithm or whatever.

7) Random issues brought up in comments I will mention, which have 
importance but which I do not believe are absolutely intrinsic yet still 
might be considered.

a) C++11 techniques in general and inline namespaces in particular.
b) A version that can work with less than random access.
c) Optimizations for small sets of data ( Steven Ross has already 
addressed this to an extent in the 'developer' branch of the library ).
d) Possible documentation of using SpreadSort as a subsort for parallel 
sorting techniques and possible discusion in the docs about SpreadSort 
and parallel sorts.
e) General O() information in docs.
f) SpreadSort examples hoisted directly into docs using Quickbook 
technique and updating the Spreadort docs to use Quickbook.

Review Result:

Based on the voting for acceptance or not, based on the discussions 
during the review period, and based on Steven Ross's responses, I have 
decided that Steven Ross's library should be accepted into Boost.

My main reason for my decision, aside from the votes and comments and 
specific issues discussed, is twofold:

1) There is a general consensus that Steven Ross has an excellent 
understanding of sorting techniques and algorithms and will support and 
refine his library for the benefit of programmers.
2) The empirical tests show that SpreadSort offers an improvement over 
std::sort, significantly in the area of large data sets, and this could 
be a major benefit to programmers.

I do not know if the review manager has a say in this but based on the 
remarks of the reviewers I would like to see the library as accepted be 
named 'SpreadSort' rather than just 'Sort'. I do think that Boost can 
have a library called 'Sort' but I agree with the general consensus that 
a 'Sort' library needs more than one type of sorting algorithm. I would 
like to see other people, who mentioned in the reviews/comments that 
they have their own sorting implementation, also submit their own 
implementations to Boost and, if this happens and they are accepted, I 
can see combining them with 'SpreadSort' into a general Boost sorting 
library called 'Sort' in the future.

Edward Diener

tags

participants (1)