[review] [sort] Sort library review manager results
This is the results from the recent review of the Sort library of Steven Ross. First I would like to thank all those who made comments during the review, whether or not they officially gave a final Yes or No vote to whether the Sort library should be accepted as a Boost library. This list includes: Niall Douglas, Julian Gonggrijp, Phil Endecott, Vladimir Prus, Mathias Gaunard, Jeremy Murphy, Peter Dimov, Robert Ramey, Adam Walling, Anthony Polukhin, Phil Endecott, Paul Bristow, Thijs (M.A.) van den Berg Dupuis Etienne, and Frank Gennari If I have missed anyone I do apologize. Secondly I would like to thank Steven Ross for patiently answering all of the review comments to the best of his ability. My tally of Yes and No votes for acceptance are: Yes votes (5) : Niall Douglas ( conditional ), Julian Gonggrijp ( conditional ), Frank Gennari, Phil Endecott, Paul Bristow. No votes (3) : Vladimir Prus, Adam Walling ( conditional ), Anthony Polukhin I believe the condtional Yes vote from Julian Gonggrijp was completely met in the discussions of issues with Steven Ross and the conditional Yes vote from Niall Douglass was almost completely met in the discussion with Steven Ross about the issues mentioned, and enough to maintain his Yes vote. I want to also mention that the conditional No vote by Adam Walling has an implied Yes vote to it if the Sort library were one among other sort implementations. Since my final decision is not entirely based on the Yes and No votes I want to adumbrate some of the major issues brought up by the review without necessarily focussing on every one of the people who brought them up initially, as well as my own reactions to them as Review Manager. 1) The first major issue was whether a library whose basic merit lies in its algorithm and its speed/space constraints needs better theoretical backing. A number of reviewers discussed this after it was brought up. I tend to agree with reviewers that while the best theoretical basis is always desirable, it is not necessary for a Boost library whose empirical evidence can be and has been measured by its implementor and can be measured by any user. Furthermore Steven Ross has provided an extensive discussion in his documentation, as well as his original paper, about the theoretical merits of his technique. While much of this discussion is probably beyond the understanding of any but sorting experts and afficionados enough of it adequately explains the basic ideas behind SpreadSort for those who understand and have knowledge of basic sorting ideas and popular sorting techniques. 2) An important issue is the need for the Sorting library to provide extensive timing charts/tables comparing SpreadSort to at least std::sort ( and possibly other popular mainstream sorts ) with both different numbers of sort keys and different initial unsorted distributions. Steven Ross has been providing these charts/tables in the 'develop' branch of the library. 3) Connected to the previous issue is that a number of reviewers asked for better documentation on the conditions in which SpreadSort will show a marked improvement in speed compared to std::sort, as well as any condition(s) where SpreadSort performs worse than std::sort. While Steven Ross has explained that generally SpreadSort will fall back to using std::sort for some cases ( low N ), I too feel that the documentation needs to be more extensive for the average user in attempting to delineate the conditions where SpreadSort works best. 4) A really important issue discussed by a number of reviewers involves the creation of a synthetic radix, where most users of sorting algorithm are used to comparison based sorting functors. While the documentation provides information on the various individual sorts, as well as examples, I also feel that a discussion in the documentation of how to use SpreadSort when sorting structs/classes with multiple keys of different types would be invaluable. In other words, what and if are the common techniques for using SpreadSort with complicated keys which, when using comparison-based sorting, can be represented fairly easily be a functor. 5) Niall Douglass brought up the issue of exception guarantees when using SpreadSort, and I concur. The documentation should be stronger specifying what happens during the sorting algorithms if an exception occurs and/or does the sorting algorithm itself ever throw any exceptions and, if so, under what conditions. Steven Ross mentioned that he is adding exception guarantee information to his docs. 6) Many reviewers did not like the idea that the library should be called Sort when it is basically implementing a single sorting algorithm called SpreadSort. Alternate suggestions are: a) The library should be called SpreadSort. b) The library can be called Sort only if other sorting algorithms are added to it. c) The library can be called Sort with the proviso that it serves as a general library for sorting algorithms potentially added to it in the future. d) The library should be part of the Boost Algorithm library. Steven Ross mentioned other potential sorting implementations as part of the Sort library, under that general name, but all providing a good deal of extra work. These included a copy-based stable sorting algorithm, an LSD radix sorting, and an OpenMP based parallel implementation. Other reviewers mentioned some of the same types of sorting implementations and a few reviewers remarked that they also have created sorting implementations of their own. While I will give my own view at the end, I would like to mention here that I strongly believe that each sorting implementation needs to be reviewed separately for acceptance into Boost and never automatically added to a general library, whether it be called Sort or Algorithm or whatever. 7) Random issues brought up in comments I will mention, which have importance but which I do not believe are absolutely intrinsic yet still might be considered. a) C++11 techniques in general and inline namespaces in particular. b) A version that can work with less than random access. c) Optimizations for small sets of data ( Steven Ross has already addressed this to an extent in the 'developer' branch of the library ). d) Possible documentation of using SpreadSort as a subsort for parallel sorting techniques and possible discusion in the docs about Spreadort and parallel sorts. e) General O() information in docs. f) SpreadSort examples hoisted directly into docs using Quickbook technique and updating the Spreadort docs to use Quickbook. Review Result: Based on the voting for acceptance or not, based on the discussions during the review period, and based on Steven Ross's responses, I have decided that Steven Ross's library should be accepted into Boost. My main reason for my decision, aside from the votes and comments and specific issues discussed, is twofold: 1) There is a general consensus that Steven Ross has an excellent understanding of sorting techniques and algorithms and will support and refine his library for the benefit of programmers. 2) The empirical tests show that SpreadSort offers an improvement over std::sort, significantly in the area of large data sets, and this could be a major benefit to programmers. I do not know if the review manager has a say in this but based on the remarks of the reviewers I would like to see the library as accepted be named 'SpreadSort' rather than just 'Sort'. I do think that Boost can have a library called 'Sort' but I agree with the general consensus that a 'Sort' library needs more than one type of sorting algorithm. I would like to see other people, who mentioned in the reviews/comments that they have their own sorting implementation, also submit their own implementations to Boost and, if this happens and they are accepted, I can see combining them with 'SpreadSort' into a general Boost sorting library called 'Sort' in the future.
On Wed, Nov 26, 2014 at 8:32 PM, Edward Diener
This is the results from the recent review of the Sort library of Steven Ross.
...
I do not know if the review manager has a say in this but based on the remarks of the reviewers I would like to see the library as accepted be named 'SpreadSort' rather than just 'Sort'. I do think that Boost can have a library called 'Sort' but I agree with the general consensus that a 'Sort' library needs more than one type of sorting algorithm. I would like to see other people, who mentioned in the reviews/comments that they have their own sorting implementation, also submit their own implementations to Boost and, if this happens and they are accepted, I can see combining them with 'SpreadSort' into a general Boost sorting library called 'Sort' in the future.
We already have an "algorithm" directory, so it might make sense to add a "sort" sub-directory and category with "spreadsort" as the first specific sort to be added. For adding sorts that are just implementations of classic and well-studied sort algorithms, we need a lighter-weight mechanism than a full formal review, IMO. Edward, thanks for managing the review and producing such a well-done report! --Beman
On 11/27/2014 6:07 AM, Beman Dawes wrote:
On Wed, Nov 26, 2014 at 8:32 PM, Edward Diener
wrote: This is the results from the recent review of the Sort library of Steven Ross.
...
I do not know if the review manager has a say in this but based on the remarks of the reviewers I would like to see the library as accepted be named 'SpreadSort' rather than just 'Sort'. I do think that Boost can have a library called 'Sort' but I agree with the general consensus that a 'Sort' library needs more than one type of sorting algorithm. I would like to see other people, who mentioned in the reviews/comments that they have their own sorting implementation, also submit their own implementations to Boost and, if this happens and they are accepted, I can see combining them with 'SpreadSort' into a general Boost sorting library called 'Sort' in the future.
We already have an "algorithm" directory, so it might make sense to add a "sort" sub-directory and category with "spreadsort" as the first specific sort to be added.
Do you mean a directory structure of: boost libs algorithm sort spreadsort ... possible other sorts ? Does this work with modular boost, submodules, and the Boost Build system ? Hasn't there been lots of discussions of the difficulty of having libraries other than directly under boost/libs ? If this is now workable and been resolved, both via Git submodules and Boost Build, I would agree with your suggestion but is there clear online documentation in the modular boost wiki for setting this up ? The reason I suggested that 'sort' just change to 'spreadsort' was to avoid the above problems, but maybe those problems are now completely resolved and I just have not kept up with the discussion. Also Steven Ross will want to integrate 'spreadsort' into the modular boost directory structure and since this is his first contribution to Boost we need to make it understandable to him how to do this.
For adding sorts that are just implementations of classic and well-studied sort algorithms, we need a lighter-weight mechanism than a full formal review, IMO.
I agree. I just wanted to make sure we don't add a sort algorithm without any type of review just because it sounds helpful, should work, and we already have a sort library.
Edward, thanks for managing the review and producing such a well-done report!
You are very welcome.
On Thu Nov 27 2014 at 12:01:29 PM Edward Diener
We already have an "algorithm" directory, so it might make sense to add a "sort" sub-directory and category with "spreadsort" as the first specific sort to be added.
Do you mean a directory structure of:
boost libs algorithm sort spreadsort ... possible other sorts
?
Does this work with modular boost, submodules, and the Boost Build system ? Hasn't there been lots of discussions of the difficulty of having libraries other than directly under boost/libs ? If this is now workable and been resolved, both via Git submodules and Boost Build, I would agree with your suggestion but is there clear online documentation in the modular boost wiki for setting this up ?
The reason I suggested that 'sort' just change to 'spreadsort' was to avoid the above problems, but maybe those problems are now completely resolved and I just have not kept up with the discussion. Also Steven Ross will want to integrate 'spreadsort' into the modular boost directory structure and since this is his first contribution to Boost we need to make it understandable to him how to do this.
I think we want to keep it a separate library from algorithms for compatibility with modular boost. boost/libs/spreadsort is fine with me. An alternative structure would be to have a boost/libs/sort/spreadsort structure, but in that case I'd be maintaining future sort contributions in addition to spreadsort to keep it one coherent library (which I'm willing to do). Are there any preferences?
On 11/28/2014 6:31 AM, Steven Ross wrote:
On Thu Nov 27 2014 at 12:01:29 PM Edward Diener
wrote: We already have an "algorithm" directory, so it might make sense to add a "sort" sub-directory and category with "spreadsort" as the first specific sort to be added.
Do you mean a directory structure of:
boost libs algorithm sort spreadsort ... possible other sorts
?
Does this work with modular boost, submodules, and the Boost Build system ? Hasn't there been lots of discussions of the difficulty of having libraries other than directly under boost/libs ? If this is now workable and been resolved, both via Git submodules and Boost Build, I would agree with your suggestion but is there clear online documentation in the modular boost wiki for setting this up ?
The reason I suggested that 'sort' just change to 'spreadsort' was to avoid the above problems, but maybe those problems are now completely resolved and I just have not kept up with the discussion. Also Steven Ross will want to integrate 'spreadsort' into the modular boost directory structure and since this is his first contribution to Boost we need to make it understandable to him how to do this.
I think we want to keep it a separate library from algorithms for compatibility with modular boost. boost/libs/spreadsort is fine with me. An alternative structure would be to have a boost/libs/sort/spreadsort structure, but in that case I'd be maintaining future sort contributions in addition to spreadsort to keep it one coherent library (which I'm willing to do). Are there any preferences?
Even if it were specified as boost/libs/sort/spreadsort I think that any additional sorts under boost/libs/sort would be viewed as separate libraries, so that the onus would not be on you to maintain them but rather on the individual contributor. But I could be wrong about this as I am still unsure of how "sublibraries" are expected to work within the Boost tree. The fact that you would be willing, with your sort expertise, to do so is great. Also be aware that for any given Boost library there can be more than one maintainer, with different maintainers doing vastly different work for any given library.
On November 28, 2014 6:31:33 AM EST, Steven Ross
On Thu Nov 27 2014 at 12:01:29 PM Edward Diener
wrote: Do you mean a directory structure of:
boost libs algorithm sort spreadsort ... possible other sorts
? [snip]
I think we want to keep it a separate library from algorithms for compatibility with modular boost. boost/libs/spreadsort is fine with me.
On what do you base that conclusion?
An alternative structure would be to have a boost/libs/sort/spreadsort structure, but in that case I'd be maintaining future sort contributions in addition to spreadsort to keep it one coherent library (which I'm willing to do).
Thanks for offering to do that. However, I fail to understand how you think multiple sub-libraries under boost/libs/sort will be more tenable than the same under boost/libs/algorithm/sort. ___ Rob (Sent from my portable computation engine)
I think we want to keep it a separate library from algorithms for compatibility with modular boost. boost/libs/spreadsort is fine with me.
On what do you base that conclusion?
Based on the fact that I know how to build the library and copy it into the boost tree in the modular boost fashion, but don't know how to do the same from inside the algorithms directory.
An alternative structure would be to have a boost/libs/sort/spreadsort structure, but in that case I'd be maintaining future sort contributions in addition to spreadsort to keep it one coherent library (which I'm willing to do).
Thanks for offering to do that. However, I fail to understand how you think multiple sub-libraries under boost/libs/sort will be more tenable than the same under boost/libs/algorithm/sort.
There is a lot of code under algorithm. Synchronizing a few sort algorithms into a release version seems simpler than synchronizing all of algorithm. Please correct me and explain how it's done if I'm wrong.
On November 28, 2014 9:32:22 PM EST, Steven Ross
I think we want to keep it a separate library from algorithms for compatibility with modular boost. boost/libs/spreadsort is fine with me.
On what do you base that conclusion?
Based on the fact that I know how to build the library and copy it into the boost tree in the modular boost fashion, but don't know how to do the same from inside the algorithms directory.
An alternative structure would be to have a boost/libs/sort/spreadsort structure, but in that case I'd be maintaining future sort contributions in addition to spreadsort to keep it one coherent library (which I'm willing to do).
Thanks for offering to do that. However, I fail to understand how you think multiple sub-libraries under boost/libs/sort will be more tenable than the same under boost/libs/algorithm/sort.
There is a lot of code under algorithm. Synchronizing a few sort algorithms into a release version seems simpler than synchronizing all of algorithm. Please correct me and explain how it's done if I'm wrong.
Marshall Clow is the maintainer of Boost.Algorithm, so the latter would actually be his job. :) ___ Rob (Sent from my portable computation engine)
On Thu, Nov 27, 2014 at 12:00 PM, Edward Diener
On 11/27/2014 6:07 AM, Beman Dawes wrote:
We already have an "algorithm" directory, so it might make sense to add a
"sort" sub-directory and category with "spreadsort" as the first specific sort to be added.
Do you mean a directory structure of:
boost libs algorithm sort spreadsort ... possible other sorts
?
Yes, exactly. So sort is at the same level as "string" in the current libs/algorithm hierarchy.
Does this work with modular boost, submodules, and the Boost Build system ?
Sure. See the current boost-root/libs/algorithm. For an example where the components live in separate sub-modules, see boost-root/libs/numeric/conversion. My intuition is that "sort" should be the sub-module rather than having a separate sub-module for each kind of sort, but I'd like to hear from others before that is cast in concrete.
Hasn't there been lots of discussions of the difficulty of having libraries other than directly under boost/libs ?
There were some teething problems, but haven't those all been resolved?
If this is now workable and been resolved, both via Git submodules and Boost Build, I would agree with your suggestion but is there clear online documentation in the modular boost wiki for setting this up ?
The release managers handle the .gitmodules and other top-level setup. AFAIK, the submodule itself is setup like any submodule.
The reason I suggested that 'sort' just change to 'spreadsort' was to avoid the above problems,
That seems best anyhow since there will certainly be additional kinds of sorts added in the future.
but maybe those problems are now completely resolved and I just have not kept up with the discussion. Also Steven Ross will want to integrate 'spreadsort' into the modular boost directory structure and since this is his first contribution to Boost we need to make it understandable to him how to do this.
Yes. One question is "who is the maintainer for the overall sort library?" It would be nice if someone was looking at the big picture, so that there was some consistency between individual sorts for docs, testing, etc. There might be some overall docs, too, to help users choose between algorithms without having to read about each sort algorithm separately. Steven, are you interested in that role?
For adding sorts that are just implementations of classic and well-studied sort algorithms, we need a lighter-weight mechanism than a full formal review, IMO.
I agree. I just wanted to make sure we don't add a sort algorithm without any type of review just because it sounds helpful, should work, and we already have a sort library.
Right. The overall sort library maintainer could help manage issues like that. --Beman
On Sun, Nov 30, 2014 at 9:22 AM, Beman Dawes
On Thu, Nov 27, 2014 at 12:00 PM, Edward Diener
wrote: On 11/27/2014 6:07 AM, Beman Dawes wrote:
Hasn't there been lots of discussions of the difficulty of having libraries other than directly under boost/libs ?
There were some teething problems, but haven't those all been resolved?
There is one "problem" I ran into that as a result of that I suggested/requested that we put all submodules at the boost/libs level. I attempted to create a minimal subset of Boost to just build the documentation tools < https://github.com/grafikrobot/boost-doctools/tree/develop>. And it was hard to find and recreate the structure of sub-sub-modules as it wasn't obvious where they mapped from the build errors. And in some circumstances there where duplicate names at multiple levels which made it harder. -- -- Rene Rivera -- Grafik - Don't Assume Anything -- Robot Dreams - http://robot-dreams.net -- rrivera/acm.org (msn) - grafikrobot/aim,yahoo,skype,efnet,gmail
On 30 Nov 2014 at 10:00, Rene Rivera wrote:
Hasn't there been lots of discussions of the difficulty of having libraries other than directly under boost/libs ?
There were some teething problems, but haven't those all been resolved?
There is one "problem" I ran into that as a result of that I suggested/requested that we put all submodules at the boost/libs level. I attempted to create a minimal subset of Boost to just build the documentation tools < https://github.com/grafikrobot/boost-doctools/tree/develop>. And it was hard to find and recreate the structure of sub-sub-modules as it wasn't obvious where they mapped from the build errors. And in some circumstances there where duplicate names at multiple levels which made it harder.
There is also a problem in the future when Boost library A may depend on v1 of submodule X while Boost library B may depend on v2 of a submodule X. I don't think the solution BindLib requires which is that each Boost module mounts its dependency submodules inside itself (probably the include directory) is avoidable. At least with per Boost module submodules, they can pin themselves to some commit SHA. It would be nice though if Build could spot the same commit SHA link in a git submodule and simply symlink it rather than checking out another copy. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
Hi, If you are interested about sort methods, I have a parallel implementation of intro-sort and merge-sort. The results of these implementations are similar to the obtained by the parallel implementations of GCC ( using OpnMP) and TBB ( don't have parallel stable sort). In the attached text file you can see the results on my computer. My implementation don't use OpenMP nor TBB. Use a concurrent stack implemented with a vector and with an spin-lock done with atomic variables. It will be compatible with any C++11 compiler The algorithms run well. But they are pending of a detailed adjust in order to improve the speed. If you are interested , please ,say me in which name space must be included. Now are a part of the new version of my library countertree and are in this name space. I will need two weeks for to do because I am teacher and in these days I am buried in a mountain of exams. Yours Francisco Tapia
Francisco,
On Sun Nov 30 2014 at 11:32:10 AM Francisco José Tapia
Hi,
If you are interested about sort methods, I have a parallel implementation of intro-sort and merge-sort.
The results of these implementations are similar to the obtained by the parallel implementations of GCC ( using OpnMP) and TBB ( don't have parallel stable sort). In the attached text file you can see the results on my computer.
Thanks for providing your results. Some questions about these results: 1) what random number generation function did you use? 2) Did you try the worst-case ordering for introsort? 3) Why would someone want to use your library instead of TBB? (having used OpenMP, I know why people might not like OpenMP) I'd like at least one statement from someone else on this user group that they'd like to use this library and why. 4) Can you do something to fix the performance for already-sorted data? TBB appears to have an optimization for that surprisingly common case. 5) Your stable_sort appears unacceptably slow single-threaded vs the standard gcc version. You probably should fix that or abandon it. 6) Have you compared memory usage to make sure all variants are comparable for the same number of threads?
My implementation don't use OpenMP nor TBB. Use a concurrent stack implemented with a vector and with an spin-lock done with atomic variables. It will be compatible with any C++11 compiler
Can you make it work as efficiently without C++11? Not everyone has a C++11 compiler just yet.
The algorithms run well. But they are pending of a detailed adjust in order to improve the speed.
If you are interested , please ,say me in which name space must be included. Now are a part of the new version of my library countertree and are in this name space. I will need two weeks for to do because I am teacher and in these days I am buried in a mountain of exams.
No rush. I'll let you know once we finalize on a namespace.
but maybe those problems are now completely resolved and I just have not kept up with the discussion. Also Steven Ross will want to integrate 'spreadsort' into the modular boost directory structure and since this is his first contribution to Boost we need to make it understandable to him how to do this.
Yes. One question is "who is the maintainer for the overall sort library?" It would be nice if someone was looking at the big picture, so that there was some consistency between individual sorts for docs, testing, etc. There might be some overall docs, too, to help users choose between algorithms without having to read about each sort algorithm separately.
Steven, are you interested in that role?
Yes, I'm willing to perform that role. The question seems to boil down to: do we have algorithm/sort/spreadsort, or just sort/spreadsort? Either way, sort should probably be its own submodule. There was still the concern from Rene that sub-sub-modules can be difficult for others to put together properly (for the purpose of building a minimal necessary subset). Has that been resolved? My inclination is that unless the maintainer of the algorithm library would welcome a new sub-submodule, and the concerns about the difficulty of building a minimal subset aren't resolved, I'd rather have this as it's own module: sort.
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Steven Ross Sent: 01 December 2014 11:18 To: boost@lists.boost.org Subject: Re: [boost] [review] [sort] Sort library review manager results
but maybe those problems are now completely resolved and I just have not kept up with the discussion. Also Steven Ross will want to integrate 'spreadsort' into the modular boost directory structure and since this is his first contribution to Boost we need to make it understandable to him how to do this.
Yes. One question is "who is the maintainer for the overall sort library?" It would be nice if someone was looking at the big picture, so that there was some consistency between individual sorts for docs, testing, etc. There might be some overall docs, too, to help users choose between algorithms without having to read about each sort algorithm
separately.
Steven, are you interested in that role?
Yes, I'm willing to perform that role.
The question seems to boil down to: do we have algorithm/sort/spreadsort, or just sort/spreadsort? Either way, sort should probably be its own submodule. There was still the concern from Rene that sub-sub-modules can be difficult for others to put together properly (for the purpose of building a minimal necessary subset). Has that been resolved?
My inclination is that unless the maintainer of the algorithm library would welcome a new sub-submodule, and the concerns about the difficulty of building a minimal subset aren't resolved, I'd rather have this as its own module: sort.
GIT is already causing some brain pain. So I think we should be rather cautious about nesting the modules any deeper than necessary. I'd favour starting at libs/sort , despite some logical sugar in adding it to Boost.Algorithms. Paul --- Paul A. Bristow Prizet Farmhouse Kendal UK LA8 8AB +44 (0) 1539 561830
Hi Steven,
My idea when developed the parallel sort methods was easy. I needed
parallel sort methods for my new version of my Counter Tree library. I
examined the GCC code and I thought, it could be possible to do without
Open MP ? And try it.
I am not against Open MP or TBB, but if you can do without them, remove an
obstacle for to be used by any C++11 compiler. I am not an expert in sort
methods, I used quick sort, heap sort and merge sort described in a basic
book of algorithms. My main interest are in the parallel methods.
I use a concurrent stack, done with a std::vector and spin locks done with
atomic variables. In the intro sort, the threads divide the problem and
insert the parts in the concurrent stack, and after extract the last and if
is smaller than a defined value, sort with the 1 thread sort, or split and
insert in the stack.
The stable sort is a merge sort, and is different. You have levels, in the
upper level you do 1 merge, in the next 2 …. The algorithm have atomic
counter for the levels, and when is smaller than a predefined size, sort
with the 1 thread algorithm.
The algorithms must be refined. The size limit for a thread is (
total_size / (nthreads *8) ). The number of 8 is arbitrary, run well, but I
need more benchmarks for to obtain the best value.
The 1 thread algorithms must be improved, specially the merge_sort, the
first solution can be copy the algorithm used in GCC, but I would like to
examine more in deep.
The detection about the sorted elements is easy in intro sort with a
counter of movements, when you detect zero movements form one side to
other, you can check if they are ordered. In the merge sort is more
difficult, but I will examine.
The worst case in intro sort is extremely difficult to obtain. I checked
with a counter when the algorithm use heap sort, and with hundred of
millions elements, with different data inputs, sometimes the counter is to
1, and the number of elements sorted is small.
About the size, intro sort don't use additional memory. Merge sort use an
array of a half of the size of the input data. This array is used, in the
parallel and in the 1 thread function, each call have the range of elements
and the related part of this array.
I use the atomics variables for to implement the spin locks. In a non
C++11 compiler, we can use pthreads, or any other thread library, and
mutex.
In my opinion , the most interesting part of my code are the parallel
functions. The other must be improved, and you know much more about it. I
only pretend to be useful with my code and my ideas. The goal is to provide
good sort methods to the community.
I have done other things about sorting which perhaps can be interesting. I
am testing indirect sort. When the size of the elements grows, the data bus
is stressed moving the elements and the performance of the algorithms
drops. If you create an array with pointers to the elements, and sort this
array, you have a penalty because you must access to the elements from a
pointer, but you only move pointers. At end with the pointers sorted, a
O(N) method sort the original elements. And with the size of the elements
grows, this method is faster than the original.
I hope have time this weekend for to pack and send the code. I don't know
if as zip file or create a git repository for this code. If I can help you,
please say me, and I will try. And if you want to use something say me for
to write the documentation, because now I have only the code, test programs
and several benchmarks.
Congratulations by your library.
Yours
Francisco Tapia
2014-12-01 13:57 GMT+01:00 Paul A. Bristow
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Steven Ross Sent: 01 December 2014 11:18 To: boost@lists.boost.org Subject: Re: [boost] [review] [sort] Sort library review manager results
but maybe those problems are now completely resolved and I just have not kept up with the discussion. Also Steven Ross will want to integrate 'spreadsort' into the modular boost directory structure and since this is his first contribution to Boost we need to make it understandable to him how to do this.
Yes. One question is "who is the maintainer for the overall sort
library?"
It would be nice if someone was looking at the big picture, so that there was some consistency between individual sorts for docs, testing, etc. There might be some overall docs, too, to help users choose between algorithms without having to read about each sort algorithm separately.
Steven, are you interested in that role?
Yes, I'm willing to perform that role.
The question seems to boil down to: do we have algorithm/sort/spreadsort, or just sort/spreadsort? Either way, sort should probably be its own submodule. There was still the concern from Rene that sub-sub-modules can be difficult for others to put together properly (for the purpose of building a minimal necessary subset). Has that been resolved?
My inclination is that unless the maintainer of the algorithm library would welcome a new sub-submodule, and the concerns about the difficulty of building a minimal subset aren't resolved, I'd rather have this as its own module: sort.
GIT is already causing some brain pain.
So I think we should be rather cautious about nesting the modules any deeper than necessary.
I'd favour starting at libs/sort , despite some logical sugar in adding it to Boost.Algorithms.
Paul
--- Paul A. Bristow Prizet Farmhouse Kendal UK LA8 8AB +44 (0) 1539 561830
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Franciso
On Mon Dec 01 2014 at 1:52:00 PM Francisco José Tapia
The algorithms must be refined. The size limit for a thread is ( total_size / (nthreads *8) ). The number of 8 is arbitrary, run well, but I need more benchmarks for to obtain the best value.
That's reasonable, but the constant (8) should be somewhere easy to find and modify in case a different value is better on other systems.
The 1 thread algorithms must be improved, specially the merge_sort, the first solution can be copy the algorithm used in GCC, but I would like to examine more in deep.
Agreed. Don't be afraid to just fall back on the version in the standard library if you can't improve on it. I recommend doing that for the single-threaded part of the algorithm (unless you want to try my idea of replacing insertionsort with switch statement that selects between sorting networks, which will require a more careful performance review, but might be faster).
The detection about the sorted elements is easy in intro sort with a counter of movements, when you detect zero movements form one side to other, you can check if they are ordered. In the merge sort is more difficult, but I will examine.
That sounds reasonable.
The worst case in intro sort is extremely difficult to obtain. I checked with a counter when the algorithm use heap sort, and with hundred of millions elements, with different data inputs, sometimes the counter is to 1, and the number of elements sorted is small.
What partitioning algorithm are you using? Can't you design an input deliberately to mess up the partitioning, so only one or two items go into one of the two halves? Introsort is optimized to still perform decently in this worst case, and it'd be good to confirm that with worst-case data. About the size, intro sort don't use additional memory. Merge sort use an
array of a half of the size of the input data. This array is used, in the parallel and in the 1 thread function, each call have the range of elements and the related part of this array.
That sounds reasonable, though my understanding is that std::stable_sort is capable of running with less RAM than your merge_sort.
I have done other things about sorting which perhaps can be interesting. I am testing indirect sort. When the size of the elements grows, the data bus is stressed moving the elements and the performance of the algorithms drops. If you create an array with pointers to the elements, and sort this array, you have a penalty because you must access to the elements from a pointer, but you only move pointers. At end with the pointers sorted, a O(N) method sort the original elements. And with the size of the elements grows, this method is faster than the original.
Yes, indirect sorting is a well-known good technique. It's usually not difficult, but if you can create a general wrapper that makes it even easier to do efficiently, that would be interesting.
I hope have time this weekend for to pack and send the code. I don't know if as zip file or create a git repository for this code. If I can help you, please say me, and I will try. And if you want to use something say me for to write the documentation, because now I have only the code, test programs and several benchmarks.
If you can get the library fast enough that it's within 5% of the best competitor on both Linux and Windows for random, sorted, reverse-sorted, and mostly-sorted data (both integers and strings), and has reasonable worst-case performance, then I'll definitely be interested in taking a look.
On Mon, Dec 1, 2014 at 7:57 AM, Paul A. Bristow
GIT is already causing some brain pain.
So I think we should be rather cautious about nesting the modules any deeper than necessary.
I'd favour starting at libs/sort , despite some logical sugar in adding it to Boost.Algorithms.
Unless there are further objections, that's what I'll do.
On 27/11/14 02:32, Edward Diener wrote:
Review Result:
Based on the voting for acceptance or not, based on the discussions during the review period, and based on Steven Ross's responses, I have decided that Steven Ross's library should be accepted into Boost.
I did not really participate in the review, but I did notice there was consensus that the documentation was not really satisfactory. Was this issue already addressed during the review or will it be addressed during the integration period?
I made changes to the documentation in the develop branch during the review
to deal with concerns raised by reviewers. I will also improve the
specific documentation Edward suggested with regards to situations where
spreadsort vs std::sort are faster relative to each other, and provide
explanations and examples of how to use it for more complicated multiple
key sorts and such.
Specific suggestions on how to improve the documentation are welcome.
On Thu, Nov 27, 2014, 8:47 AM Mathias Gaunard
On 27/11/14 02:32, Edward Diener wrote:
Review Result:
Based on the voting for acceptance or not, based on the discussions during the review period, and based on Steven Ross's responses, I have decided that Steven Ross's library should be accepted into Boost.
I did not really participate in the review, but I did notice there was consensus that the documentation was not really satisfactory.
Was this issue already addressed during the review or will it be addressed during the integration period?
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/ mailman/listinfo.cgi/boost
On 11/27/2014 8:47 AM, Mathias Gaunard wrote:
On 27/11/14 02:32, Edward Diener wrote:
Review Result:
Based on the voting for acceptance or not, based on the discussions during the review period, and based on Steven Ross's responses, I have decided that Steven Ross's library should be accepted into Boost.
I did not really participate in the review, but I did notice there was consensus that the documentation was not really satisfactory.
Was this issue already addressed during the review or will it be addressed during the integration period?
There were a number of specific documentation suggestions during the review period and I think Steven Ross responded to them and is cognizant of them. I did mention some of them in the review results. But if you have any other documentation suggestions of your own I am sure he is listening.
Edward Diener wrote:
This is the results from the recent review of the Sort library of Steven Ross.
I have to say you gave us a tensive read. :-)
I believe the condtional Yes vote from Julian Gonggrijp was completely met in the discussions of issues with Steven Ross
To be honest I haven’t checked the documentation afterwards, but I trust any remaining issues will be resolved in a way that I can agree with.
Based on the voting for acceptance or not, based on the discussions during the review period, and based on Steven Ross's responses, I have decided that Steven Ross's library should be accepted into Boost.
Congratulations, Steven!
I do not know if the review manager has a say in this but based on the remarks of the reviewers I would like to see the library as accepted be named 'SpreadSort' rather than just 'Sort'. I do think that Boost can have a library called 'Sort' but I agree with the general consensus that a 'Sort' library needs more than one type of sorting algorithm. I would like to see other people, who mentioned in the reviews/comments that they have their own sorting implementation, also submit their own implementations to Boost and, if this happens and they are accepted, I can see combining them with 'SpreadSort' into a general Boost sorting library called 'Sort' in the future.
I think this is a very reasonable proposal. Regarding the possible future additions: I would absolutely love to submit my own sorting algorithm implementations. I first need to find time to remove a few rough edges before I would even dare to publish my code, but thank you for the encouragement. :-) -Julian
I do not know if the review manager has a say in this but based on the remarks of the reviewers I would like to see the library as accepted be named 'SpreadSort' rather than just 'Sort'. I do think that Boost can have a library called 'Sort' but I agree with the general consensus that a 'Sort' library needs more than one type of sorting algorithm. I would like to see other people, who mentioned in the reviews/comments that they have their own sorting implementation, also submit their own implementations to Boost and, if this happens and they are accepted, I can see combining them with 'SpreadSort' into a general Boost sorting library called 'Sort' in the future.
I think this is a very reasonable proposal.
Regarding the possible future additions: I would absolutely love to submit my own sorting algorithm implementations. I first need to find time to remove a few rough edges before I would even dare to publish my code, but thank you for the encouragement. :-)
One possible way to handle classic algorithms is to integrate them with
the tests I wrote for Spreadsort, and verify they show a significant benefit relative to the existing libraries for some subset of realistic cases and don't have any major bugs. Then a mini-review of the the API and documentation might be appropriate. Here are some potential standard algorithms that it might be nice to add in the future: LSD radix sort, for fixed-length data where stability matters Timsort, a stable comparison sort that is fast for mostly-sorted data and is used by standard Java implementations. A sorting network based comparison sort for small and fixed-size datasets. I've found these to be about twice as fast as Insertionsort. k-way merge parallel sort
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Steven Ross Sent: 28 November 2014 12:55 To: boost@lists.boost.org Subject: Re: [boost] [review] [sort] Sort library review manager results
One possible way to handle classic algorithms is to integrate them with the tests I wrote for Spreadsort, and verify they show a significant benefit relative to the existing libraries for some subset of realistic cases and don't have any major bugs. Then a mini-review of the the API and documentation might be appropriate.
Here are some potential standard algorithms that it might be nice to add in
+1 the
future:
LSD radix sort, for fixed-length data where stability matters Timsort, a stable comparison sort that is fast for mostly-sorted data and is used by standard Java implementations.
A sorting network based comparison sort for small and fixed-size datasets. I've found these to be about twice as fast as Insertionsort.
k-way merge parallel sort
Yes - go for it - these special-case sorts will be invaluable for some. Paul --- Paul A. Bristow Prizet Farmhouse Kendal UK LA8 8AB +44 (0) 1539 561830
On 28/11/14 13:55, Steven Ross wrote:
Here are some potential standard algorithms that it might be nice to add in the future: LSD radix sort, for fixed-length data where stability matters Timsort, a stable comparison sort that is fast for mostly-sorted data and is used by standard Java implementations. A sorting network based comparison sort for small and fixed-size datasets. I've found these to be about twice as fast as Insertionsort. k-way merge parallel sort
AFAIK, in the parallel world, merge sort and radix sort are the only two algorithms used. Without the special parallel optimizations though, they tend to underperform compared to smarter algorithms. There is room for different algorithms depending on the size of the data set, the architecture, and the type/size of the input itself.
On 28/11/14 13:55, Steven Ross wrote:
Here are some potential standard algorithms that it might be nice to add in the future: LSD radix sort, for fixed-length data where stability matters Timsort, a stable comparison sort that is fast for mostly-sorted data and is used by standard Java implementations. A sorting network based comparison sort for small and fixed-size datasets. I've found these to be about twice as fast as Insertionsort. k-way merge parallel sort
AFAIK, in the parallel world, merge sort and radix sort are the only two algorithms used. Without the special parallel optimizations though, they tend to underperform compared to smarter algorithms.
Both merge sorting and radix or bucket sorting are used in parallel sorting problems for dividing up and remerging the problem. k-way merge uses RAM and CPU efficiently by only requiring one final pass to merge however many pieces (which makes it strongly preferred when parallel sorting across multiple systems or on disk), where conventional mergesort is usually slightly faster for in-memory parallel sorting, at the expense of more CPU due to the extra passes. In-memory, quicksort-based splitting, or using an initial iteration of Spreadsort to break up the problem into
On Fri Nov 28 2014 at 12:45:37 PM Mathias Gaunard < mathias.gaunard@ens-lyon.org> wrote: pieces may be more efficient than plain mergesort. It's also worth noting that quite different algorithms are usually employed for the splitting up and remerging of the parallel problem as compared to the sorting of each individual chunk. I've seen plenty of parallel implementations that use std::sort for sorting the individual chunks assigned to a single thread, for example. I'm not about to jump into a boost parallel sort implementation. If somebody writes a cross-platform implementation they'd like to share that is speed-competitive and reasonably efficient, I'd be happy to take a look at it. What special parallel optimizations are you referring to? SIMD operations?
participants (10)
-
Beman Dawes
-
Edward Diener
-
Francisco José Tapia
-
Julian Gonggrijp
-
Mathias Gaunard
-
Niall Douglas
-
Paul A. Bristow
-
Rene Rivera
-
Rob Stewart
-
Steven Ross