Hi All, I wanted to know if someone is/was working on Radix Sort or not. I did check the SoC listings and did not see radix sort in the list. If no one is working, may someone help me provide some pointers/details as to what the expectation is with regards to Radix sort for boost ? Regards, -Sourav
I implemented a templated hybrid-radix sort that is faster than O(Nlog(N))
and is awaiting review. Are you looking for a pure radix sort, which tends
to be slower?
On Jul 11, 2013 4:13 AM, "Sourav"
Hi All, I wanted to know if someone is/was working on Radix Sort or not. I did check the SoC listings and did not see radix sort in the list.
If no one is working, may someone help me provide some pointers/details as to what the expectation is with regards to Radix sort for boost ?
Regards, -Sourav
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
I'm quite interested. I was going to do it myself, but I'm quite happy to
work with someone, help out, whatever.
Jeremy
On 11 July 2013 18:32, Steven Ross
I implemented a templated hybrid-radix sort that is faster than O(Nlog(N)) and is awaiting review. Are you looking for a pure radix sort, which tends to be slower? On Jul 11, 2013 4:13 AM, "Sourav"
wrote: Hi All, I wanted to know if someone is/was working on Radix Sort or not. I did check the SoC listings and did not see radix sort in the list.
If no one is working, may someone help me provide some pointers/details as to what the expectation is with regards to Radix sort for boost ?
Regards, -Sourav
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Here's the boost library ready for review: https://github.com/boost-vault/Miscellaneous/blob/master/algorithm_sorting.z... And here's the sourceforge project (same code, but setup for running outside of boost): http://sourceforge.net/projects/spreadsort/ It's been waiting for review for a long time, but I'm open to suggestions for improvement. On Thu, Jul 11, 2013 at 6:10 AM, Jeremy Murphy < jeremy.william.murphy@gmail.com> wrote:
I'm quite interested. I was going to do it myself, but I'm quite happy to work with someone, help out, whatever.
Jeremy
On 11 July 2013 18:32, Steven Ross
wrote: I implemented a templated hybrid-radix sort that is faster than O(Nlog(N)) and is awaiting review. Are you looking for a pure radix sort, which tends to be slower? On Jul 11, 2013 4:13 AM, "Sourav"
wrote: Hi All, I wanted to know if someone is/was working on Radix Sort or not. I did check the SoC listings and did not see radix sort in the list.
If no one is working, may someone help me provide some pointers/details as to what the expectation is with regards to Radix sort for boost ?
Regards, -Sourav
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Steven Ross Sent: Thursday, July 11, 2013 11:52 AM To: boost@lists.boost.org Subject: Re: [boost] radix sort
Here's the boost library ready for review: https://github.com/boost-vault/Miscellaneous/blob/master/algorithm_sorting.z...
And here's the sourceforge project (same code, but setup for running outside of boost): http://sourceforge.net/projects/spreadsort/
It's been waiting for review for a long time, but I'm open to suggestions for improvement.
Looks review-ready to me at a glance. Do you next need to find a someone to manage the review ? Paul PS I noted missing copyright claim on all examples. Running the inspect tool locally will reveal how you will later appear on the 'name'n'shame' list. http://www.boost.org/doc/libs/1_54_0/tools/inspect/ build the inspect.exe and "The program is run in the directory to be scanned for errors. Sub-directories are also included in the scan."
I'll clean up the issues you mentioned, and yes I would appreciate help
getting it reviewed.
On Jul 11, 2013 7:18 AM, "Paul A. Bristow"
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Steven Ross Sent: Thursday, July 11, 2013 11:52 AM To: boost@lists.boost.org Subject: Re: [boost] radix sort
Here's the boost library ready for review:
https://github.com/boost-vault/Miscellaneous/blob/master/algorithm_sorting.z...
And here's the sourceforge project (same code, but setup for running
outside of boost):
http://sourceforge.net/projects/spreadsort/
It's been waiting for review for a long time, but I'm open to suggestions for improvement.
Looks review-ready to me at a glance. Do you next need to find a someone to manage the review ?
Paul
PS I noted missing copyright claim on all examples.
Running the inspect tool locally will reveal how you will later appear on the 'name'n'shame' list.
http://www.boost.org/doc/libs/1_54_0/tools/inspect/
build the inspect.exe and
"The program is run in the directory to be scanned for errors. Sub-directories are also included in the scan."
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Paul A. Bristow wrote:
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Steven Ross Sent: Thursday, July 11, 2013 11:52 AM To: boost@lists.boost.org Subject: Re: [boost] radix sort
Here's the boost library ready for review: https://github.com/boost-vault/Miscellaneous/blob/master/algorithm_sorting.z...
And here's the sourceforge project (same code, but setup for running outside of boost): http://sourceforge.net/projects/spreadsort/
It's been waiting for review for a long time, but I'm open to suggestions for improvement.
Looks review-ready to me at a glance. Do you next need to find a someone to manage the review ?
Note that there is no reason that the review cannot start right now. I've been working on my www.blincubator.com just to address this situation. This would permit those who need the library right now to try it out and post their experience in anticipation of any review. That it - it is intended to "de-couple" the evaluation/information gathering aspect of the review from a specifice one/two week time frame. I want to continue and enhance the www.blincubator site - I just need more incentive to get it moving up the heap. Robert Ramey
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Robert Ramey Sent: Thursday, July 11, 2013 11:05 PM To: boost@lists.boost.org Subject: Re: [boost] radix sort
Paul A. Bristow wrote:
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Steven Ross Sent: Thursday, July 11, 2013 11:52 AM To: boost@lists.boost.org Subject: Re: [boost] radix sort
Here's the boost library ready for review: https://github.com/boost-vault/Miscellaneous/blob/master/algorithm_so rting.zip
And here's the sourceforge project (same code, but setup for running outside of boost): http://sourceforge.net/projects/spreadsort/
It's been waiting for review for a long time, but I'm open to suggestions for improvement.
Looks review-ready to me at a glance. Do you next need to find a someone to manage the review ?
Note that there is no reason that the review cannot start right now. I've been working on my www.blincubator.com just to address this situation. This would permit those who need the library right now to try it out and post their experience in anticipation of any review. That it - it is intended to "de-couple" the evaluation/information gathering aspect of the review from a specifice one/two week time frame.
I want to continue and enhance the www.blincubator site - I just need more incentive to get it moving up the heap.
I strongly agree with your suggestions, but worry that it is splintering from the Boost site. The GITerization of Boost will change so much. So I think we should get that under our belts before making any changes to the review process. Since authors will then be much more in charge of their destiny on *their* GIT repos, they can start to do the things you rightly say are needed to improve software and, especially, documentation, and the information available for reviews. Paul --- Paul A. Bristow, Prizet Farmhouse, Kendal LA8 8AB UK +44 1539 561830 07714330204 pbristow@hetp.u-net.com
Paul A. Bristow wrote:
-----Original Message-----
Looks review-ready to me at a glance. Do you next need to find a someone to manage the review ?
Note that there is no reason that the review cannot start right now. I've been working on my www.blincubator.com just to address this situation. This would permit those who need the library right now to try it out and post their experience in anticipation of any review. That it - it is intended to "de-couple" the evaluation/information gathering aspect of the review from a specifice one/two week time frame.
I want to continue and enhance the www.blincubator site - I just need more incentive to get it moving up the heap.
I strongly agree with your suggestions, but worry that it is splintering from the Boost site.
The GITerization of Boost will change so much.
So I think we should get that under our belts before making any changes to the review process. Since authors will then be much more in charge of their destiny on *their* GIT repos, they can start to do the things you rightly say are needed to improve software and, especially, documentation, and the information available for reviews.
The boost review is the heart and soul of boost. I've designed www.blincubator.com to complement and strengthen this rather than replace it or diminish it in anyway. The boost review can be improved by having more and better reviews and a more people willing to be review managers. Previous efforts to achieve this haven't really addressed how we are going to do just this. I'm attempting to address this by a) providing an enhanced sandbox where users can review and install submissions before they have been reviewed. b) permitting these pre-review users an opportunity to post their review when they are evaluating the submission in when they are trying to use it rather than waiting for the formal review period. c) facilitating feedback to submission authors before formal review. Note that usage of www.blincubator.com doesn't change anything in the current formal review process or how anything gets added to boost. Hopefully, this will make the formal review a much easier process and motivate more people take on this very important task. Robert Ramey P.S. I would be pleased to hear via private email from any Wordpress/PHP gurus who would like to help me out on this. RR
From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of
Steven Ross Sent: Thursday, July 11, 2013 11:52 AM To: boost@lists.boost.org Subject: Re: [boost] radix sort
Here's the boost library ready for review:
https://github.com/boost-vault/Miscellaneous/blob/master/algorithm_sorting.z...
And here's the sourceforge project (same code, but setup for running outside of boost): http://sourceforge.net/projects/spreadsort/
Steven, Thanks for the response. I am just trying to pick up a project from the algorithms/data structure sections listed in the SoC page. I do not know whether there is a demand for the regular radix sort. Regards, -Sourav
Just to be clear, Steven's code is not actually radix sort, it's bucket
sort, right? Which is great, but let's not get confused about what's what.
Steven, I gave it a quick test but it failed when trying to include
From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of
Steven Ross Sent: Thursday, July 11, 2013 11:52 AM To: boost@lists.boost.org Subject: Re: [boost] radix sort
Here's the boost library ready for review:
https://github.com/boost-vault/Miscellaneous/blob/master/algorithm_sorting.z...
And here's the sourceforge project (same code, but setup for running outside of boost): http://sourceforge.net/projects/spreadsort/
Steven, Thanks for the response. I am just trying to pick up a project from the algorithms/data structure sections listed in the SoC page. I do not know whether there is a demand for the regular radix sort.
Regards, -Sourav
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
21.07.2013 16:29, Jeremy Murphy:
Out of curiosity, I decided to tackle the problem and wrote an implementation of counting sort, which as you may know, is one way of implementing radix sort. The code is not ready for a formal review but I would really appreciate feedback on it, as it is written with a view to inclusion in Boost. https://github.com/jeremy-murphy/integer-sort
Some thoughts after very quick look: 1. raising 2 to unsigned power via std::pow. just use (1 << i) 2. include directives of <cstring> and <climits> within namespace boost 3. There is "#ifdef __GXX_EXPERIMENTAL_CXX0X__" - one case uses lambda, second uses local class as functor. First of all - there is no need to do both versions - just make one which is gcd of all target compilers. Second, afaik - C++03 does not allow to use local classes as template arguments. 4. Code in main.cpp looks messier, try to partition it into logical blocks. 5. There is "assert(__first != __last);" at begin of algorithm - while common approach (for instance at STL) is to allow empty ranges. You may just do something like "if(__first == __last) return;"
I'm particularly still uncertain about the interface (asking for forward AND reverse iterators seems annoying, however legitimate). Thanks, cheers.
I think it is better just to take BidirectionalIterator - you always can use std::reverse_iterator in order to reverse it (however, while it simplifies code, it may incur additional overhead at dereferencing, so for widely used algorithm it is better to avoid it). -- Evgeny Panasyuk
On 22 July 2013 00:28, Evgeny Panasyuk
21.07.2013 16:29, Jeremy Murphy:
Out of curiosity, I decided to tackle the problem and wrote an
implementation of counting sort, which as you may know, is one way of implementing radix sort. The code is not ready for a formal review but I would really appreciate feedback on it, as it is written with a view to inclusion in Boost. https://github.com/jeremy-**murphy/integer-sorthttps://github.com/jeremy-murphy/integer-sort
Some thoughts after very quick look:
1. raising 2 to unsigned power via std::pow. just use (1 << i)
2. include directives of <cstring> and <climits> within namespace boost
3. There is "#ifdef __GXX_EXPERIMENTAL_CXX0X__" - one case uses lambda, second uses local class as functor. First of all - there is no need to do both versions - just make one which is gcd of all target compilers. Second, afaik - C++03 does not allow to use local classes as template arguments.
4. Code in main.cpp looks messier, try to partition it into logical blocks.
5. There is "assert(__first != __last);" at begin of algorithm - while common approach (for instance at STL) is to allow empty ranges. You may just do something like "if(__first == __last) return;"
I'm particularly still uncertain about the interface (asking for forward
AND reverse iterators seems annoying, however legitimate). Thanks, cheers.
I think it is better just to take BidirectionalIterator - you always can use std::reverse_iterator in order to reverse it (however, while it simplifies code, it may incur additional overhead at dereferencing, so for widely used algorithm it is better to avoid it).
-- Evgeny Panasyuk
______________________________**_________________ Unsubscribe & other changes: http://lists.boost.org/** mailman/listinfo.cgi/boosthttp://lists.boost.org/mailman/listinfo.cgi/boost
Thanks for catching my stupid mistakes! I've made all the changes you recommended except that I'm still hedging my bets on C++11. I guess I'll ask Marshall Clow if there is a policy for inclusion into the Algorithm library. Cheers. Jeremy
Just to be clear, Steven's code is not actually radix sort, it's bucket
sort, right? Which is great, but let's not get confused about what's what.
It is MSD radix sort, which is recursive bucket sort, combined with switching to std::sort once the problem is cut up small enough that std::sort has better worst-case performance. Conventional MSD radix sorts use insertion sort for small problems instead, and switch at much smaller data sizes, which can be problematic for performance with data that is unevenly distributed (conventional MSD radix sorts take about 4X as long as std::sort in this case).
Steven, I gave it a quick test but it failed when trying to include
. Looks like Robert Ramey changed the names of things around 1.37.0. (I tested with 1.53.0.) Changing it to fixed it up. Looks like <vector>, <cstring> and <limits> are unnecessary includes in some of the files?
Thanks, I'll clean those up.
Otherwise, it worked, and was much faster than std::sort(), so well done. :)
I'm not real sure about the interface, though: do they have to be different named functions (integer_, float_, string_)? Can't it be called bucket_sort() and specialized for different types? And reverse sorting, isn't that done by passing reverse iterators?
Reverse sorting could be done by passing reverse iterators, though that eliminates the possibility of using raw pointers, which are slightly faster with some of the compilers I tested. I'd be happy to remove the reverse calls if there is general agreement that the performance difference isn't a
Thanks. problem. With regards to different types, this covers three different cases: sorting by an integer or integer-like key of finite length, sorting a float correctly by casting it to an integer (tricky but fast), and sorting strings of variable length. Specializing by type isn't ideal as people often want to sort larger data types by a key, and the key could itself be a more complex data type that isn't an integer, float, or string, but supports the necessary operators.
Out of curiosity, I decided to tackle the problem and wrote an implementation of counting sort, which as you may know, is one way of implementing radix sort. The code is not ready for a formal review but I would really appreciate feedback on it, as it is written with a view to inclusion in Boost.
An LSD radix sort could make sense as an additional option, and counting sort would make sense as a part of that. LSD is substantially different from MSD radix sort, so it seems like a separate project to me. LSD radix sort is often slower than std::sort on real problems because it can't take advantage of easy problems. It also uses more memory. and doesn't make sense on variable-length data types. All that said, it is useful for some types of problems, such as when stability is a requirement.
On 22 July 2013 00:58, Steven Ross
It is MSD radix sort, which is recursive bucket sort, combined with switching to std::sort once the problem is cut up small enough that std::sort has better worst-case performance. Conventional MSD radix sorts use insertion sort for small problems instead, and switch at much smaller data sizes, which can be problematic for performance with data that is unevenly distributed (conventional MSD radix sorts take about 4X as long as std::sort in this case).
Ah, I apologize for the misunderstanding and thank you for clearing it up.
Reverse sorting could be done by passing reverse iterators, though that
eliminates the possibility of using raw pointers, which are slightly faster
with some of the compilers I tested. I'd be happy to remove the reverse calls if there is general agreement that the performance difference isn't a problem.
If performance is the only reason can you provide or link to some empirical data? Presumably the same decision must have been broached with regard to std::sort() and there is no std::reverse_sort(). This discussion on SO suggests that with enough optimization all iterators are equal: http://stackoverflow.com/questions/9025084/sorting-a-vector-in-descending-or... It also points out that reverse sorting can be achieved with forward iterators by using the opposite sorting functor. An LSD radix sort could make sense as an additional option, and counting
sort would make sense as a part of that. LSD is substantially different
from MSD radix sort, so it seems like a separate project to me. LSD radix sort is often slower than std::sort on real problems because it can't take advantage of easy problems. It also uses more memory. and doesn't make sense on variable-length data types. All that said, it is useful for some types of problems, such as when stability is a requirement.
I certainly haven't done extensive testing, but so far the stable counting-sort has been much quicker than std::sort(). It has even been quicker than your code for small (< 1000) values of k (maximum value). :) The non-stable counting-sort (based on Rosetta Code) is faster still, so I'm curious to see some extensive performance tests. My LSD radix-sort is almost finished, I'll let you know when. Cheers. Jeremy
On Mon, Jul 22, 2013 at 1:05 AM, Jeremy Murphy < jeremy.william.murphy@gmail.com> wrote:
Reverse sorting could be done by passing reverse iterators, though that
eliminates the possibility of using raw pointers, which are slightly faster
with some of the compilers I tested. I'd be happy to remove the reverse calls if there is general agreement that the performance difference isn't a problem.
If performance is the only reason can you provide or link to some empirical data? Presumably the same decision must have been broached with regard to std::sort() and there is no std::reverse_sort(). This discussion on SO suggests that with enough optimization all iterators are equal:
http://stackoverflow.com/questions/9025084/sorting-a-vector-in-descending-or... It also points out that reverse sorting can be achieved with forward iterators by using the opposite sorting functor.
I just ran my sample.cpp, and std::sort runtime drops from 169.67 to 164.01 seconds using pointers instead of iterators on the vector. In general, I've seen about a 1% speedup in sorting when using pointers instead of iterators. It isn't a big deal, but I don't think there is a reverse pointer. What you pointed me to was somebody forgetting to turn compiler optimizations on and seeing huge differences. Iterators are just a bit more complex than the built-in pointer support.
I certainly haven't done extensive testing, but so far the stable counting-sort has been much quicker than std::sort(). It has even been quicker than your code for small (< 1000) values of k (maximum value). :) The non-stable counting-sort (based on Rosetta Code) is faster still, so I'm curious to see some extensive performance tests. My LSD radix-sort is almost finished, I'll let you know when.
My code uses std::sort when there are <1000 values, because it doesn't provide a consistent speedup on randomized data sets smaller than that. What type of data are you sorting, and how is it distributed?
On Jul 21, 2013, at 10:58 AM, Steven Ross
I'm not real sure about the interface, though: do they have to be different named functions (integer_, float_, string_)? Can't it be called bucket_sort() and specialized for different types?
[snip]
With regards to different types, this covers three different cases: sorting by an integer or integer-like key of finite length, sorting a float correctly by casting it to an integer (tricky but fast), and sorting strings of variable length. Specializing by type isn't ideal as people often want to sort larger data types by a key, and the key could itself be a more complex data type that isn't an integer, float, or string, but supports the necessary operators.
Use a function template, don't implement the primary specialization. Just implement the specializations you support. If you dispatch to a static member function of a class template, PTS with type traits can give wider support without specializing for otherwise similar types. (all integral types, for example). ___ Rob (Sent from my portable computation engine)
Use a function template, don't implement the primary specialization. Just implement the specializations you support. If you dispatch to a static member function of a class template, PTS with type traits can give wider support without specializing for otherwise similar types. (all integral types, for example).
Won't the user have to give their class a trait to use the function, if it isn't a raw integer type? The other concern is that the integer and string algorithms are different in terms of worst-case performance, as the string algorithm has to deal with variable-length inputs. Currently there is a spread_sort wrapper that selects integers or floats using PTS, and failing that, defaults to std::sort. I've yet to see a way to identify that a data type is string-like. I can remove the fallback if that's the desired behavior (to avoid confusing the user about which algorithm is in use), but that doesn't resolve the string issue.
On Jul 22, 2013, at 6:33 AM, Steven Ross
Use a function template, don't implement the primary specialization. Just implement the specializations you support. If you dispatch to a static member function of a class template, PTS with type traits can give wider support without specializing for otherwise similar types. (all integral types, for example).
Won't the user have to give their class a trait to use the function, if it isn't a raw integer type? The other concern is that the integer and string algorithms are different in terms of worst-case performance, as the string algorithm has to deal with variable-length inputs.
You're over-thinking this. I wasn't referring to UDTs. foo_int and foo<int> are different ways to spell a function targeting int. The latter happens to allow for use of the same identifier as the name, for different types, though it must be specialized. Specializing is most convenient when argument type deduction does the work, but then overloading may do the same for non-templated functions, depending on the arguments.
Currently there is a spread_sort wrapper that selects integers or floats using PTS, and failing that, defaults to std::sort. I've yet to see a way to identify that a data type is string-like. I can remove the fallback if that's the desired behavior (to avoid confusing the user about which algorithm is in use), but that doesn't resolve the string issue.
If you want to offer string behavior for UDTs, you can combine PTS, for known string types, with a fallback to a trait that indicates/bridges string behavior. ___ Rob (Sent from my portable computation engine)
On Tue, Jul 23, 2013 at 5:05 AM, Rob Stewart
You're over-thinking this. I wasn't referring to UDTs. foo_int and foo<int> are different ways to spell a function targeting int. The latter happens to allow for use of the same identifier as the name, for different types, though it must be specialized.
Specializing is most convenient when argument type deduction does the work, but then overloading may do the same for non-templated functions, depending on the arguments.
I never thought of that before (using foo<int> to process something that isn't actually of int type, but acts like it, instead of foo_int as you mention). That's an interesting idea. I'm worried that it would scare off potential users, thinking that function only works on actual ints. My experience is that people usually care more about sort performance for UDTs than raw data types in operational code. If you want to offer string behavior for UDTs, you can combine PTS, for
known string types, with a fallback to a trait that indicates/bridges string behavior.
Does anybody have a suggestion as to what this string trait might be? Assuming there is such a trait, should I eliminate the fallback to std::sort for unidentified types?
OK, I've finished my draft implementation, although the test code (in
main.cpp) is still a bit of a mess.
In terms of performance, it performs poorly on 64-bit integers,
competitively on 32-bit integers, and quite impressively on 16-bit and
8-bit integers.
Unfortunately, I have to leave it alone for now as other things in life are
pressing. If someone else wants to take it and continue it, please be my
guest. I will come back and tidy things up eventually.
https://github.com/jeremy-murphy/integer-sort
Cheers!
Jeremy
On 23 July 2013 20:11, Steven Ross
On Tue, Jul 23, 2013 at 5:05 AM, Rob Stewart
wrote:
You're over-thinking this. I wasn't referring to UDTs. foo_int and foo<int> are different ways to spell a function targeting int. The latter happens to allow for use of the same identifier as the name, for
different
types, though it must be specialized.
Specializing is most convenient when argument type deduction does the work, but then overloading may do the same for non-templated functions, depending on the arguments.
I never thought of that before (using foo<int> to process something that isn't actually of int type, but acts like it, instead of foo_int as you mention). That's an interesting idea. I'm worried that it would scare off potential users, thinking that function only works on actual ints. My experience is that people usually care more about sort performance for UDTs than raw data types in operational code.
If you want to offer string behavior for UDTs, you can combine PTS, for
known string types, with a fallback to a trait that indicates/bridges string behavior.
Does anybody have a suggestion as to what this string trait might be? Assuming there is such a trait, should I eliminate the fallback to std::sort for unidentified types?
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
On Tue, Jul 23, 2013 at 11:19 AM, Jeremy Murphy < jeremy.william.murphy@gmail.com> wrote:
In terms of performance, it performs poorly on 64-bit integers, competitively on 32-bit integers, and quite impressively on 16-bit and 8-bit integers.
Yes, radix sorts work great when the data type is small (such as 8-bit or 16-bit integers). To get a better comparison against std::sort for the larger integers, I suggest you use data in a range roughly matching the size of the data type (what is RAND_MAX on your system? It's usually less than 1 << 16). As a note, MSD radix sort is extremely fast on the evenly-distributed random data that it looks like you're using. I test Spreadsort with a bunch of uneven distributions that make the problem harder, in addition to a few tests with evenly distributed random data. LSD radix sort doesn't care much about distribution, but most other algorithms (including std::sort) do.
On 25 July 2013 20:15, Steven Ross
On Tue, Jul 23, 2013 at 11:19 AM, Jeremy Murphy < jeremy.william.murphy@gmail.com> wrote:
Yes, radix sorts work great when the data type is small (such as 8-bit or 16-bit integers). To get a better comparison against std::sort for the larger integers, I suggest you use data in a range roughly matching the size of the data type (what is RAND_MAX on your system? It's usually less than 1 << 16).
Actually, RAND_MAX = INT_MAX = (1 << 31) - 1 on my system (glibc-2.15). So how do I generate random 64-bit numbers... (rand() << 32) + rand()? Bah, I should just be using Boost.Random! As a note, MSD radix sort is extremely fast on the evenly-distributed
random data that it looks like you're using. I test Spreadsort with a bunch of uneven distributions that make the problem harder, in addition to a few tests with evenly distributed random data. LSD radix sort doesn't care much about distribution, but most other algorithms (including std::sort) do.
Yes, I understand that's the assumption that bucket sort works on. So I guess that's also an argument for having multiple, different implementations. I'm keen to do a comprehensive comparative performance test, all the algorithms across different distributions, but it will take some time. I've also been thinking that the LSD radix-sort can be optimized (especially for large-width integers) where k <= (k_max >> 1). That is, where some of the most-significant digits are not being used, the number of column/digit sort passes with counting-sort can potentially be reduced. Anyway, maybe in version 2.0. :) Jeremy
Rob Stewart: I've incorporated your suggestion about how to handle the spread_sort wrapper and resolved the code issues you pointed to and some of the documentation ones (thanks). I'll leave the invisible doc formatting issues (partially due to the 3 year wait for review) alone until after this has been reviewed. Are you or someone else willing to formally review this? I've put it up on git with the fixes: https://github.com/spreadsort/algorithm_sorting Note that: integer_sort works on anything with a >> and a - operator that works on the result of the >>, or equivalent functors. float_sort works on anfything with an IEEE floating-point number key that can be cast to an integer type for sorting, as long as sign is handled correctly (float, double). string_sort works on any array of individual elements, where the individual elements sort as integers (such as string or wstring), or functors are provided with equivalent behavior. On Thu, Jul 25, 2013 at 9:43 PM, Jeremy Murphy < jeremy.william.murphy@gmail.com> wrote:
On 25 July 2013 20:15, Steven Ross
wrote: Actually, RAND_MAX = INT_MAX = (1 << 31) - 1 on my system (glibc-2.15). So how do I generate random 64-bit numbers... (rand() << 32) + rand()? Bah, I should just be using Boost.Random!
Sounds good.
Yes, I understand that's the assumption that bucket sort works on. So I guess that's also an argument for having multiple, different implementations. I'm keen to do a comprehensive comparative performance test, all the algorithms across different distributions, but it will take some time.
You might want to look at the example suite my tune.pl script runs (libs/algorithm/sorting/tune.pl). It tests a bunch of interesting distributions, including various types of random data, the ALRbreaker distribution (which shows the danger of conventional radix sorts), sorted data, and mostly sorted data.
I've also been thinking that the LSD radix-sort can be optimized (especially for large-width integers) where k <= (k_max >> 1). That is, where some of the most-significant digits are not being used, the number of column/digit sort passes with counting-sort can potentially be reduced. Anyway, maybe in version 2.0. :)
spread_sort has that optimization (it's useful for making the worst-case
condition hard to construct/much rarer). All you need to do is find the maximum unsigned value in a quick linear pass through the data, and you can ignore all bits higher than that value.
26.07.2013 7:37, Steven Ross:
I've put it up on git with the fixes: https://github.com/spreadsort/algorithm_sorting
FYI, some time ago I implemented verification of sorting - https://ideone.com/WeZU66 It generates sorted sequences of given size "isomorphic" to all other sorted sequences of that size. I.e. for N=3 it generates: 0 1 2 0 1 1 0 0 1 0 0 0 For each of such sorted sequences it loops thru all possible permutations and tests given sorting algorithm. I have just successfully tested boost::spread_sort and boost::integer_sort for N [0, 12). -- Evgeny Panasyuk
On Fri, Jul 26, 2013 at 3:49 PM, Evgeny Panasyuk
FYI, some time ago I implemented verification of sorting - https://ideone.com/WeZU66 It generates sorted sequences of given size "isomorphic" to all other sorted sequences of that size. I.e. for N=3 it generates: 0 1 2 0 1 1 0 0 1 0 0 0 For each of such sorted sequences it loops thru all possible permutations and tests given sorting algorithm. I have just successfully tested boost::spread_sort and boost::integer_sort for N [0, 12).
Thanks Evgeny.
On Jul 23, 2013, at 6:11 AM, Steven Ross
On Tue, Jul 23, 2013 at 5:05 AM, Rob Stewart
wrote: You're over-thinking this. I wasn't referring to UDTs. foo_int and foo<int> are different ways to spell a function targeting int. The latter happens to allow for use of the same identifier as the name, for different types, though it must be specialized.
Specializing is most convenient when argument type deduction does the work, but then overloading may do the same for non-templated functions, depending on the arguments.
I never thought of that before (using foo<int> to process something that isn't actually of int type, but acts like it, instead of foo_int as you mention). That's an interesting idea.
Perhaps it is, but that's not what I was saying. I was referring to foo<int> vs. foo<float>, for example, both having the name "foo".
I'm worried that it would scare off potential users, thinking that function only works on actual ints.
I was under the impression that your int function only worked on ints. That's the point of foo<int>.
My experience is that people usually care more about sort performance for UDTs than raw data types in operational code.
In that case, you need to figure out how to permit users to adapt their UDT to your function. Traits and the Bridge Pattern come to mind.
If you want to offer string behavior for UDTs, you can combine PTS, for known string types, with a fallback to a trait that indicates/bridges string behavior. Does anybody have a suggestion as to what this string trait might be?
It all depends upon your algorithm. The simplest scheme is to adapt all types the same way. You provide the specializations for the supported built-in types, with no primary specialization. Users specialize it for their UDT and they're off. There are other approaches possible, too.
Assuming there is such a trait, should I eliminate the fallback to std::sort for unidentified types?
You could keep that fallback, but one presumes choosing your algorithm was a conscious choice over std::sort(), so not providing a fallback seems better. ___ Rob (Sent from my portable computation engine)
Hi Marshall, I just thought I would draw you in here since Steven and I are interested in submitting implementations of radix-sort to the Algorithms library. I have a fairly simple textbook implementation of stable LSD radix-sort and Steven has an unstable MSD radix-sort using recursive bucket sort. (I am also playing around with an unstable radix-sort but it's not my focus.) I guess we're both looking for feedback on our work in order to progress it towards completion. But maybe in the first place, without requiring you to delve into our code, you might be able to comment on the theoretical usefulness of our ideas? In order to prevent any of us (mainly thinking about myself) from going too far down the rabbit hole? :) Thanks, cheers. Jeremy
participants (7)
-
Evgeny Panasyuk
-
Jeremy Murphy
-
Paul A. Bristow
-
Rob Stewart
-
Robert Ramey
-
Sourav
-
Steven Ross