[sort] Parallel sorting sub-library mini-review starts tomorrow, 11 November
Dear Boost community, The mini-review of Francisco Tapia's Parallel sorting library begins tomorrow November 11, and ends November 20th. The purpose of this review is to assess whether the sub-library is useful and up to Boost software standards. If the Boost community agrees on both, Francisco and I will integrate it into the existing Boost.sort library as a separate sub-library from Spreadsort. In doing so we'll integrate the documentation with the existing documentation for Boost.Sort, making it more consistent. If you review Francisco's library, please make sure you're using a compiler that supports C++11, and answer each of these questions: 1. Were you able to run it and see accurate and fast results? Please specify your compiler, OS, and processor type, and any problems (or slowdowns) encountered. 2. Would you use this library if accepted? Why or why not? 3. What is your evaluation of the design? 4. What is your evaluation of the implementation? 5. How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? The library is here: https://github.com/fjtapia/sort_parallel This library provides both stable and unstable sorting algorithms, in single threaded and parallel versions. - All the algorithms have average and worst case runtime of O(NLogN). - These algorithms do not use any other library or utility. To compile and run these you will need a *C++11 compliant compiler.* - *This library is *include only, only requiring* the files in the boost/sort/parallel folder, with no external dependencies.* - The algorithms use a comparison object, in the same fashion as std::sort, defaulting to the std::less object, which uses the < operator internally for comparisons. - The algorithms are exception safe, meaning that the exceptions generated by the algorithms guarantee the integrity of the objects to sort, but not their relative order. If the exception is generated inside the objects (in the move or in the copy constructor) the results can be unpredictable. The library has a a new parallel_sort algorithm *( internally named Block Indirect)*, for processors connected with shared memory*,* invented for the library. This new algorithm combines the speed of algorithms like the GCC parallel sort (based on OpenMP), or the Microsoft parallel buffered sort, with the small memory consumption of algorithms like the TBB sort and the Microsoft parallel Sort. Results from running on an I7 5820 with 12 threads: GCC parallel_sort 1,25 secs 1560 MB TBB parallel_sort 1,64 secs 783 MB Boost parallel_sort 1,08 secs 786 MB In a separate repository (because it uses licence-restricted software like TBB and Microsoft parallel sort, and can’t be included in the Boost library), you can find all the code needed to run the benchmarks, the library code, the benchmark programs, and shell scripts to compile and run it on your machine in Linux and Windows. For the Linux benchmarks you will need to install TBB. (https://github.com/fjtapia/sort_parallel_benchmark) We look forward to your feedback!
On Thu, Nov 10, 2016 at 8:25 PM, Steven Ross
The library is here: https://github.com/fjtapia/sort_parallel
I was unable to build using Visual Studio 2015 with Update 3. First I cloned the repository into the root of my boost 1.62.0 directory. Then from a mSysGit bash prompt I `cd` to "test/" and invoke `b2`. Here is some of the output: test_insertion_sort.cpp(97): error C2146: syntax error: missing ')' before identifier 'and' test_insertion_sort.cpp(114): error C2146: syntax error: missing ')' before identifier 'and' Is "and" standard C++? Most of the errors revolve around that keyword.
On Fri, 11 Nov 2016, Steven Ross wrote:
Dear Boost community,
The mini-review of Francisco Tapia's Parallel sorting library begins tomorrow November 11, and ends November 20th. The purpose of this review is to assess whether the sub-library is useful and up to Boost software standards. If the Boost community agrees on both, Francisco and I will integrate it into the existing Boost.sort library as a separate sub-library from Spreadsort. In doing so we'll integrate the documentation with the existing documentation for Boost.Sort, making it more consistent.
If you review Francisco's library, please make sure you're using a compiler that supports C++11, and answer each of these questions:
1. Were you able to run it and see accurate and fast results? Please specify your compiler, OS, and processor type, and any problems (or slowdowns) encountered. 2. Would you use this library if accepted? Why or why not? 3. What is your evaluation of the design? 4. What is your evaluation of the implementation? 5. How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
Sorry, I won't have time to do a full review. I only quickly tested it in my application with g++-6 on x86_64-linux and it just worked :-) I believe I would use it, as an alternative to TBB (I would keep both code paths). Both because of the performance, and because of the lack of dependency: we already depend on C++11 and boost (I don't remember if adding -pthread is still necessary to use std::thread, that might complicate things a bit).
*This library is *include only, only requiring* the files in the boost/sort/parallel folder, with no external dependencies.*
This is an advantage for simplicity, but also a disadvantage in that you cannot easily benefit from sharing a thread pool with other parallel constructs, from a supervisor that adapts the number of threads to whatever else is going on, from any other convenient tool from a framework (omp would let you limit the number of threads with an environment variable, etc).
Results from running on an I7 5820 with 12 threads:
I did a quick test on a dual-core laptop, and this was noticably faster than TBB :-)
GCC parallel_sort
Note that gcc's parallel sort has an extra parameter to chose between several algorithms, it would be nice to mention that you tested the non-default ones as well. On Fri, 11 Nov 2016, Vinnie Falco wrote:
On Thu, Nov 10, 2016 at 8:25 PM, Steven Ross
wrote: The library is here: https://github.com/fjtapia/sort_parallel
I was unable to build using Visual Studio 2015 with Update 3. First I cloned the repository into the root of my boost 1.62.0 directory. Then from a mSysGit bash prompt I `cd` to "test/" and invoke `b2`. Here is some of the output:
test_insertion_sort.cpp(97): error C2146: syntax error: missing ')' before identifier 'and' test_insertion_sort.cpp(114): error C2146: syntax error: missing ')' before identifier 'and'
Is "and" standard C++? Most of the errors revolve around that keyword.
Yes, 'and' is standard in C++, but MS only has it in some compilation modes or if you include <ciso646> first. I think boost should not use those alternative tokens. -- Marc Glisse
To Vinnie,
The Visual Studio have a problem with and or , xor, and need to include the
file ciso646. In others compilers, this file exist but is empty, because
the compiler accept them.
The code is correct, because I compiled and run benchmarks in Windows and
Linux, without problem. The only problem are in the test files. They need
to include ciso646. I included in all the programs, and run with b2, and in
my machine, they run correctly without errors. I compiled again with GCC
5.2 , and as expected, compile and run.
The files corrected are uploaded to the repository.
To Marc,
about pthread, it’s only a library. If you do a static compilation , don’t
need pthread for run the programs.
In all the parallel algorithms, you have a parameter with the number of
threads to use, by default is std::thread::hardware_concurrency ( ), and
depend of the HW where execute the program. If the value passed is 0, the
algorithm uses 1 thread.
This parameter is not fixed in the compilation time, it’s a variable. The
algorithm check when begin the execution, and configure and adapt the
algorithm to the number of threads.
If you want to check easily in Windows and Linux, in a separate repository
(because it uses license-restricted software like TBB and Microsoft
parallel sort, and can’t be included in the Boost library), you can find
all the code needed to run the benchmarks, the library code, the benchmark
programs, and shell scripts to compile and run For the Linux benchmarks you
will need to have installed TBB.
(https://github.com/fjtapia/
https://github.com/fjtapia/sort_parallel_benchmark)
(https://github.com/fjtapia/sort_parallel_benchmark)
https://github.com/fjtapia/sort_parallel_benchmark
Thanks by your interest
Francisco
2016-11-11 16:13 GMT+01:00 Marc Glisse
On Fri, 11 Nov 2016, Steven Ross wrote:
Dear Boost community,
The mini-review of Francisco Tapia's Parallel sorting library begins tomorrow November 11, and ends November 20th. The purpose of this review is to assess whether the sub-library is useful and up to Boost software standards. If the Boost community agrees on both, Francisco and I will integrate it into the existing Boost.sort library as a separate sub-library from Spreadsort. In doing so we'll integrate the documentation with the existing documentation for Boost.Sort, making it more consistent.
If you review Francisco's library, please make sure you're using a compiler that supports C++11, and answer each of these questions:
1. Were you able to run it and see accurate and fast results? Please specify your compiler, OS, and processor type, and any problems (or slowdowns) encountered. 2. Would you use this library if accepted? Why or why not? 3. What is your evaluation of the design? 4. What is your evaluation of the implementation? 5. How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
Sorry, I won't have time to do a full review. I only quickly tested it in my application with g++-6 on x86_64-linux and it just worked :-)
I believe I would use it, as an alternative to TBB (I would keep both code paths). Both because of the performance, and because of the lack of dependency: we already depend on C++11 and boost (I don't remember if adding -pthread is still necessary to use std::thread, that might complicate things a bit).
*This library is *include only, only requiring* the files in the
boost/sort/parallel folder, with no external dependencies.*
This is an advantage for simplicity, but also a disadvantage in that you cannot easily benefit from sharing a thread pool with other parallel constructs, from a supervisor that adapts the number of threads to whatever else is going on, from any other convenient tool from a framework (omp would let you limit the number of threads with an environment variable, etc).
Results from running on an I7 5820 with 12 threads:
I did a quick test on a dual-core laptop, and this was noticably faster than TBB :-)
GCC parallel_sort
Note that gcc's parallel sort has an extra parameter to chose between several algorithms, it would be nice to mention that you tested the non-default ones as well.
On Fri, 11 Nov 2016, Vinnie Falco wrote:
On Thu, Nov 10, 2016 at 8:25 PM, Steven Ross
wrote: The library is here: https://github.com/fjtapia/sort_parallel
I was unable to build using Visual Studio 2015 with Update 3. First I cloned the repository into the root of my boost 1.62.0 directory. Then from a mSysGit bash prompt I `cd` to "test/" and invoke `b2`. Here is some of the output:
test_insertion_sort.cpp(97): error C2146: syntax error: missing ')' before identifier 'and' test_insertion_sort.cpp(114): error C2146: syntax error: missing ')' before identifier 'and'
Is "and" standard C++? Most of the errors revolve around that keyword.
Yes, 'and' is standard in C++, but MS only has it in some compilation modes or if you include <ciso646> first. I think boost should not use those alternative tokens.
-- Marc Glisse
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman /listinfo.cgi/boost
The Boost.Sort parallel sorting sub-library mini-review is now complete. There was interest expressed in using this library, and all complaints about its ability to work were corrected. For that reason, I'm going to accept it into the Boost.Sort library, subject to these conditions: 1. Francisco should investigate the reverse-sorted data optimization and see if he can integrate it without significantly impacting performance for other data sets. 2. Francisco should investigate why Asynchronous quick_intro_sort outperforms Boost parallel sort, and consider adopting the techniques or source of that library call with Christophe's permission. 3. We need to integrate the documentation with the rest of Boost.Sort so that it works properly with QuickBook, and the documentation will need some editing. 4. Francisco will maintain the Parallel sorting sub-library. Thanks for all the feedback from participants, and Congratulations Francisco!
Thanks Steven,
I am agree with all the points.
I would like to know the memory used and the best,normal and worst case of
the Asynchronous sorting algorithms.
I must study in deep QuickBook for unify the documentation of the project
Thanks again
Francisco
2016-11-22 3:16 GMT+01:00 Steven Ross
The Boost.Sort parallel sorting sub-library mini-review is now complete. There was interest expressed in using this library, and all complaints about its ability to work were corrected. For that reason, I'm going to accept it into the Boost.Sort library, subject to these conditions:
1. Francisco should investigate the reverse-sorted data optimization and see if he can integrate it without significantly impacting performance for other data sets. 2. Francisco should investigate why Asynchronous quick_intro_sort outperforms Boost parallel sort, and consider adopting the techniques or source of that library call with Christophe's permission. 3. We need to integrate the documentation with the rest of Boost.Sort so that it works properly with QuickBook, and the documentation will need some editing. 4. Francisco will maintain the Parallel sorting sub-library.
Thanks for all the feedback from participants, and Congratulations Francisco!
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/ mailman/listinfo.cgi/boost
participants (4)
-
Francisco José Tapia
-
Marc Glisse
-
Steven Ross
-
Vinnie Falco