Hi Francisco,
In my opinion, it is better to leave sort::parallel in its current place. When I finish the process with sort::parallel, I intend to move my other library, CounterTree, to the Incubator.
Sounds good to me. Thanks for changing your examples to run off of files! That's easier to test and debug. Please give your examples names describing what they do, not just 1, 2, 3. I suggest setting up all your benchmarks to run within 5 minutes by default, with a flag to run the full-size test. Feel free to rip out timsort if it causes you any trouble; I'd only mentioned it because it's a popular sort. Please make sure its license allows you to include it in Boost.
I have seen several grammatical mistakes in the documentation, due to fatigue, excessive cut and paste, and my weak English grammar.
There is a mistake in the sort::parallel documentation: "These algorithms are generic, can sort any kind of object using a comparison object, in contrast with the specialized algorithms, as spreadsort, which only sort integers, strings and float values in a single thread version, but with a speed greater than the generic sort algorithms." spreadsort can sort any data for which you can define a strict weak ordering (just like std::sort). See https://github.com/boostorg/sort/blob/master/example/generalizedstruct.cpp for a moderately complicated example. The difference is that purely comparison-based algorithms require only a comparison object, whereas spreadsort requires indexing functions into the key (such as a bitshift, or the equivalent of [] for strings). If you can write a comparison function, you can write an equivalent indexing function, though that doesn't mean it's easy. Many people have made the same assumption as your parallel_sort documentation, and I'm not sure what the best way to clarify it is. If your library gets substantial usage, I think it'd be interesting to try a parallel spreadsort using your parallelization approach.

Do you need separate Windows and Linux cpp files? Much of the benchmark code appears duplicated, and that becomes a maintenance hassle over time. Please share as much of the code as you can.

I have some questions about your int_array object:
1) Why does it have these two functions:
   int_array ( uint64_t K = 0 );
   int_array & operator = ( uint64_t K );
2) Why doesn't this print out the entire contents of the array:
   std::ostream & operator << ( std::ostream & out , const int_array<NN> & N );
3) The object has a fairly expensive comparison function (there is no early exit; it's like a hash comparison). It doesn't seem like a problem, but I'm curious why you picked that comparison approach?
I look forward to reviewing it once you've completed cleaning it up (including your docs). Did you include the tbb optimization to be very fast with already-sorted data (I haven't run the library to check)? I assume all your algorithms are either competitive in speed or better in memory than their tbb and std library equivalents.
I generated the doxygen documentation for all the code, but I think it is very big and unnecessary for the final user of the library. I will change it to document only the final functions.
They only need to see what is outside your detail namespace.