Hi Francisco,
On Mon, Jun 8, 2015 at 4:44 PM Francisco José Tapia
I changed the name of the examples. Now are :
- example_single_thread - example_default_threads - example_100_threads
Thanks!
I will prepare a sort benchmark for to run in 5 minutes. I will do a new program, because if begin to include options in the programs, can be long and can be confuse for the users.
<= 5 minutes. If you can create a consistent benchmark that runs in a minute, even better, but I've usually found really short benchmarks (<1 minute) tend to be noisy.
Timsort in Windows, sort the data, but the time and memory used are much greater than the used in Linux. I think the best option is to remove of the Windows benchmarks.
Sounds good to me.
I will examine the licenses, and I think only need to specify the license of each part of the code because I only use, don't modify.
If you're willing to give me a link for all the licenses for software you haven't written but are including along with the code, I'd appreciate it, so I can double-check.
The benchmarks programs are different in Windows and Linux, because the algorithms compared are different. In Windows don't have the GCC algorithms, and in Linux don't have the PPL algorithms.
Ok; please try to share everything that's practical.
About the int_array objects. The constructor and the operator = function with an 64 bits integer as parameter, are functions used before, but don't used now. The operator << is the same.
Please delete them then. They confused me, and I don't want to maintain or answer questions about any unused code.
The reason for to implement a complex comparison function, is because in a sort algorithm the two main functions are move and compare. I want to implement the worst case, big objects hard to move and hard to compare.
I'm ok with that, but please mention it in your documentation if you don't already.
About the optimization of the code I use the -O3 optimization parameter for all the code. The TBB parallel stable sort have 4 different versions ( Open-MP, CilkPlus, TBB high level and TBB low level), the fastest is TBB low level and due this is the used in the benchmarks.
In the attached file (parallel_objects.txt) you can see the results in my machine with all kind of data (sorted, reverse sorted, random, random %50000000, random % 10000, and equal elements).
Those numbers look good! If the numbers on my test system (dual-boot Windows and Linux) look anything like that, I say it's good enough to include. In particular, I appreciate that you added the already-sorted optimization; that's a surprisingly common case. I've seen duplicate sorts (they sorted the data 2 times, every time) deployed to hundreds of thousands of computers; nobody noticed until I looked because the sort routine had an optimization for already-sorted data, and the duplicate sort only took 1% of the total sort time.
About to use the code for to parallelize spreadsort, feel free for to use. And if you need help , need any question or have any problem, please, say me.
When you decide to do , say me and I will prepare a description of the two approach used in order to decide the best for your code, and how to implement. Some things are complex to understand reading only the code.
Not today. Let's get your algorithms debugged, documented, and into Boost before we try anything that complicated. Please let me know when you're done with your updates and are ready for me to review the documentation and code.