Francisco,
On Mon Dec 01 2014 at 1:52:00 PM Francisco José Tapia wrote:
The algorithms must be refined. The size limit for a thread is ( total_size / (nthreads * 8) ). The number 8 is arbitrary; it runs well, but I need more benchmarks to obtain the best value.
That's reasonable, but the constant (8) should be somewhere easy to find and modify in case a different value is better on other systems.
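Something like this is all I mean (the names here are mine, purely illustrative):

    #include <cstddef>

    // Split factor for dividing work among threads; 8 was chosen
    // empirically and may need tuning per platform.
    static constexpr std::size_t thread_split_factor = 8;

    inline std::size_t thread_size_limit(std::size_t total_size, std::size_t nthreads) {
        return total_size / (nthreads * thread_split_factor);
    }

That way a future benchmark run only has to touch one declaration.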
The single-threaded algorithms must be improved, especially the merge_sort. The first solution could be to copy the algorithm used in GCC, but I would like to examine it in more depth.
Agreed. Don't be afraid to just fall back on the version in the standard library if you can't improve on it. I recommend doing that for the single-threaded part of the algorithm (unless you want to try my idea of replacing insertion sort with a switch statement that selects between sorting networks, which will require a more careful performance review, but might be faster).
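To make the sorting-network idea concrete, here is a rough sketch of the shape I have in mind (illustrative only; the networks for sizes 2-4 are the standard ones):

    #include <algorithm>
    #include <utility>

    // Compare-exchange primitive used by the networks below.
    template <typename T>
    inline void cswap(T &a, T &b) {
        if (b < a) std::swap(a, b);
    }

    // Sort tiny ranges with fixed sorting networks instead of insertion sort.
    template <typename T>
    void small_sort(T *p, int n) {
        switch (n) {
        case 0: case 1:
            return;
        case 2:
            cswap(p[0], p[1]);
            return;
        case 3:
            cswap(p[0], p[1]); cswap(p[1], p[2]); cswap(p[0], p[1]);
            return;
        case 4:
            cswap(p[0], p[1]); cswap(p[2], p[3]);
            cswap(p[0], p[2]); cswap(p[1], p[3]);
            cswap(p[1], p[2]);
            return;
        default:
            std::sort(p, p + n);   // fall back for larger ranges
        }
    }

The fixed comparison pattern inside each case is what might beat insertion sort, but as I said, it needs a careful performance review.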
Detecting sorted elements is easy in introsort with a counter of movements: when you detect zero movements from one side to the other, you can check whether the elements are ordered. In merge sort it is more difficult, but I will examine it.
That sounds reasonable.
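For reference, a minimal sketch of how I read the movement-counter idea (illustrative only, not Francisco's actual code; I use a plain Lomuto partition here, his pivot scheme may differ):

    #include <algorithm>
    #include <cstddef>

    // Quicksort step whose partition counts element movements; zero
    // movements is a cheap hint that the range may already be ordered,
    // which a single O(N) scan then confirms.
    template <typename T>
    void sort_with_sorted_check(T *first, T *last) {
        if (last - first < 2) return;

        // Lomuto partition around the last element, counting movements.
        std::size_t moves = 0;
        T *store = first;
        for (T *p = first; p != last - 1; ++p) {
            if (*p < last[-1]) {
                if (p != store) { std::swap(*p, *store); ++moves; }
                ++store;
            }
        }
        if (store != last - 1) { std::swap(*store, last[-1]); ++moves; }

        // No movements from one side to the other: check for sortedness.
        if (moves == 0 && std::is_sorted(first, last)) return;

        sort_with_sorted_check(first, store);
        sort_with_sorted_check(store + 1, last);
    }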
The worst case in introsort is extremely difficult to obtain. I checked with a counter of how many times the algorithm falls back to heapsort, and with hundreds of millions of elements and different data inputs, the counter sometimes reaches only 1, and the number of elements sorted that way is small.
What partitioning algorithm are you using? Can't you design an input deliberately to mess up the partitioning, so only one or two items go into one of the two halves? Introsort is optimized to still perform decently in this worst case, and it'd be good to confirm that with worst-case data.
About the size: introsort doesn't use additional memory. Merge sort uses an auxiliary array of half the size of the input data. This array is used in both the parallel and the single-threaded functions; each call receives its range of elements and the related part of this array.
That sounds reasonable, though my understanding is that std::stable_sort is capable of running with less RAM than your merge_sort.
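For comparison, here is a minimal sketch of a merge step that needs a buffer only as large as the first run, in the spirit of the half-size array you describe (your actual code may differ):

    #include <algorithm>

    // Merge two adjacent sorted runs [first, mid) and [mid, last),
    // using an auxiliary buffer big enough to hold the first run only.
    template <typename T>
    void merge_with_half_buffer(T *first, T *mid, T *last, T *buf) {
        T *buf_end = std::copy(first, mid, buf);   // stash the first run
        T *a = buf, *b = mid, *out = first;
        while (a != buf_end && b != last)
            *out++ = (*b < *a) ? *b++ : *a++;      // ties take from the first run: stable
        std::copy(a, buf_end, out);                // whatever is left in the buffer
        // anything left in [b, last) is already in its final place
    }

My understanding is that std::stable_sort can additionally degrade to an in-place merge if its temporary buffer allocation fails, which is how it gets by with less RAM.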
I have done other things with sorting which may be interesting. I am testing indirect sort. When the size of the elements grows, the data bus is stressed moving the elements and the performance of the algorithms drops. If you create an array of pointers to the elements and sort this array, you pay a penalty because you must access each element through a pointer, but you only move pointers. At the end, with the pointers sorted, an O(N) pass sorts the original elements. As the size of the elements grows, this method becomes faster than sorting the elements directly.
Yes, indirect sorting is a well-known good technique. It's usually not difficult, but if you can create a general wrapper that makes it even easier to do efficiently, that would be interesting.
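As a starting point, a rough sketch of such a wrapper (the names and interface are mine, not an existing API):

    #include <algorithm>
    #include <cstddef>
    #include <functional>
    #include <numeric>
    #include <utility>
    #include <vector>

    // Sort an index array by comparing the elements it refers to, then
    // permute the elements with O(N) moves by walking the permutation
    // cycles. Only indices move during the sort; each element is moved
    // into its final position exactly once at the end.
    template <typename T, typename Compare = std::less<T>>
    void indirect_sort(std::vector<T> &data, Compare comp = Compare()) {
        std::vector<std::size_t> idx(data.size());
        std::iota(idx.begin(), idx.end(), std::size_t(0));

        std::sort(idx.begin(), idx.end(),
                  [&](std::size_t a, std::size_t b) { return comp(data[a], data[b]); });

        // Position i must receive data[idx[i]]; rotate each cycle once.
        for (std::size_t i = 0; i < idx.size(); ++i) {
            if (idx[i] == i) continue;
            T tmp = std::move(data[i]);
            std::size_t j = i;
            while (idx[j] != i) {
                data[j] = std::move(data[idx[j]]);
                std::size_t next = idx[j];
                idx[j] = j;
                j = next;
            }
            data[j] = std::move(tmp);
            idx[j] = j;
        }
    }

The interesting thing to benchmark is the element size at which this overtakes sorting the elements directly.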
I hope to have time this weekend to pack and send the code. I don't know whether to send it as a zip file or create a git repository for it. If I can help you, please tell me, and I will try. And if you want to use something, tell me so I can write the documentation, because right now I have only the code, test programs, and several benchmarks.
If you can get the library fast enough that it's within 5% of the best competitor on both Linux and Windows for random, sorted, reverse-sorted, and mostly-sorted data (both integers and strings), and has reasonable worst-case performance, then I'll definitely be interested in taking a look.
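If it helps, here is a minimal sketch of how I would generate the integer inputs for those cases (the ~1% disorder for "mostly sorted" is my own choice, not a fixed requirement):

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <numeric>
    #include <random>
    #include <string>
    #include <vector>

    // Build one of the four benchmark inputs: sorted, reverse-sorted,
    // random, or mostly sorted (sorted with ~1% of positions perturbed).
    std::vector<std::uint64_t> make_input(std::size_t n, const std::string &kind) {
        std::vector<std::uint64_t> v(n);
        std::iota(v.begin(), v.end(), std::uint64_t(0));   // "sorted" baseline
        std::mt19937_64 rng(12345);                        // fixed seed for reproducible runs
        if (kind == "random") {
            std::shuffle(v.begin(), v.end(), rng);
        } else if (kind == "reverse") {
            std::reverse(v.begin(), v.end());
        } else if (kind == "mostly_sorted") {
            std::uniform_int_distribution<std::size_t> pos(0, n - 1);
            for (std::size_t i = 0; i < n / 100; ++i)
                std::swap(v[pos(rng)], v[pos(rng)]);
        }
        return v;
    }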