Re: [boost] Fwd: [sort][container] Challenge for sorting experts, an improved stable sort

23 Sep 2015

      On 22/09/2015 6:58, Francisco José Tapia wrote:
...
Hi Jon,
If the elements in the flat associative container are sorted, you need only
to sort the elements to insert and after do a fusion of the two parts.
Yes, I understand that merge has better complexity guarantees than sort, 
sincethe already stored elements are sorted, it's better to sort the new 
elements and merge.
...
If your container admit repeated elements, you can use the std::sort, if
not, you must use an stable sort , and delete the repeated.
I don't think I could use std::sort as is unstable, if there are 
repeated elements, I should only keep the first occurrence, so I need 
stability while sorting. In any case, I can't use standard algorithms as 
as I'd like some traits support to customize basic operations (swap, 
move, uninitialized_move) in order to use Boost.Move or future 
destructive move operations.
...
You have several algorithms for to do a stable sort. In the library we are
developing (https://github.com/fjtapia/sort_parallel ) you have a
smart_merge_sort, which is a very fast stable sort and use only one half of
the memory used by the data, is exception safe and use the move semantics
Very nice! Is the N/2 memory requirement achieved with techniques 
similar to those described here?

"Fast mergesort implementation based on half-copying merge algorithm"
http://kicia.ift.uni.wroc.pl/algorytmy/mergesortpaper.pdf.

Just some comments on your implementation, (I'm not a sorting expert, 
sorry if I'm mistaken), mainly related to "adaptability":

Your implementation is not adaptive in the sense that a failure to 
allocate N/2 external memory is triggered with an exception. In 
contrast, many std::stable_sort implementations try to allocate 
N/4,N/8,etc... and they gracefully degrade to a O((log(N)^2) sort for 
sorting steps that have no enough external memory. Eg.: they use 
available memory to speed up apply mergesort for the "low level" phases 
(merging small ranges that could fit in the external memory). Only the 
final merge steps are done O((log(N)^2). This can be seen in the SGI 
implementation:

https://www.sgi.com/tech/stl/stl_algo.h

"inplace_merge" can also be adaptive, achieving O(N) if enough memory is 
available and ONlogN if no memory is is available.
...
I didn't see any implementation of the stable sort algorithms without
additional memory, in any library GCC, TBB, IPP … I suppose because they
are slow.
GCC also degrades to a no-memory sort. See method 
"__inplace_stable_sort" in include/bits/stl_algo.h. It uses 
"__merge_without_buffer" to implement mergesort. GCC's inplace_merge is 
also adaptive, using "__merge_without_buffer" when

This "merge without buffer" is slower, no doubt. However, as I mentioned 
in other posts, there are some newer algorithms that claim similar 
performance to stable_sort without external memory. As a sort expert you 
might find them interesting, the implementation might be quite complicated:

GrailSort: https://github.com/Mrrl/GrailSort, based on 
http://comjnl.oxfordjournals.org/content/35/6/643.full.pdf

Wikisort: https://github.com/BonzaiThePenguin/WikiSort. Based on paper 
https://github.com/BonzaiThePenguin/WikiSort/blob/master/tamc2008.pdf

I don't know how these algorithms work with non-integer types, but they 
are at least interesting.
...
If you have 100 elements and want to insert 30 elements, need to resize the
vector to 130 elements, the last 30 are empty.
If you have other vector with the data to insert, you can use
smart_merge_sort, and use as auxiliary memory the last 15 elements of the
main vector. After sort, delete the repeated elements. And with the two
vectors you must do a half_merge fusion , and all is done.
Sure. But I have no guarantee that input data is in another vector, the 
container just receives just a range of iterators I can't modify, just 
read. So I should:

-> allocate a new buffer to store 130 elements
-> Copy new elements to the back of the buffer (last 30 positions)
-> Sort them using 15 elements of additional storage from that buffer.
-> Merge the old buffer and the newly sorted elements in the first 
positions of the new buffer.
-> Destroy last 30 positions of the new buffer.

The problem is that we don't want to reallocate in every range input, 
only when capacity is filled. Extra memory is not small when the input 
is bigger than the stored element count. E.g.: You have a vector of 30 
elements and range to insert has 10000 elements, I'd need to allocate a 
buffer of 15000 elements, extra 5000 elements only to do the initial 
sort. Obviously, your N/2 is much better than N.

The ideal (not from a performance point of view, but from a "balanced" 
point of view) would be:

Consider:

N = size()  //current flat_xxx size
M = std::distance(first, last) //range to insert
T = M+N

If capacity() > N + M (vector is NOT reallocated)
---------------------------------------------------

-> No new memory is allocated

-> new elements are copied at the end of the already inserted elements.

-> stable_sort in new elements passing the remaining capacity of the 
vector as a temporary buffer. stable_sort uses the extra capacity in 
some steps with complexity O(MlogM) sort. In the last steps, when no 
enough memory is available it uses a slower algorithm (the STL allows 
O(M(logM)^2)). If remaining capacity is >= M/2 mergesort will always be 
optimal. If GrailSort guarantees are true, O(MlogM) could be always 
obtained if stable_sort degrades to grailsort when no memory is available.

-> A stable "inplace_merge" is used to merge the old elements (already 
sorted) with the newly sorted elements). Extra capacity is used in some 
steps. Complexity: T*log(T), O(T) if enough free memory (T) is available.

The whole process' complexity ranges (depending on the available extra 
capacity) from:

-> Worst case: O(M(logM)^2) + T*log(T).
-> Best case: O(T) + O(MlogM)

If capacity() < N + M
---------------------------------------------------
A new buffer is allocated. new capacity: std::max(N,M/2) + M
-> new elements are copied to the new buffer
-> stable_sort is called using the extra memory of the new buffer. 
Complexity: best: O(MlogM).
-> A merge is performed with the old elements. Complexity: O(T)

Complexity : O(T) + O(MlogM)

Best,

Ion

Re: [boost] Fwd: [sort][container] Challenge for sorting experts, an improved stable sort

Ion Gaztañaga