On 22/09/2015 6:58, Francisco José Tapia wrote:
Hi Jon,
If the elements in the flat associative container are sorted, you need only to sort the elements to insert and after do a fusion of the two parts.
Yes, I understand that merge has better complexity guarantees than sort, sincethe already stored elements are sorted, it's better to sort the new elements and merge.
If your container admit repeated elements, you can use the std::sort, if not, you must use an stable sort , and delete the repeated.
I don't think I could use std::sort as is unstable, if there are repeated elements, I should only keep the first occurrence, so I need stability while sorting. In any case, I can't use standard algorithms as as I'd like some traits support to customize basic operations (swap, move, uninitialized_move) in order to use Boost.Move or future destructive move operations.
You have several algorithms for to do a stable sort. In the library we are developing (https://github.com/fjtapia/sort_parallel ) you have a smart_merge_sort, which is a very fast stable sort and use only one half of the memory used by the data, is exception safe and use the move semantics
Very nice! Is the N/2 memory requirement achieved with techniques similar to those described here? "Fast mergesort implementation based on half-copying merge algorithm" http://kicia.ift.uni.wroc.pl/algorytmy/mergesortpaper.pdf. Just some comments on your implementation, (I'm not a sorting expert, sorry if I'm mistaken), mainly related to "adaptability": Your implementation is not adaptive in the sense that a failure to allocate N/2 external memory is triggered with an exception. In contrast, many std::stable_sort implementations try to allocate N/4,N/8,etc... and they gracefully degrade to a O((log(N)^2) sort for sorting steps that have no enough external memory. Eg.: they use available memory to speed up apply mergesort for the "low level" phases (merging small ranges that could fit in the external memory). Only the final merge steps are done O((log(N)^2). This can be seen in the SGI implementation: https://www.sgi.com/tech/stl/stl_algo.h "inplace_merge" can also be adaptive, achieving O(N) if enough memory is available and ONlogN if no memory is is available.
I didn't see any implementation of the stable sort algorithms without additional memory, in any library GCC, TBB, IPP … I suppose because they are slow.
GCC also degrades to a no-memory sort. See method "__inplace_stable_sort" in include/bits/stl_algo.h. It uses "__merge_without_buffer" to implement mergesort. GCC's inplace_merge is also adaptive, using "__merge_without_buffer" when This "merge without buffer" is slower, no doubt. However, as I mentioned in other posts, there are some newer algorithms that claim similar performance to stable_sort without external memory. As a sort expert you might find them interesting, the implementation might be quite complicated: GrailSort: https://github.com/Mrrl/GrailSort, based on http://comjnl.oxfordjournals.org/content/35/6/643.full.pdf Wikisort: https://github.com/BonzaiThePenguin/WikiSort. Based on paper https://github.com/BonzaiThePenguin/WikiSort/blob/master/tamc2008.pdf I don't know how these algorithms work with non-integer types, but they are at least interesting.
If you have 100 elements and want to insert 30 elements, need to resize the vector to 130 elements, the last 30 are empty.
If you have other vector with the data to insert, you can use smart_merge_sort, and use as auxiliary memory the last 15 elements of the main vector. After sort, delete the repeated elements. And with the two vectors you must do a half_merge fusion , and all is done.
Sure. But I have no guarantee that input data is in another vector, the container just receives just a range of iterators I can't modify, just read. So I should: -> allocate a new buffer to store 130 elements -> Copy new elements to the back of the buffer (last 30 positions) -> Sort them using 15 elements of additional storage from that buffer. -> Merge the old buffer and the newly sorted elements in the first positions of the new buffer. -> Destroy last 30 positions of the new buffer. The problem is that we don't want to reallocate in every range input, only when capacity is filled. Extra memory is not small when the input is bigger than the stored element count. E.g.: You have a vector of 30 elements and range to insert has 10000 elements, I'd need to allocate a buffer of 15000 elements, extra 5000 elements only to do the initial sort. Obviously, your N/2 is much better than N. The ideal (not from a performance point of view, but from a "balanced" point of view) would be: Consider: N = size() //current flat_xxx size M = std::distance(first, last) //range to insert T = M+N If capacity() > N + M (vector is NOT reallocated) --------------------------------------------------- -> No new memory is allocated -> new elements are copied at the end of the already inserted elements. -> stable_sort in new elements passing the remaining capacity of the vector as a temporary buffer. stable_sort uses the extra capacity in some steps with complexity O(MlogM) sort. In the last steps, when no enough memory is available it uses a slower algorithm (the STL allows O(M(logM)^2)). If remaining capacity is >= M/2 mergesort will always be optimal. If GrailSort guarantees are true, O(MlogM) could be always obtained if stable_sort degrades to grailsort when no memory is available. -> A stable "inplace_merge" is used to merge the old elements (already sorted) with the newly sorted elements). Extra capacity is used in some steps. Complexity: T*log(T), O(T) if enough free memory (T) is available. The whole process' complexity ranges (depending on the available extra capacity) from: -> Worst case: O(M(logM)^2) + T*log(T). -> Best case: O(T) + O(MlogM) If capacity() < N + M --------------------------------------------------- A new buffer is allocated. new capacity: std::max(N,M/2) + M -> new elements are copied to the new buffer -> stable_sort is called using the extra memory of the new buffer. Complexity: best: O(MlogM). -> A merge is performed with the old elements. Complexity: O(T) Complexity : O(T) + O(MlogM) Best, Ion