[boost] [sort][container] Challenge for sorting experts, an improved stable sort

20 Sep 2015

      Hi,

I'm trying to improve insertion times for flat associative containers. 
In range insertions one optimization is to insert the whole range in the 
end of the container and sort the whole vector instead of trying a loop.

An additional optimization proposed in the mailing list is to have an 
option to create externally the internal vector type of the flat 
map/set, fill it and then try to move it into the flat container. That 
requires sorting the moved vector (and removing duplicates if needed).

I could try to use std::sort/std::stable_sort but there are some problems:

-> Non-portable for C++03 compilers and Boost.Move, and Boost.Container 
fully supports those platforms.

-> Additional O(N) temporary memory for stable sorting. Usually this 
means a temporary N element array, although there are algorithm to use 
only N/2 temporary elements. With big containers, this is a lot of memory.

So I started analyzing several papers and implementations to see if a 
custom implementation was possible, but I felt Boost.Sort could be a 
good place to have that implementation and the Boost community a good 
place to discuss design alternatives.

My initial requirements:

1) Stable sort. A non-stable sort is faster and useful for unique keys 
containers, but stable sort can be used initially in all flat containers 
until the non-stable version is implemented.

2) Externally supplied auxiliary memory with optionally no auxiliary 
memory at all. It seems that there are newer algorithms like Grailsort 
(https://github.com/Mrrl/GrailSort) or WikiSort 
(https://github.com/BonzaiThePenguin/WikiSort) that claim O(1) memory, 
even no auxiliary buffer, and key comparison and data exchanges with 
complexity O(n*log(n)). However I don't think those implementation 
correctly handle the basic exception guarantee or non-trivially copyable 
types.

Reason: I want to avoid allocating short-lived temporary memory: if it's 
available I want to pass the extra capacity that flat containers' 
internal vector holds for future insertions.

3) Customization points (without ADL, which IMHO is not customizable 
enough). The sort algorithm should use a SortTraits template parameter 
for basic operations like "move", "swap", "initialized_move". E.g:

template<class T>
struct sort_traits
{
    static? void move(T &from, T &to);
    static? void swap(T &l, T &r);
    static? void uninitialized_move(T &from, void *to);
    static? void destroy(T &from, void *to);
};

I don't know if this trait should be stateful or not (static members). I 
could imagine the usefulness of a stateful implementation of sort_traits 
that could store a reference to an allocator so that some special 
"construct" version is used in some operations. Or just a version that 
counts the number of each operation to ease debugging. In any case, a 
stateless version would be enough for most cases.

Reason: In Boost.Container, the idea is to use boost::move + 
boost::adl_move_swap in the initial implementation, maybe some 
destructive move implementation in a second iteration if Boost.Move 
implements it. In any case those operations should be customizable by 
the user, independently from the type to be sorted. This also avoids 
unnecessary dependencies in Boost.Sort.

-> Dependencies: the implementation should only rely on Boost.Core, 
Boost.Config, maybe Boost.TypeTraits and no heavy STL dependencies 
(optionally). Currently Boost.Sort depends on Serialization for 
BOOST_STATIC_WARNING, but I think this is easily solvable, maybe this 
static assert utility should go to the core/utility library.

Reason: Sort is a low-level algorithm that should avoid pulling unneeded 
dependencies to other boost libraries.

* * *

I don't know if other Boost libraries could benefit from this new 
stable_sort implementation. Maybe it's too special and I should try to 
write it myself in Boost.Container.

I would love to hear comments and suggestions from sort experts, this 
might be a common need that could enrich the Boost.Sort library.

Best,

Ion

[boost] [sort][container] Challenge for sorting experts, an improved stable sort

Ion Gaztañaga