Ion Gaztañaga wrote:
On 21/09/2015 20:53, Phil Endecott wrote:
Why do you (still) want to sort the whole container, rather than just sorting the new items and then using inplace_merge?
Well, I actually want to have both merge and sort. And yes, if I understand this correctly, sort + merge would be faster.
In any case, the idea is to have both (sort and merge) as adaptable algorithms, so that the extra capacity of the vector can be used to speed up both sort and merge.
Have you considered making this general-purpose by adding a feature to vector that exports an allocator, or a memory_resource, that could be used by any code that needs a temporary buffer while it can guarantee not to cause the vector to expand? You could then modify inplace_merge, stable_sort etc. to take such an allocator to use instead of get_temporary_buffer. And you could choose whether or not to use a fallback, i.e. get_temporary_buffer, if the spare space in the vector is insufficient. (I'd want to see numbers convincing me of the practical usefulness of this, though; I've worked on plenty of memory-constrained projects and I've never been desperate enough to resort to this sort of thing!)
For already sorted inputs and multikey containers, when size()+std::distance(begin, end) > capacity(),
I think you mean < capacity(), right?
then a usual O(N) merge can be done in the newly allocated memory. The latest version of flat containers (Boost 1.59) already does this.
The idea of using the allocated-but-unused part of the buffer for temporary storage is an interesting one, which would benefit inplace_merge as well as stable_sort. Have you tried to quantify the benefits?
Not yet, but if known algorithms are used, the difference between inplace_sort|merge and sort|merge is theoretically big. Merge becomes O(N) instead of O(NlogN). Sort becomes O(logN) instead of O(N(logN)^2).
I think you're misusing the term "inplace" in that paragraph. I think you mean "with no additional memory". "inplace" just means that the result ends up back where the input was. All std:: sorts are inplace. std::sort is never O(N(logN)^2) (or O(logN), but I presume that's a typo!). std::stable_sort is allowed to be, but see below.
When merge-sorting a range bigger than the extra capacity, the merge steps up to the size of the extra capacity would be O(N) and the last merge step would be O(N log N). There are some mergesort implementations that only require N/2 extra memory, which makes the extra capacity even more attractive:
"Fast mergesort implementation based on half-copying merge algorithm" http://kicia.ift.uni.wroc.pl/algorytmy/mergesortpaper.pdf.
When growing the vector, the extra capacity is sometimes quite big (depending on the growth factor). I think speedups will be noticeable. Obviously, nothing is real until it's measured.
There are C++ implementations of stable sort algorithms which claim they beat std::stable_sort, only require O(1) extra memory (although I guess it's really O(logN) extra memory if recursion is used), and still guarantee O(N*log(N)) complexity instead of O(N*(logN)^2). Based on these papers:
http://www.math.ntua.gr/~symvonis/publications/c_1994_S_Optimal%20Stable%20M...
https://github.com/BonzaiThePenguin/WikiSort/blob/master/tamc2008.pdf
Right, stable sorts that are "almost" O(N log N) are known; they work with blocks of size sqrt(N). I say "almost" because they have problems when more than sqrt(N) items are equal to each other. I'm not sure if that is solved. But these algorithms tend to have a constant factor of maybe 5 or 7 compared to unstable sorts or sorts that use extra memory, and I don't think I've ever seen one used anywhere. I guess it's because of this that std::stable_sort is not required to be O(N log N).
In any case, those implementations are mainly tested with integers; they use memcpy, so they work only with PODs, and they completely ignore exception safety. But fixing that seems much easier than writing a new algorithm ;-)
Yes, I don't think there should be any fundamental issues with making any of these sqrt(N) block-based algorithms more "C++". Cheers, Phil.