On 15/10/2017 17:59, Joaquin M López Muñoz via Boost wrote:
> The assumption is of an ever growing sequence. That said, I recognize
> there are scenarios not falling under this, most notably FIFO queues.
> I've given the policy a little more thought and think we can twist it
> to also accommodate non-growing scenarios, please keep on reading.
>
> Thoughts?
I think that the default (we could have other policies) should be simple
and similar to vector/string/stable_vector/etc. In this sense, capacity()
and reserve(), if provided, should make sense. If we reallocate depending
on the free capacity of one side, then capacity() and reserve() make no
sense to the user. I think the user expects capacity() to be the maximum
number of insertions, in any position, that will be accepted before
reallocating. If what really limits insertions is, say,
back_free_capacity(), which is less than free_capacity() = capacity() -
size(), that will be really hard for users to understand.

I propose a simple strategy:

Definitions:
---------

capacity() = the size of the buffer
free_capacity() = capacity() - size()
xxx_free_capacity() = the number of unconstructed items on the xxx
(xxx = left or right) side of the buffer
N = the number of items to be inserted
GF = growth factor

Strategy:
---------

1) The devector is initialized with equal free capacity on both ends.

2) If a xxx (xxx = left or right) insertion is needed and
xxx_free_capacity >= N, a xxx insertion is done (this minimizes moves).

3) Otherwise, if a xxx insertion is needed, xxx_free_capacity < N and
free_capacity() >= N, the sequence containing new plus old elements is
placed in the middle of the buffer. (Note: this step should be limited
if amortized constant time is desired for push_xxx insertions.)

4) Otherwise, if the allocator supports in-place expansion, the buffer
is expanded to max(size()+N, capacity()*GF) and, if successful, step 2
or 3 is performed depending on the new xxx_free_capacity value.

5) Otherwise, a new buffer is allocated and the new sequence containing
new plus old elements is placed in the middle of that buffer.
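To make the decision sequence concrete, here is a minimal sketch of
steps 1-5 for a single-element push_back (N == 1). This is illustrative
code only, not Boost.Container: the storage is simplified to a
std::vector<int> used as a raw buffer holding the live range
[first, last), and recenter() / try_expand_in_place() are hypothetical
helpers (a plain std::vector cannot expand in place, so step 4 always
fails here; a real implementation would use the allocator's expansion
support and manage uninitialized storage).

#include <algorithm>
#include <cstddef>
#include <vector>

class devector_sketch
{
   std::vector<int> buf = std::vector<int>(8); // step 1: start centered
   std::size_t first = 4;  // live range is [first, last)
   std::size_t last  = 4;

   std::size_t size() const               { return last - first; }
   std::size_t capacity() const           { return buf.size(); }
   std::size_t back_free_capacity() const { return capacity() - last; }

   // Step 3: move the live sequence to the middle of the current buffer.
   void recenter()
   {
      const std::size_t sz = size();
      const std::size_t new_first = (capacity() - sz) / 2;
      if (new_first < first)
         std::copy(buf.begin() + first, buf.begin() + last,
                   buf.begin() + new_first);
      else
         std::copy_backward(buf.begin() + first, buf.begin() + last,
                            buf.begin() + new_first + sz);
      first = new_first;
      last  = new_first + sz;
   }

   // Step 4: stand-in for an allocator with in-place expansion. A plain
   // std::vector can't expand in place, so this always reports failure.
   bool try_expand_in_place(std::size_t /*new_cap*/) { return false; }

public:
   void push_back(int x)
   {
      if (back_free_capacity() >= 1) {    // step 2: room on the right
         buf[last++] = x;
         return;
      }
      if (first > 0) {                    // step 3: free_capacity() >= 1
         recenter();                      // but only on the left side:
         buf[last++] = x;                 // recenter, then insert
         return;
      }
      const std::size_t new_cap = std::max(size() + 1, capacity() * 2); // GF = 2
      if (try_expand_in_place(new_cap)) { // step 4: grow without moving
         recenter();
         buf[last++] = x;
         return;
      }
      // Step 5: allocate a new buffer, place the sequence in its middle.
      const std::size_t sz = size();
      std::vector<int> bigger(new_cap);
      const std::size_t new_first = (new_cap - sz) / 2;
      std::move(buf.begin() + first, buf.begin() + last,
                bigger.begin() + new_first);
      buf.swap(bigger);
      first = new_first;
      last  = new_first + sz;
      buf[last++] = x;
   }
};

push_front would mirror the same steps on the left side, and a range
insert would use N instead of 1 in the capacity checks.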
For uniformly distributed random insertions on both ends, very few extra
moves will be needed. This means insertion is amortized constant time,
just like vector.

Worst case: for repeated one-side insertion, the first relocation step
will take place after C/2 insertions (where C is the buffer capacity).
In general, log(C/2) relocation steps (each one with an increasing
number of moves) will be performed before a new buffer is allocated. I
don't think that's amortized constant time, but amortized logarithmic
time (but see below).

Extra moves can be reduced by imposing a capacity() that is a factor
(say, a reallocation factor, RF) of the buffer size, forcing a
reallocation when the load factor is high. This means that when size()
reaches capacity() = RF*internal_buffer_size, a reallocation will be
performed even if there is room for left/right insertions.
capacity() = RF*internal_buffer_size is therefore known and means
something to the user. Memory usage will grow, but data movement will be
minimized. RF could be a policy-based parameter.

I think the RF factor achieves amortized constant time for one-side
insertion, if that property is considered useful. It's equivalent to
reducing the number of relocations to a fixed number of steps, so the
average number of moves needed to insert N elements into a devector of
capacity C = N/RF is proportional to N.

For a quick example (the numbers might be wrong, but you get the idea)
of repeated push_back: if relocation steps are limited to 3, which
happens with RF = 15/16, we need N (= 15/16*C) moves to insert the new
elements, plus C/2 + 3/4*C + 7/8*C moves to relocate already inserted
elements. This means something like N + 17/8*C = N + (17/8)*(N/RF) =
49/15*N = 3.26*N moves. A fast choice would be to allow just two
relocations, an RF of 7/8 (0.87), which would mean approx. 2.6*N moves
to push_back N elements. An RF of 3/4 (0.75, a single relocation step)
would mean 1.33*N moves to push_back N elements.

Statistics-based policies seem more complex, and I don't know how well
they will adapt to pattern changes throughout the lifetime of the
container; they also impose a size cost on the container. I wouldn't
make them the default choice, as the average user should easily
understand the default policy.

For a FIFO, I think a deque that reuses free segment capacity on one
side and transfers it to the other side before allocating new memory
would be the best choice. Currently boost::container::deque does not do
this, but pull requests are welcome ;-)

Best,

Ion