Hi Ion, Thanks for the detailed review. Some minor comments follow.
The library presents two containers, devector and batch_deque. I find the first one a relatively simple but interesting vector extension supporting front insertions. In the future it could maybe take advantage of backwards memory expansion (taking over the previously adjacent free memory segment so that new elements can be added to the front without moving the rest of the elements), which is supported by Boost.Interprocess and Boost.Container's modified DLMalloc.
Interesting idea.
---------- devector --------
1) From a design point of view, I would prefer not to bundle the small buffer optimization with the data structure, although, reviewing the code, it seems many parts are expected to be optimized away.
Everything should be removed by the optimizer; if not, then that section of code needs to be amended with a constant expression.
It's like having small_vector without having vector; most libraries have two classes for these two use cases. Separate classes also make it easier to define a type-erased base class independent of the static storage size (Boost.Container calls it small_vector_base, LLVM calls it SmallVectorBase). I think we should have a devector, and add small_devector and static_devector containers if there is demand.
Can they share some of the implementation, or are we left with a double (or triple) implementation effort?
2) Defaulting size_type to unsigned int is arbitrary and against existing practice.
We didn't do that as an arbitrary decision.
According to the usual STL rules, size_type shall come from the allocator, which knows what type can represent the number of elements that can be allocated.
I have no problem with the STL rules, but, as you point out, even Boost.Container deviates from several of those.
3) reserve_only_tag: I would rename it as “reserve_only_t” (similarly to std::allocator_arg_t and Boost.Container’s “default_init_t”). The front and back capacity version should be enough as I think devector should not show preference for back insertion (if a user wants only a back capacity, then it would use a vector and not a devector).
So you would prefer it to divide the buffer in two approximately equal chunks? Or that one has to specify each chunk size explicitly each time (so full symmetry, and the one-sided constructor gone)?
Strictly speaking, capacity constructors are a bit redundant, but the same could be said about the size constructor:

// it could be implemented as default constructor + resize
explicit vector(size_type n, const Allocator & allocator = Allocator());
Yes, just like push_back is redundant when we have insert( iter, x ) :-)
A reserve-only constructor plus an “unsafe” emplacement could make a container similar to an array, where no redundant checks are done when filling the array. But I don’t expect noticeable performance gains with this reserve-only constructor.
No, it is for convenience.
4) unsafe_uninitialized_tag: I don't like these unsafe constructors; what are the use cases? With reserve-only constructors + unsafe emplace_back a user can achieve nearly the same performance. For POD types (like the socket example mentioned in the documentation), I would suggest something like Boost.Container's "default_init_t". Instead of "unsafe_uninitialized_resize_front" I suggest resize_front(size_type, default_init_t).
I didn't know about default_init_t when we started in 2015. It has a good design IMO. But unfortunately it doesn't get rid of double initialization for non-trivial types. So maybe having both

resize_front( size_type, default_init_t )
resize_front( size_type, uninitialized_t )

would be an idea.
5) reserve() / resize(): I think they have no sensible meaning in the container (capacity() could be useful, depending on the reallocation strategy). We should not treat back insertion as the “default” insertion. IMHO the API of devector should be symmetrical.
Symmetry is good, but what about generic code?
8) Regarding "move_if_noexcept": Boost.Container does not use it, as it considers it a pessimized design for a situation (throwing moves) that will almost never occur [snip] devector itself would be copied if small_buffer_size is used, etc.).
Only when it stores elements with a non-noexcept move, though.
9) Minor comment 2: IMHO I think iterators would be better as classes instead of pointers, a user can always use data() to make sure pointers are returned. Many times unwanted overload resolutions can happen if pointers are used, so I prefer the type safety offered by a class-based iterator. It can also offer checked iterators in the future.
I think you have a way to turn these into pointers again in Boost.Container, by defining some preprocessor symbol, right? Having pointers matters when interfacing with other types, because this is the only case where the iterators are seen as contiguous. I would much rather see a class iterator in debug mode and a plain pointer in release mode than extra preprocessor symbols.
2) Segment iterators: Raw pointers are stored in segment iterators. That means those iterators cannot be placed in the memory that the allocator allocates if the allocator does not define ::pointer as T*.
There is definitely some work relating to allocators and such.
Also, it seems strange that there is no segment_type/segment_info class. The iterator itself has members that return the address and the size of a segment. I think that's unexpected; the expected segment iterator would return a reference to a class (e.g. segment_type) that represents the segment.
I think we discussed that design at some point. Benedek, can you remember why we went in the other direction?
3) stable_insert: I can't see the usefulness of this operation. A batch_deque is a sequence; the user cares about the insertion point, but does not care if additional elements are created somewhere else? I fail to see a real use case.
A deque is a sequence, sure, but also a container with reference stability. I'm curious: when you use Boost.Intrusive, where do you store the elements?
-------------------------- devector --------------------------
emplace_slow_path: The forced temporary creation should be avoided; it is overkill for containers of containers, devector, or containers with sentinel nodes. IMHO supporting references to already-included elements is a bad feature of std::vector (a corner case supported in single-element insertion, ignored for ranges, etc.). Boost.Container chose not to support it and no one has demanded that feature: http://www.boost.org/doc/libs/1_65_1/doc/html/container/Cpp11_conformance.ht...
Yeah, I remember heated discussions about this in the past. I'm not a fan of that feature of the STL containers.
Reallocation strategy: At first I considered this strategy unnatural, but now I am not sure. For a push_front, if no free front capacity is found, but there is free back capacity, then all data is shifted to insert the element. If interleaved front and back insertions are performed, I don't know if this will lead to too many data movements. When memory is reallocated to make room for a front insertion, the free back capacity is unchanged and all new free capacity goes to the front. This means that the last operation (front or back) determines where all the new capacity goes. I find it a bit strange that N == front_back_capacity() does not mean that a reallocation will happen if we push_back N times, as repeated data movement will happen if front_free_capacity() is available.
Another strategy is to treat both ends independently: if front insertion has no free space, it could always reallocate and make room at the front, so further push_fronts don't provoke data movement. This would be similar to batch_deque, where memory is allocated if there is no free front capacity regardless of the free back capacity (no data movement), and a free segment is recycled from the back capacity to insert elements at the front. Obviously, this approach consumes more memory. Yet another strategy would be to distribute any new free capacity (after reallocation) between front and back, independently of which operation (back or front) provoked the allocation.
This is tricky. I think the best is to treat both ends independently. If you start moving things around, you can break the O(1) amortized guarantee by interleaving push_front/push_back.
2) The data structure of batch_deque is slightly bigger than the traditional one; I'm just asking if it's intended or if it was developed like this to reuse code. The traditional deque (from SGI) used four members (map pointer, map size, iterator to begin, iterator to end). Your implementation defines the map as a devector, which stores size and capacity for the map, but this allows you to handle the excess capacity of the map; I'm not sure if it adds performance benefits.
Reuse & simplicity would be the answer. If the iterators are fat, I bet you get the same size as a batch_deque. Having a secondary ad hoc data structure with amortized O(1) double-ended growth doesn't exactly make the code easier to reason about.

Kind regards
Thorsten