Giacomo Drago wrote:
On 2015-03-22 14:41, Phil Endecott wrote:
There is an in-memory B-tree at google code that looks decent, though I've not personally used it:
http://code.google.com/p/cpp-btree/
If you click "Wiki pages - UsageInstructions" at the bottom left of that page, you'll find tables giving the memory overhead. A B-tree has no per-item pointers, but it does have per-page pointers; you can asymptotically approach zero overhead by increasing the page size. With an infinite page size you effectively have a flat_map; with a page size of one you have a binary tree.
In practice, a more important influence on memory footprint than the per-page pointers is the empty space in the pages that results from splitting after insertion, and after deletion but before merging. But a flat_map implemented on a std::vector has a similar issue: std::vector grows geometrically (typically doubling) when it re-allocates, so it can be wasting up to half of the memory it has allocated.
On a vector you can easily call shrink_to_fit, bringing the overhead to zero (if you neglect the vector's size field and the pointer to its buffer).
And you can compact ("vacuum") a B-tree, if you want.
I still think a flat_map (not necessarily this flat_map in particular) has a reason to exist.
For me, the main uses of flat data structures have been:

(a) When it's important to have no pointers at all, so that I can store the data in a memory-mapped file or perhaps in interprocess shared memory. (Though most recently I decided to use offset pointers from Boost.Interprocess along with a Boost.Intrusive map, and that worked well.)

(b) When the data has a "life cycle" with different phases, e.g. an initial phase of bulk creation, followed by a phase of random lookup. In this case you only sort once, so the computational complexity is optimal.

I would not want to use them when you have fine-grained mixing of insertion and lookup, or when the amount of data is large enough that the poor locality of reference starts to dominate.

Regards, Phil.