Giacomo Drago wrote:
On 2015-03-22 14:41, Phil Endecott wrote:
There is an in-memory B-tree at google code that looks decent, though I've not personally used it:
http://code.google.com/p/cpp-btree/
If you click "Wiki pages - UsageInstructions" at the bottom left of that page, you'll find tables giving the memory overhead. A B-tree has no per-item pointers, but it does have per-page pointers; you can asymptotically approach zero overhead by increasing the page size. With an infinite page size you effectively have a flat_map; with a page size of one you have a binary tree.
In practice, a more important influence on memory footprint than the per-page pointers is the empty space in the pages that results from splitting after insertion, and after deletion but before merging. But a flat_map implemented on a std::vector has a similar issue: std::vector grows geometrically (typically doubling) when it re-allocates, so it can be wasting up to half of the memory it has allocated.
On a vector you can easily call shrink_to_fit, bringing the overhead to zero (if you neglect the vector's size field and the pointer to its buffer).
And you can compact ("vacuum") a B-tree, if you want.
I still think a flat_map (not necessarily this flat_map in particular) has a reason to exist.
For me, the main uses of flat data structures have been:

(a) When it's important to have no pointers at all, so that I can store the data in a memory-mapped file or perhaps in interprocess shared memory. (Though most recently I decided to use offset pointers from Boost.Interprocess along with a Boost.Intrusive map, and that worked well.)

(b) When the data has a "life cycle" with different phases, e.g. an initial phase of bulk creation, followed by a phase of random lookup. In this case you only sort once, so the computational complexity is optimal.

I would not want to use them when you have fine-grained mixing of insertion and lookup, or when the amount of data is large enough that the poor locality of reference starts to dominate.

Regards, Phil.