Re: [Boost-users] Re: Boost.Python - implementing slicing on extension container class

16 Sep 2003

      At 16:11 16/09/2003 -0400, you wrote:
...
Steve Horne <steve@ninereeds.fsnet.co.uk> writes:
...
The main motivations in writing the C++ library were basically that
the standard library containers are too fragile, lack useful
functionality that cannot reasonably be retrofitted, and probably
(due to the red-black trees assumption of a constant-time memory
access model which is decades out of date and ignores the everyday
facts of caching and virtual memory) unnecessarily slow.
Fascinating.  Submitting these to Boost, perchance?
That sounds like an uphill struggle to me. My choosing to opt out of using 
the STL containers myself is one thing, but submitting an incompatible 
alternative to the standard library containers for widespread use is 
something very different.
...
...
Using my containers, object lifetime and validity is already managed
even in C++, so I really don't need to worry about it in Python.
I guess I'll have to trust you on that one.  It's hard to see how you
can accomplish your goals of optimizing speed and safety at the same
time.
Of course there are trade-offs. I gain speed in some circumstances by my 
choice of data structure. I lose speed due to the management and in order 
to maintain auxiliary data within that structure which is used to provide 
the extra features. And of course in some circumstances the standard 
containers will be faster even if that management and auxiliary data is 
disregarded.

There's also nothing revolutionary about the data structure I'm using - it 
is essentially a very old and heavily tried-and-tested data structure. It's 
just a variation of the age-old multiway trees, but stored in main memory 
instead of on disk. In an age where caching and virtual memory are an 
everyday fact of life on desktop PCs, I figure that main memory accesses 
have a lot in common with disk accesses - though of course this would not 
be true with most embedded systems.

By using a multiway tree rather than a binary tree, you lose the ability to 
restructure the tree with only pointer rotations (you need to physically 
move items on insertions and deletions) but you gain in cache and virtual 
memory friendliness. Reads and simple writes are fast, and on the 
assumption that most data gets read several times after being inserted 
once, that can be a good tradeoff. And unlike an std::vector, the number of 
items that need to be shifted is strictly limited by the size of the node.

I've gone with essentially what I believe are B+ trees (I've found it 
difficult to get good definitions of the difference between B trees and B+ 
trees). That is, data is only held in 'leaf' nodes at the bottom layer of 
the tree. It's also convenient to have prev and next links in the leaf 
nodes for easy iteration, and for each leaf node to have a pointer to a 
singly-linked chain of iterators (meaning that maintenance operations only 
have to deal with the normally small number of iterators that refer into 
one or two leaf nodes).

Add a count of the number of items in the subtree to each branch node and 
subscripted access (while certainly not std::vector-fast) is also 
reasonable. And because these nodes are quite large, the overheads are 
quite small.

The convenience and safety benefits are always there, but speed benefits 
are much more dubious. It's very easy to find cases where the standard 
library containers will be much faster - if the items being held are large 
and complex so that shifting the items within a node is slow, for instance.

Anyway, my code has several problems. Not least (1) it belongs to my 
employer, and (2) it wasn't written to fit in with existing C++ libraries. 
I'd love to think that the idea behind them might be adopted more widely, 
but I suspect it's more of a niche thing - and maybe that niche is just me.