At 16:11 16/09/2003 -0400, you wrote:
Steve Horne
writes: The main motivations in writing the C++ library were basically that the standard library containers are too fragile, lack useful functionality that cannot reasonably be retrofitted, and probably (due to the red-black trees assumption of a constant-time memory access model which is decades out of date and ignores the everyday facts of caching and virtual memory) unnecessarily slow.
Fascinating. Submitting these to Boost, perchance?
That sounds like an uphill struggle to me. My choosing to opt out of using the STL containers myself is one thing, but submitting an incompatible alternative to the standard library containers for widespread use is something very different.
Using my containers, object lifetime and validity is already managed even in C++, so I really don't need to worry about it in Python.
I guess I'll have to trust you on that one. It's hard to see how you can accomplish your goals of optimizing speed and safety at the same time.
Of course there are trade-offs. I gain speed in some circumstances by my choice of data structure. I lose speed due to the management and in order to maintain auxiliary data within that structure which is used to provide the extra features. And of course in some circumstances the standard containers will be faster even if that management and auxiliary data is disregarded. There's also nothing revolutionary about the data structure I'm using - it is essentially a very old and heavily tried-and-tested data structure. It's just a variation of the age-old multiway trees, but stored in main memory instead of on disk. In an age where caching and virtual memory are an everyday fact of life on desktop PCs, I figure that main memory accesses have a lot in common with disk accesses - though of course this would not be true with most embedded systems. By using a multiway tree rather than a binary tree, you lose the ability to restructure the tree with only pointer rotations (you need to physically move items on insertions and deletions) but you gain in cache and virtual memory friendliness. Reads and simple writes are fast, and on the assumption that most data gets read several times after being inserted once, that can be a good tradeoff. And unlike an std::vector, the number of items that need to be shifted is strictly limited by the size of the node. I've gone with essentially what I believe are B+ trees (I've found it difficult to get good definitions of the difference between B trees and B+ trees). That is, data is only held in 'leaf' nodes at the bottom layer of the tree. It's also convenient to have prev and next links in the leaf nodes for easy iteration, and for each leaf node to have a pointer to a singly-linked chain of iterators (meaning that maintenance operations only have to deal with the normally small number of iterators that refer into one or two leaf nodes). Add a count of the number of items in the subtree to each branch node and subscripted access (while certainly not std::vector-fast) is also reasonable. And because these nodes are quite large, the overheads are quite small. The convenience and safety benefits are always there, but speed benefits are much more dubious. It's very easy to find cases where the standard library containers will be much faster - if the items being held are large and complex so that shifting the items within a node is slow, for instance. Anyway, my code has several problems. Not least (1) it belongs to my employer, and (2) it wasn't written to fit in with existing C++ libraries. I'd love to think that the idea behind them might be adopted more widely, but I suspect it's more of a niche thing - and maybe that niche is just me.