On 5 Mar 2015 at 13:49, Kenneth Adam Miller wrote:
Well, my requirements are that I store range-offset encodings and be able to access them, for any reason, as fast as possible. What I was thinking of doing was storing the range-value encoding as a tree; it could be any kind of self-balancing tree. The thing with red-black trees is that I know how to make them wait-free by reading and implementing a whitepaper that UT Dallas published.
The rebalancing generates a lot of cache-line invalidation traffic and ruins concurrent-modify performance. Academic theory often misses this reality.
Does your nedtrie cover the ability to store range-value encodings? My thinking is that with a trie you can index on the first 3 bytes of an integer without giving up too much space (it requires only slightly less than 32 megabytes, contiguously stored). Each entry in those 32 megabytes can be a pointer to a red-black tree node, so that when you do an insertion you fall back to the wait-free red-black tree implementation. You can still do range-value encodings, but it will also be far, far faster than if you didn't have this caching to accelerate it.
nedtries provides a size_t key index algorithm. You can build the rest from that. Bitwise tries are hugely superior to red-black trees for exact and close find lookups, but inferior for closest-fit find lookups. You can see some graphs at http://www.nedprod.com/programs/portable/nedtries/.
I estimate that it requires at most about 11 instructions in the absolute worst-case scenario to do an insertion, typically 7 instructions in the usual "cache-miss" case, and an amortized 2 instructions for contiguous memory insertions. It is very highly scalable, and can be managed flexibly with various concurrency runtime patterns.
I think you may need to scale your estimates by about 10x. A cache-line miss alone costs about 250 CPU cycles, and any kind of tree algorithm spends most of its time waiting on cache lines. This is why bitwise tries are much faster than red-black trees on out-of-order CPUs: they execute out of order much better. On an in-order core like the Intel Atom, bitwise tries are extremely slow, and they're not great even on a Cortex-A15 ARM. nedtries costs about 100 CPU cycles per insert/find/remove. That is exceptionally well balanced; most algorithms are much slower at, say, inserts than finds (e.g. hash tables). If you do equal amounts of finding, inserting and removing, and your item count is less than 10,000, bitwise tries are unbeatable.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/