On 5 Mar 2015 at 13:49, Kenneth Adam Miller wrote:
Well, my requirements are that I store range-offset encodings and be able to access them, for any reason, as fast as possible. What I was thinking of doing was storing the range-value encoding as a tree; it could be any kind of self-balancing tree. The thing with red-black trees is that I know how to make them wait-free by reading and implementing a whitepaper that UT Dallas published.
The rebalancing generates a lot of cache-line invalidation traffic and ruins concurrent-modify performance. Academic theory often misses this reality.
Does your nedtrie cover the ability to store range-value encodings? My thinking is that with a trie you can index on the first 3 bytes of an integer without giving up too much space (it requires only slightly less than 32 megabytes, contiguously stored). Each entry in those 32 megabytes can be a pointer to a red-black tree node, so that when you do an insertion you fall back to the wait-free red-black tree implementation. You can still do range-value encodings, but it will also be far, far faster than if you didn't have this caching to accelerate it.
nedtries provides a size_t key index algorithm. You can build the rest from that. Bitwise tries are hugely superior to red-black trees for exact and close find lookups, but inferior for closest-fit find lookups. You can see some graphs at http://www.nedprod.com/programs/portable/nedtries/.
I estimate that it requires at most about 11 instructions in the absolute worst-case scenario to do an insertion, typically 7 instructions in the usual "cache-miss" case, and an amortized 2 instructions for contiguous memory insertions. It is very highly scalable, and can be managed flexibly with various concurrency runtime patterns.
I think you may need to scale your estimates by about 10x. A cache-line miss alone costs about 250 CPU cycles, and any kind of tree algorithm spends most of its time waiting on cache lines. This is why bitwise tries are much faster than red-black trees on out-of-order CPUs: they execute out of order much better. On an in-order core like the Intel Atom, bitwise tries are extremely slow, and they're not great even on a Cortex-A15 ARM. nedtries costs about 100 CPU cycles per insert/find/remove. That is exceptionally well balanced; most algorithms are much slower at, say, inserts than finds (e.g. hash tables). If you do equal amounts of finding, inserting and removing, and your item count is less than 10,000, bitwise tries are unbeatable.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/