I see that you have O(log N) n-th element access, that it is an interesting feature. Does this mean that every non-leaf node has a count of its descendants?
Yes, every non-leaf (and leaf) node has a count of keys in the subtree where that node is the root.
I don't see how erase(a,b) can be O(log N), is that a typo?
No, it's not.
In order to explain this, I have to explain split() first:
split(Tree&& tree, key a, key b) splits a tree to two trees, say tree_l and
tree_r, so that
tree_l contains all keys less than a, tree_r contains all keys greater than
b.
By B-Tree's nature, this is O(log N).
erase(a, b) works like this:
1. compute lb = lower_bound(a), ub = upper_bound(b) pair. These are
iterators. O(log N)
2. compute order(lb), order(ub). Therefore, order(ub) - order(lb) is the
number of elements that will be erased. O(log N)
3-1. When order(ub) - order(lb) < 30, the tree erases element one by one
3-2. When it exceeds 30, the algorithm first splits the tree to two trees
by calling spit(std::move(*this), lb, ub)
and joins the two split trees. split() and join() is O(log N), and there is
one split() call and join() call,
so the whole complexity is O(log N)
I plan to benchmark mine with abseil's B-Tree soon.
Thank you for pointing out Beman's implementation and sqlite, I'll list
them too.
I'll add Boost license too, thanks for that
Best,
Jooseong
2022년 7월 25일 (월) 오후 11:29, Phil Endecott via Boost
Jooseong Lee wrote:
I recently wrote a general-purpose STL-like ordered associative B-Tree based container in C++. . Its API and usage is very similar to std::set, std::map, std::multiset, std::multimap, and when the number of elements is relatively large, my benchmark shows that it is 2.5~3 times faster than std::set and its friends.
Repo link: https://github.com/frozenca/BTree
I would be interested to see a comparison with Beman Dawes' B-Tree library from 2010:
https://lists.boost.org/Archives/boost//2010/09/170995.php https://github.com/Beman/btree
Personally, I often use flat sets/maps. Of course these are best suited when lookups dominate over modifications. It would be interesting to know how your library compares for lookups.
I see that you have O(log N) n-th element access, that it is an interesting feature. Does this mean that every non-leaf node has a count of its descendants?
I don't see how erase(a,b) can be O(log N), is that a typo?
In the past I have used a lot of memory-mapped, mostly read-only, binary data structures but these days I'm more likely to use sqlite. Have you tried to benchmark against that?
I note it is not Boost licensed.
Regards, Phil.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost