Re: [boost] Questions about incorporating my library into a new library within Boost

26 Jul 2022

      Dear Phil, FYI,
...
I would be interested to see a comparison with Beman Dawes'
B-Tree library from 2010:
https://github.com/Beman/btree seems to depend on an older version of
Boost.Endian,
It doesn't compile with the recent versions of boost. I can't find a proper
version of Boost.Endian which is compatible with.

To compare with Google's 2011 B-Tree implementation, (
https://code.google.com/archive/p/cpp-btree/)
I saw the following results:
https://github.com/frozenca/BTree/tree/295a9ec4b5495a83c546fd3e300d54d4c78d8...

Regards,
Jooseong

2022년 7월 26일 (화) 오전 12:03, Jooseong Lee <frozenca91@gmail.com>님이 작성:
...
...
I see that you have O(log N) n-th element access, that it is an
interesting feature. Does this mean that every non-leaf node has a
count of its descendants?
Yes, every non-leaf (and leaf) node has a count of keys in the subtree
where that node is the root.
...
I don't see how erase(a,b) can be O(log N), is that a typo?
No, it's not.
In order to explain this, I have to explain split() first:
split(Tree&& tree, key a, key b) splits a tree to two trees, say tree_l
and tree_r, so that
tree_l contains all keys less than a, tree_r contains all keys greater
than b.
By B-Tree's nature, this is O(log N).
erase(a, b) works like this:
1. compute lb = lower_bound(a), ub = upper_bound(b) pair. These are
iterators. O(log N)
2. compute order(lb), order(ub). Therefore, order(ub) - order(lb) is the
number of elements that will be erased. O(log N)
3-1. When order(ub) - order(lb) < 30, the tree erases element one by one
3-2. When it exceeds 30, the algorithm first splits the tree to two trees
by calling spit(std::move(*this), lb, ub)
and joins the two split trees. split() and join() is O(log N), and there
is one split() call and join() call,
so the whole complexity is O(log N)
I plan to benchmark mine with abseil's B-Tree soon.
Thank you for pointing out Beman's implementation and sqlite, I'll list
them too.
I'll add Boost license too, thanks for that
Best,
Jooseong
2022년 7월 25일 (월) 오후 11:29, Phil Endecott via Boost <boost@lists.boost.org>님이
작성:
...
Jooseong Lee wrote:
...
I recently wrote a general-purpose STL-like ordered associative B-Tree
based container in C++.
.
Its API and usage is very similar to std::set, std::map, std::multiset,
std::multimap,
and when the number of elements is relatively large, my benchmark shows
that it is 2.5~3 times faster than std::set and its friends.
Repo link: https://github.com/frozenca/BTree
I would be interested to see a comparison with Beman Dawes'
B-Tree library from 2010:
https://lists.boost.org/Archives/boost//2010/09/170995.php
https://github.com/Beman/btree
Personally, I often use flat sets/maps. Of course these are best
suited when lookups dominate over modifications. It would be interesting
to know how your library compares for lookups.
I see that you have O(log N) n-th element access, that it is an
interesting feature. Does this mean that every non-leaf node has a
count of its descendants?
I don't see how erase(a,b) can be O(log N), is that a typo?
In the past I have used a lot of memory-mapped, mostly read-only,
binary data structures but these days I'm more likely to use
sqlite. Have you tried to benchmark against that?
I note it is not Boost licensed.
Regards, Phil.
_______________________________________________
Unsubscribe & other changes:
http://lists.boost.org/mailman/listinfo.cgi/boost