What is not stored contiguously is the child array, which is a std::vector of std::unique_ptrs. But when N keys are erased, the number of children erased is not O(N), because a node can hold #(fanout) keys.
I have to correct this sentence: the number of children erased is O(N). But the constant factor is small, because in practice B-Trees use large fanouts.
More precisely, a call to erase_range(a, b) is O(log N) + O(K), where N is the total number of keys and K is the number of keys being erased.
I've updated the repo description, thanks for the catch.
Best,
Jooseong
On Wed, Jul 27, 2022 at 12:06 AM, Jooseong Lee wrote:
If the element type has a trivial destructor and they are stored contiguously, then you can try to argue that it is better than O(N). This will depend on the complexity guarantees of your memory allocator. But your elements are not stored contiguously; destroying N elements will require O(N) "pages" to be released, if your pages are fixed-size. So it is still O(N).
Actually, the keys are stored contiguously:
https://github.com/frozenca/BTree/blob/bc62015a3789e5004ec98be0b82587de4edfd...
For trivially copyable types, the key array is a std::array; for other types, the key array is a std::vector (in that case, your argument would be right).
What is not stored contiguously is the child array, which is a std::vector of std::unique_ptrs. But when N keys are erased, the number of children erased is not O(N), because a node can hold #(fanout) keys.
Best, Jooseong
On Tue, Jul 26, 2022 at 11:40 PM, Phil Endecott via Boost wrote:

I don't see how erase(a,b) can be O(log N), is that a typo?
No, it's not. To explain this, I have to explain split() first: split(Tree&& tree, key a, key b) splits a tree into two trees, say tree_l and tree_r, such that tree_l contains all keys less than a and tree_r contains all keys greater than b. By the B-Tree's nature, this is O(log N).
erase(a, b) works like this:
1. Compute the pair lb = lower_bound(a), ub = upper_bound(b). These are iterators. O(log N)
2. Compute order(lb) and order(ub). order(ub) - order(lb) is then the number of elements that will be erased. O(log N)
3-1. When order(ub) - order(lb) < 30, the tree erases the elements one by one.
3-2. When it exceeds 30, the algorithm first splits the tree into two by calling split(std::move(*this), lb, ub), then joins the two split trees. split() and join() are each O(log N), and there is one split() call and one join() call, so the whole complexity is O(log N).
You have to call the destructors on the elements that you have erased, so it must be O(N) in general. Your description skips the part where you drop the portion of the tree between the two splits.
If the element type has a trivial destructor and they are stored contiguously, then you can try to argue that it is better than O(N). This will depend on the complexity guarantees of your memory allocator. But your elements are not stored contiguously; destroying N elements will require O(N) "pages" to be released, if your pages are fixed-size. So it is still O(N).
Regards, Phil.