Seeking feedback on SegmentedTreeSeq (sequence data structure with efficient random access insert and erase)
Greetings, At this time I would like to solicit feedback for my library SegmentedTreeSeq. It is an implementation of a counted B+Tree that allows efficient random access insert and erase. It's an ideal data structure for an application such as a text editor. I am also seeking interested reviewers as I feel it has reached a state that is ready for formal review. Features: - Implements the union of std::vector, std::deque, and std::list interfaces. - Implements the C++11 allocator model. - Implements the full C++14 interfaces including emplacement and move semantics. - O(log n) to obtain an iterator at any position. - O(log n) insert and erase. - Amortized O(log n) iterator movement. - Strong exception guarantee for insert. - Support for incomplete types. - SCARY iterators. - Best in class performance for many operations. - Low memory overhead for data of any type. Links: - Incubator: http://rrsd.com/blincubator.com/bi_library/segmentedtreeseq-2/?gform_post_id... - Repository: https://github.com/det/segmented_tree_seq - Documentation: https://det.github.io/segmented_tree_seq Thanks, Chris
Chris Clearwater wrote:
At this time I would like to solicit feedback for my library SegmentedTreeSeq.
I refer to the feedback that I gave last time you posted: http://thread.gmane.org/gmane.comp.lib.boost.devel/263167 Regards, Phil.
On 11/09/2015 10:14 AM, Phil Endecott wrote:
I refer to the feedback that I gave last time you posted:
Thanks for your feedback Phil. Sorry I did not properly respond in the original thread.
I think this could be most useful if it were decomposed into several layers:
- An in-memory B+-Tree - An augmented tree - An order-statistics tree
That way, we could have in-memory B+-trees the provide a std::map-like interface, which would surely be useful, and we could have O(log N) random access insert/erase on a binary tree where that is appropriate, and we could have other kinds of augmented trees such as interval trees.
Working out exactly what these layers should look like is not easy, but I think it's worth doing in order to broaden the applicability.
My library implements a counted B+Tree. I see this as the basis to which other indexes can be added. In particular I would like to add support for maps and sets as this would provide a high performance alternative to the standard containers for trivially-copyable keys. I see no reason to make the count index optional. The memory overhead of storing the count index for a set where the key is 64 bits is less than a tenth of a percent. This is because the counts stored in level 1 of the tree are used to encode the length of segments, which maps/sets need to know as well as sequences. I have also benchmarked the CPU overhead of maintaining the count index and found it to be negligible. On top of this, it would dramtically complicate implementation for the count index to be optional as I would still need to store the segment lengths somewhere and implement iterator operations less efficiently when it was not present. Keeping the index around would also mean that maps/sets would recieve efficient access by position for practically free. That being said, I think the data structure is useful today and should be a part of Boost. Pouring over the mailing list I see that other people have proposed and even implemented sequences with logarithmic random access operations many times, so clearly there is a need for such a thing in the community. The reaon I am posting again to the list is because I have worked on reference documentation, boost documentation, testing, and polished the implementation. This addresses some of the comments from my last post so it's worth taking another look at. Regards, Chris.
participants (2)
-
Chris Clearwater
-
Phil Endecott