Re: [boost] Determining interest in container with efficient random access insert and erase

12 Sep 2015

      On Sat, Sep 12, 2015 at 8:46 PM, Ion Gaztañaga <igaztanaga@gmail.com> wrote:
...
On 12/09/2015 1:10, Glen Fernandes wrote:
...
Useful links:
...
- Source code: https://github.com/det/segmented_tree_seq
  - Documentation: http://det.github.io/segmented_tree_seq/
  - Benchmarks:
https://github.com/det/segmented_tree_seq/tree/benchmarks
Any feedback is greatly appreciated.
The benchmarks are impressive. I hope to take a deeper look this
weekend, but from my cursory glance: the implementation looks very
clean too.
Ion: Is this something you would be interested in reviewing?
I'd like to separate the proposal to put that data structure into Boost
and the proposal to put it into Boost.Container.
The implementation looks good although I have no experience with B+Trees
or augmented data structures. I think we should have more innovative
container in Boost and there have been more proposals to similar data
structures that were not.
Regarding Boost.Container, one of the biggest problems with adding new
stuff into Boost.Container is that it should be consistent with the rest of
the library. The library supports C++03, supports Interprocess
requirements, recursive types, SCARY iterators, etc... it must be
maintained and documented as the rest of the library. I wouldn't like to
maintain data structures I really don't fully understand or just left them
unmaintained. We added stable_vector and static_vector but those were
easier data structures.
Just like we have Boost.CircularBuffer or Boost.Unordered I'd like to
propose adding adding new containers in its own library. Maybe an
(optionally augmented) B+Tree library would be nice, and it could also
contain a sequence container based on that data structure.
Just my 2 cents, does it make sense?
All of your considerations make sense. I understand technical problems that
can be created by adding other (potentially quite different) types of
containers to Boost libraries. But I am sure that there will be new
proposals and new requests for advanced data structures. Sequences are
everywhere in real life applications, not only texts and strings. In
addition to this, there is an interesting lesson that I have learned since
written "STL extensions" library.

These days big interest and selling point for augmented data structures is
not so much single augmenting by count of container elements, but multiple
augmenting, which supports fast summation algorithms (more generally fast
computations using binary operations). These data structure can be used to
compute important statistical parameters, such as mean value and standard
deviation, in logarithmic running time on cheap single processor systems.
The algorithms are general and can be applied to both ordered and unordered
data sets. Alternative parallel computations require supercomputers to
achieve the same performance.

The application area - big data and data science.

Regards,
Vadim Stadnik