Determining interest in container with efficient random access insert and erase
Greetings, I have written a library with the intention of submitting it to Boost.Container. My container implements O(log n) random access insert/erase and has a strong focus on performance while supporting the interfaces of the standard library sequence types. An example application would be the character buffer of a text editor. Useful links: - Source code: https://github.com/det/segmented_tree_seq - Documentation: http://det.github.io/segmented_tree_seq/ - Benchmarks: https://github.com/det/segmented_tree_seq/tree/benchmarks Any feedback is greatly appreciated.
On 11.09.2015 13:10, Chris Clearwater wrote:
Greetings,
I have written a library with the intention of submitting it to Boost.Container. My container implements O(log n) random access insert/erase and has a strong focus on performance while supporting the interfaces of the standard library sequence types. An example application would be the character buffer of a text editor.
Useful links: - Source code: https://github.com/det/segmented_tree_seq - Documentation: http://det.github.io/segmented_tree_seq/ - Benchmarks: https://github.com/det/segmented_tree_seq/tree/benchmarks
Any feedback is greatly appreciated.
Could you provide more details on your container's special features and advantages/weaknesses? E.g. how is it better than, for instance, Boost.MultiIndex with random access and binary lookup indices? O(log n) random access is an unusual trait; random access containers are usually able to provide O(1) complexity for accessing an element by index. It seems there is a key-value lookup instead of true random access. Am I correct?
On 09/11/2015 08:00 AM, Andrey Semashev wrote:
Could you provide more details on your container's special features and advantages/weaknesses? E.g. how is it better than, for instance, Boost.MultiIndex with random access and binary lookup indices?
My segmented_tree_seq container provides the same interface as std::vector/std::deque (excepting reserve and shrink_to_fit) while Boost.MultiIndex has an interface of its own. Also, given that Boost.MultiIndex provides stable iterators, I'd expect it allocates each element individually. This makes it completely unsuitable for many applications such as storing a buffer of 8-bit characters. It probably also means that its performance would be similar to avl_array, which I outperform by as much as 10x in many benchmarks. My container does not provide stable iterators and as a consequence can be implemented in a very cache-efficient way. Much like you should prefer vector when you are doing insert/erase near the end, and you should prefer deque when you are doing insert/erase near both ends, you should prefer my container when you are doing insert/erase at any index.
O(log n) random access is an unusual trait; random access containers are usually able to provide O(1) complexity for accessing an element by index. It seems there is a key-value lookup instead of a true random access. Am I correct?
As the name suggests, segmented_tree_seq is a tree. It maintains a B+Tree index into chunks of data, similar to how deque maintains a vector index into chunks of data.
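A minimal usage sketch of the interface described above follows. It assumes the header path and boost::container namespace implied by the repository layout, plus the std::vector-style interface with random-access iterators discussed in this thread; the exact template parameters may differ from the real header, so treat it as illustrative only.

// Sketch: O(log n) insert/erase at an arbitrary index in a large character
// buffer (the text-editor use case).  Assumes a std::vector-like interface;
// the namespace and default template parameters are taken from the
// repository layout and may not match the real header exactly.
#include <boost/container/segmented_tree_seq.hpp>
#include <cstddef>
#include <cstdint>
#include <iostream>

int main() {
    boost::container::segmented_tree_seq<std::uint8_t> buffer;

    // Build a 1 MiB buffer of bytes.
    for (std::size_t i = 0; i != (std::size_t(1) << 20); ++i)
        buffer.push_back(static_cast<std::uint8_t>(i));

    // Insert and erase in the middle, addressed by index as with std::vector;
    // each of these is an O(log n) operation here rather than O(n).
    buffer.insert(buffer.begin() + 123456, 'x');
    buffer.erase(buffer.begin() + 654321);

    std::cout << buffer.size() << '\n';  // still 1048576
    return 0;
}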
On 2015-09-11 12:10 +0200, Chris Clearwater wrote:
Greetings,
I have written a library with the intention of submitting it to Boost.Container. My container implements O(log n) random access insert/erase and has a strong focus on performance while supporting the interfaces of the standard library sequence types. An example application would be the character buffer of a text editor.
Useful links: - Source code: https://github.com/det/segmented_tree_seq - Documentation: http://det.github.io/segmented_tree_seq/ - Benchmarks: https://github.com/det/segmented_tree_seq/tree/benchmarks
Any feedback is greatly appreciated.
Since I haven't found documentation on how this is implemented I have to ask: Is this "segmented tree" a rope[1, 2]? Being usable for a text editor and having log(n) insert and erase seem to fit that guess :-). [1]: https://en.wikipedia.org/wiki/Rope_%28data_structure%29 [2]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.9450&rep=rep1&type=pdf Regards, Christian
On 09/11/2015 08:02 AM, Christian N. wrote:
Since I haven't found documentation on how this is implemented I have to ask: Is this "segmented tree" a rope[1, 2]? Being usable for a text editor and having log(n) insert and erase seem to fit that guess :-).
[1]: https://en.wikipedia.org/wiki/Rope_%28data_structure%29 [2]: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.14.9450&rep=rep1&type=pdf
My container is implemented using a counted B+Tree, it is not a persistent data structure like a Rope. It is intended for the fastest possible destructive random access updates.
On Fri, Sep 11, 2015 at 5:10 AM, Chris Clearwater
Greetings,
I have written a library with the intention of submitting it to Boost.Container. My container implements O(log n) random access insert/erase and has a strong focus on performance while supporting the interfaces of the standard library sequence types. An example application would be the character buffer of a text editor.
Useful links: - Source code: https://github.com/det/segmented_tree_seq - Documentation: http://det.github.io/segmented_tree_seq/ - Benchmarks: https://github.com/det/segmented_tree_seq/tree/benchmarks
Any feedback is greatly appreciated.
First.. You are going to need to add some documentation as to how your container works with regards to complexity order. Second.. Your implementation is rather hard to follow (but this is not necessarily a bad thing). I say that because I was trying to figure out how your container attains the O(log n) you claim. And AFAICT, because of this <https://github.com/det/segmented_tree_seq/blob/master/include/boost/container/segmented_tree_seq.hpp#L1278>, your container is just another implementation of an order statistic tree https://en.wikipedia.org/wiki/Order_statistic_tree. Which is a data structure that has been proposed for Boost (and standardization) at various times. -- -- Rene Rivera -- Grafik - Don't Assume Anything -- Robot Dreams - http://robot-dreams.net -- rrivera/acm.org (msn) - grafikrobot/aim,yahoo,skype,efnet,gmail
With my commercial background I am very sceptical about O(lg(N)) search algorithms, as they do not work well on modern hardware unless you are very careful with the data layout. A vector will linear search faster than a tree with up to a few hundred elements. Easy to prove; search for Stroustrup's opinion on complex data structures. A middle-first tree is often better (I don't know if there is an academic equivalent): store the middle element first in a vector and the two quartiles second and third, etc. The first few and last few accesses are in the same cache lines. I see a lot of pointers and allocated data structures in this, as well as about eight levels of indirection for push_back, which will be fun in debug mode :) I don't want to piss on the bonfire... Andy. On 11/09/2015 16:16, Rene Rivera wrote:
On Fri, Sep 11, 2015 at 5:10 AM, Chris Clearwater wrote:
Greetings,
I have written a library with the intention of submitting it to Boost.Container. My container implements O(log n) random access insert/erase and has a strong focus on performance while supporting the interfaces of the standard library sequence types. An example application would be the character buffer of a text editor.
Useful links: - Source code: https://github.com/det/segmented_tree_seq - Documentation: http://det.github.io/segmented_tree_seq/ - Benchmarks: https://github.com/det/segmented_tree_seq/tree/benchmarks
Any feedback is greatly appreciated.
First.. You are going to need to add some documentation as to how your container works with regards to complexity order. Second.. Your implementation is rather hard to follow (but this is not necessarily a bad thing). I say that because I was trying to figure out how your container attains the O(log n) you claim. And AFAICT, because of this <https://github.com/det/segmented_tree_seq/blob/master/include/boost/container/segmented_tree_seq.hpp#L1278>, your container is just another implementation of an order statistic tree https://en.wikipedia.org/wiki/Order_statistic_tree. Which is a data structure that has been proposed for Boost (and standardization) at various times.
On 09/11/2015 09:19 AM, Andy Thomason wrote:
With my commercial background I am very sceptical about O(lg(N)) search algorithms, as they do not work well on modern hardware unless you are very careful with the data layout.
I am very careful with the data layout, my container is implemented using a counted B+Tree which is cache efficient.
A vector will linear search faster than a tree with up to a few hundred elements. Easy to prove and search for Stroustrup's opininion on complex data structures.
A B+Tree has the same layout as a vector up to a certain size. Your arguments seem to be against binary trees.
A middle-first tree is often better (I don't know if there is an academic equivalent): store the middle element first in a vector and the two quartiles second and third, etc.
The first few and last few accesses are in the same cache lines.
This is very similar to the layout of a B+Tree.
I see a lot of pointers and allocated data structures in this, as well as about eight levels of indirection for push_back, which will be fun in debug mode :) I don't want to piss on the bonfire...
Andy.
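As a side note for readers, the "middle-first" layout Andy describes is what is sometimes called an Eytzinger (BFS-order) layout. The sketch below shows the idea generically; it is not code from either library, just an illustration of why the first few probes of a search stay in the same cache lines.

// Generic sketch of a "middle-first" (Eytzinger / BFS-order) layout: the
// median sits at index 1, the quartiles at 2 and 3, and so on, so the first
// few probes of every search touch the same few cache lines.
#include <cstddef>
#include <vector>

// Fill eyt[1..n] from a sorted vector by an in-order walk of the implicit tree.
void build(const std::vector<int>& sorted, std::vector<int>& eyt,
           std::size_t& next, std::size_t k = 1) {
    if (k < eyt.size()) {
        build(sorted, eyt, next, 2 * k);       // left subtree
        eyt[k] = sorted[next++];               // this slot gets the next value
        build(sorted, eyt, next, 2 * k + 1);   // right subtree
    }
}

// Return the slot in eyt of the smallest element >= x, or 0 if there is none.
std::size_t lower_bound_eyt(const std::vector<int>& eyt, int x) {
    std::size_t k = 1, best = 0;
    while (k < eyt.size()) {
        if (eyt[k] >= x) { best = k; k = 2 * k; }  // candidate found, go left
        else             { k = 2 * k + 1; }        // too small, go right
    }
    return best;
}

// Usage: std::vector<int> eyt(sorted.size() + 1); std::size_t i = 0;
//        build(sorted, eyt, i);   // slot 0 is unused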
On 09/11/2015 08:16 AM, Rene Rivera wrote:
First.. You are going to need to add some documentation as to how your container works with regards to complexity order. Second.. Your implementation is rather hard to follow (but this is not necessarily a bad thing). I say that because I was trying to figure out how your container attains the O(log n) you claim. And AFAICT, because of this <https://github.com/det/segmented_tree_seq/blob/master/include/boost/container/segmented_tree_seq.hpp#L1278>, your container is just another implementation of an order statistic tree https://en.wikipedia.org/wiki/Order_statistic_tree. Which is a data structure that has been proposed for Boost (and standardization) at various times.
The complexity of all operations in my container is already documented in the link I provided. I've most often heard this data structure referred to as a counted tree; my implementation in particular is a counted B+Tree, and the elements it contains are not ordered.
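For readers who have not met counted trees before, the trick behind the O(log n) index lookup is that every branch stores, per child, how many elements live below that child; a lookup simply subtracts child counts on the way down. The sketch below is a simplified multiway version for illustration only; it is not taken from the segmented_tree_seq source.

// Simplified illustration of index lookup in a counted multiway tree.
// Each branch records the element count of every child subtree, so locating
// the element at position i costs one small scan per tree level, i.e. O(log n).
// Not the library's actual node layout; illustration only.
#include <cstddef>
#include <vector>

struct node {
    bool leaf = false;
    std::vector<std::size_t> counts;  // branch: elements below each child
    std::vector<node*> children;      // branch: child pointers
    std::vector<int> values;          // leaf: the elements themselves
};

int& element_at(node* n, std::size_t index) {
    while (!n->leaf) {
        std::size_t child = 0;
        // Skip whole subtrees until the remaining index falls inside one.
        while (index >= n->counts[child]) {
            index -= n->counts[child];
            ++child;
        }
        n = n->children[child];
    }
    return n->values[index];
}

Insert and erase then only have to adjust the counts along the single root-to-leaf path they touch, which is where the O(log n) update cost comes from.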
Rene Rivera
I say that because I was trying to figure out how your container attains the O(log n) you claim. And AFAICT, because of this <https://github.com/det/segmented_tree_seq/blob/master/include/boost/container/segmented_tree_seq.hpp#L1278>, your container is just another implementation of an order statistic tree https://en.wikipedia.org/wiki/Order_statistic_tree. Which is a data structure that has been proposed for Boost (and standardization) at various times.
FWIW, as of Boost 1.59 Boost.MultiIndex sports ranked indices: http://www.boost.org/doc/libs/1_59_0/libs/multi_index/doc/tutorial/indices.h... whose implementation is basically an order statistic tree. Joaquín M López Muñoz Telefónica
On Fri, Sep 11, 2015 at 6:10 AM, Chris Clearwater wrote:
Greetings,
I have written a library with the intention of submitting it to Boost.Container. My container implements O(log n) random access insert/erase and has a strong focus on performance while supporting the interfaces of the standard library sequence types. An example application would be the character buffer of a text editor.
Useful links: - Source code: https://github.com/det/segmented_tree_seq - Documentation: http://det.github.io/segmented_tree_seq/ - Benchmarks: https://github.com/det/segmented_tree_seq/tree/benchmarks
Any feedback is greatly appreciated.
The benchmarks are impressive. I hope to take a deeper look this weekend, but from my cursory glance: the implementation looks very clean too. Ion: Is this something you would be interested in reviewing? Glen
On 12/09/2015 1:10, Glen Fernandes wrote:
Useful links: - Source code: https://github.com/det/segmented_tree_seq - Documentation: http://det.github.io/segmented_tree_seq/ - Benchmarks: https://github.com/det/segmented_tree_seq/tree/benchmarks
Any feedback is greatly appreciated.
The benchmarks are impressive. I hope to take a deeper look this weekend, but from my cursory glance: the implementation looks very clean too.
Ion: Is this something you would be interested in reviewing?
I'd like to separate the proposal to put that data structure into Boost and the proposal to put it into Boost.Container. The implementation looks good, although I have no experience with B+Trees or augmented data structures. I think we should have more innovative containers in Boost, and there have been more proposals for similar data structures that were not accepted. Regarding Boost.Container, one of the biggest problems with adding new stuff into Boost.Container is that it should be consistent with the rest of the library. The library supports C++03, supports Interprocess requirements, recursive types, SCARY iterators, etc... it must be maintained and documented as the rest of the library. I wouldn't like to maintain data structures I really don't fully understand or just leave them unmaintained. We added stable_vector and static_vector but those were easier data structures. Just like we have Boost.CircularBuffer or Boost.Unordered, I'd like to propose adding new containers in their own library. Maybe an (optionally augmented) B+Tree library would be nice, and it could also contain a sequence container based on that data structure. Just my 2 cents, does it make sense? Ion
On Sat, Sep 12, 2015 at 6:46 AM, Ion Gaztañaga wrote:
I'd like to separate the proposal to put that data structure into Boost and the proposal to put it into Boost.Container.
The implementation looks good although I have no experience with B+Trees or augmented data structures. I think we should have more innovative container in Boost and there have been more proposals to similar data structures that were not.
Regarding Boost.Container, one of the biggest problems with adding new stuff into Boost.Container is that it should be consistent with the rest of the library. The library supports C++03, supports Interprocess requirements, recursive types, SCARY iterators, etc... it must be maintained and documented as the rest of the library. I wouldn't like to maintain data structures I really don't fully understand or just left them unmaintained. We added stable_vector and static_vector but those were easier data structures.
Just like we have Boost.CircularBuffer or Boost.Unordered, I'd like to propose adding new containers in their own library. Maybe an (optionally augmented) B+Tree library would be nice, and it could also contain a sequence container based on that data structure.
Just my 2 cents, does it make sense?
Ion
Absolutely. Chris suggested Boost.Container, but I agree: this can/should be its own Boost library. Glen
On 09/12/2015 03:46 AM, Ion Gaztañaga wrote:
I'd like to separate the proposal to put that data structure into Boost and the proposal to put it into Boost.Container.
The implementation looks good although I have no experience with B+Trees or augmented data structures. I think we should have more innovative container in Boost and there have been more proposals to similar data structures that were not.
Regarding Boost.Container, one of the biggest problems with adding new stuff into Boost.Container is that it should be consistent with the rest of the library. The library supports C++03, supports Interprocess requirements, recursive types, SCARY iterators, etc... it must be maintained and documented as the rest of the library. I wouldn't like to maintain data structures I really don't fully understand or just left them unmaintained. We added stable_vector and static_vector but those were easier data structures.
Just like we have Boost.CircularBuffer or Boost.Unordered, I'd like to propose adding new containers in their own library. Maybe an (optionally augmented) B+Tree library would be nice, and it could also contain a sequence container based on that data structure.
Just my 2 cents, does it make sense?
That makes sense. It may not be the best fit for Boost.Container. FWIW, I believe I support all your requirements except for C++03 support and SCARY iterators. I think SCARY iterators should not be terribly hard to implement. Am I correct that to be SCARY-iterator compatible it is OK to parameterize my iterators by allocator::pointer? It's only a problem if they depend on the allocator itself?
On 12/09/2015 14:47, Chris Clearwater wrote:
That makes sense. It may not be the best fit for Boost.Container. FWIW, I believe I support all your requirements except for C++03 support and SCARY iterators. I think SCARY iterators should not be terribly hard to implement. Am I correct that to be SCARY-iterator compatible it is OK to parameterize my iterators by allocator::pointer? It's only a problem if they depend on the allocator itself?
SCARY iterators are not hard. An iterator that depends only on the pointer type is usually the answer in non-intrusive containers. I don't know if, in a B+Tree, an iterator needs to know additional information about the data structure to perform the iteration. Best, Ion
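For what it's worth, one common way to get the SCARY property is to let the iterator depend on nothing but the (possibly fancy) pointer type obtained from the allocator, roughly as in the sketch below. This is a generic illustration of the technique, not code from segmented_tree_seq or Boost.Container.

// Sketch of a SCARY-friendly iterator: it is parameterised only by the
// element type and the allocator's void_pointer, never by the allocator
// itself, so containers differing only in allocator share one iterator type
// whenever their pointer types agree.  Generic illustration only.
#include <memory>

template <class T, class VoidPointer>
struct list_node {
    using node_pointer =
        typename std::pointer_traits<VoidPointer>::template rebind<list_node>;
    node_pointer next;
    T value;
};

template <class T, class VoidPointer>
class seq_iterator {
    using node_t = list_node<T, VoidPointer>;
    typename node_t::node_pointer pos_{};
public:
    seq_iterator() = default;
    explicit seq_iterator(typename node_t::node_pointer p) : pos_(p) {}
    T& operator*() const { return pos_->value; }
    seq_iterator& operator++() { pos_ = pos_->next; return *this; }
    friend bool operator==(seq_iterator a, seq_iterator b) { return a.pos_ == b.pos_; }
    friend bool operator!=(seq_iterator a, seq_iterator b) { return !(a == b); }
};

// A container would then expose something like
//   using iterator =
//       seq_iterator<T, typename std::allocator_traits<Allocator>::void_pointer>;
// so instances that differ only in Allocator share the same iterator type.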
On Sat, Sep 12, 2015 at 8:46 PM, Ion Gaztañaga
On 12/09/2015 1:10, Glen Fernandes wrote:
Useful links:
- Source code: https://github.com/det/segmented_tree_seq - Documentation: http://det.github.io/segmented_tree_seq/ - Benchmarks: https://github.com/det/segmented_tree_seq/tree/benchmarks
Any feedback is greatly appreciated.
The benchmarks are impressive. I hope to take a deeper look this weekend, but from my cursory glance: the implementation looks very clean too.
Ion: Is this something you would be interested in reviewing?
I'd like to separate the proposal to put that data structure into Boost and the proposal to put it into Boost.Container.
The implementation looks good although I have no experience with B+Trees or augmented data structures. I think we should have more innovative container in Boost and there have been more proposals to similar data structures that were not.
Regarding Boost.Container, one of the biggest problems with adding new stuff into Boost.Container is that it should be consistent with the rest of the library. The library supports C++03, supports Interprocess requirements, recursive types, SCARY iterators, etc... it must be maintained and documented as the rest of the library. I wouldn't like to maintain data structures I really don't fully understand or just left them unmaintained. We added stable_vector and static_vector but those were easier data structures.
Just like we have Boost.CircularBuffer or Boost.Unordered I'd like to propose adding adding new containers in its own library. Maybe an (optionally augmented) B+Tree library would be nice, and it could also contain a sequence container based on that data structure.
Just my 2 cents, does it make sense?
All of your considerations make sense. I understand the technical problems that can be created by adding other (potentially quite different) types of containers to Boost libraries. But I am sure that there will be new proposals and new requests for advanced data structures. Sequences are everywhere in real-life applications, not only texts and strings. In addition, there is an interesting lesson that I have learned since writing the "STL extensions" library. These days the big interest in, and selling point of, augmented data structures is not so much single augmenting by the count of container elements, but multiple augmenting, which supports fast summation algorithms (more generally, fast computations using binary operations). These data structures can be used to compute important statistical parameters, such as the mean value and standard deviation, in logarithmic running time on cheap single-processor systems. The algorithms are general and can be applied to both ordered and unordered data sets. Alternative parallel computations require supercomputers to achieve the same performance. The application area is big data and data science. Regards, Vadim Stadnik
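To make the idea concrete for readers, here is a toy sketch of the kind of multiple augmenting Vadim describes: if each subtree caches its element count, sum and sum of squares, the mean and variance of any prefix (and, by subtraction, any range) come out of a single root-to-leaf walk. The node shape and names are invented for the example and are not from any existing library.

// Toy illustration of multiple augmenting in a counted binary tree: besides
// the element count, each subtree caches its sum and sum of squares, so
// statistics over any prefix are available in O(log n).  Invented example.
#include <cstddef>

struct anode {
    double value = 0;
    std::size_t cnt = 1;               // elements in this subtree
    double sum = 0, sumsq = 0;         // cached aggregates of the subtree
    anode* left = nullptr;
    anode* right = nullptr;
};

struct prefix_stats { std::size_t n; double sum, sumsq; };

// Aggregate over the first k elements (by in-order position) of the tree t.
prefix_stats prefix(const anode* t, std::size_t k) {
    prefix_stats acc{0, 0.0, 0.0};
    while (t && k) {
        std::size_t left_cnt = t->left ? t->left->cnt : 0;
        if (k <= left_cnt) {
            t = t->left;                   // prefix lies wholly in the left subtree
        } else {
            if (t->left) {                 // take the whole left subtree...
                acc.n += t->left->cnt;
                acc.sum += t->left->sum;
                acc.sumsq += t->left->sumsq;
            }
            acc.n += 1;                    // ...plus this node...
            acc.sum += t->value;
            acc.sumsq += t->value * t->value;
            k -= left_cnt + 1;
            t = t->right;                  // ...and continue in the right subtree
        }
    }
    return acc;  // mean = sum / n, variance = sumsq / n - (sum / n)^2
}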
On Fri, Sep 11, 2015 at 8:10 PM, Chris Clearwater
Greetings,
I have written a library with the intention of submitting it to Boost.Container. My container implements O(log n) random access insert/erase and has a strong focus on performance while supporting the interfaces of the standard library sequence types. An example application would be the character buffer of a text editor.
Useful links: - Source code: https://github.com/det/segmented_tree_seq - Documentation: http://det.github.io/segmented_tree_seq/ - Benchmarks: https://github.com/det/segmented_tree_seq/tree/benchmarks
Any feedback is greatly appreciated.
There are many variants of B+ trees. A document explaining the implemented data structure and key algorithms would be quite helpful. It should provide figures. Is this a level-linked variant of B+ trees or not? To make this B+ tree much more useful, it should support not only sequences, but also std::set<T>, std::multiset<T>, std::map<T> and std::multimap<T>. Alternatively, you could implement all of these standard containers using your B+ tree as the underlying data structure. Regards, Vadim Stadnik
On 09/12/2015 06:12 AM, Vadim Stadnik wrote:
There are many variants of B+ trees. A document explaining the implemented data structure and key algorithms would be quite helpful. It should provide figures.
Is this a level-linked variant of B+ trees or not?
No, it is not. In fact, the bottom level of the tree doesn't contain any metadata at all. This is done to minimize cache misses. All other levels include parent pointer/parent index/length metadata.
To make this B+ tree much more useful, it should support not only sequences, but also std::set<T>, std::multiset<T>, std::map<T> and std::multimap<T>. Alternatively, you could implement all of these standard containers using your B+ tree as the underlying data structure.
Yes, I agree this would increase its usefulness, but it's not something I plan on supporting right now. You might also be interested in the benchmarks[1] I've included in the repository. Your container is included for comparison (bpt_sequence). [1] https://github.com/det/segmented_tree_seq/tree/benchmarks
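As a reader's aid, the layout described above (leaves with no metadata, branches carrying parent pointer, parent index and lengths) might look roughly like the sketch below. The sizes and field names here are guesses for illustration, not the definitions in segmented_tree_seq.hpp.

// Rough picture of the described layout: a leaf is just elements (so a
// segment of 8-bit characters is nothing but characters), while branch nodes
// hold the counted-tree metadata.  Field names and sizes are assumptions.
#include <cstddef>
#include <cstdint>

constexpr std::size_t leaf_elements = 512;  // illustrative segment capacity
constexpr std::size_t branch_fanout = 32;   // illustrative branch fan-out

template <class T>
struct leaf {
    T elements[leaf_elements];                // data only, no metadata at all
};

struct branch {
    branch*       parent;                     // parent pointer
    std::uint16_t parent_index;               // our slot within the parent
    std::uint16_t length;                     // children currently in use
    std::size_t   sizes[branch_fanout];       // element count below each child
    void*         children[branch_fanout];    // leaf<T>* at the lowest branch
};                                            // level, branch* above it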
Hi Chris, Chris Clearwater wrote:
I have written a library with the intention of submitting it to Boost.Container. My container implements O(log n) random access insert/erase and has a strong focus on performance while supporting the interfaces of the standard library sequence types. An example application would be the character buffer of a text editor.
I think this could be most useful if it were decomposed into several layers: - An in-memory B+-Tree - An augmented tree - An order-statistics tree That way, we could have in-memory B+-trees that provide a std::map-like interface, which would surely be useful, and we could have O(log N) random access insert/erase on a binary tree where that is appropriate, and we could have other kinds of augmented trees such as interval trees. Working out exactly what these layers should look like is not easy, but I think it's worth doing in order to broaden the applicability. Regards, Phil.
You have these characteristics in the Countertree library (https://github.com/fjtapia/countertree_2.0). You have a data structure with the same interface as std::vector and std::deque, with access by position like the vectors. All the operations (read, write, modification) are O(log N). You can find a concurrent and a non-concurrent version of the data structure. You have random access iterators with access time O(log N) (you can run a std::sort over the data structure with good results). You also have STL-compatible set, multiset, map and multimap, with concurrent versions, with a new type of mutex designed to improve the concurrency, and, associated with this, parallel algorithms over these data structures. The library has been pending final approval for more than a year. Currently, I am busy with the parallel sorting algorithms. When I finish with them, I will continue with the Countertree. I have designed the main part of the new version, designed for very big trees, with new parallel algorithms. For example: to create a std::set with 30000000 random uint64_t elements, my computer needs 52 seconds; with the new algorithms it needs 6 seconds with 1 thread and 3 seconds with 4 threads. This can be important for in-memory databases and many search applications (a single server permits millions of operations per second). When I finish the new version, I will propose it to the Boost community for the acceptance test.
On 09/14/2015 04:20 AM, Francisco José Tapia wrote:
You have these characteristics in the Countertree library (https://github.com/fjtapia/countertree_2.0).
You have a data structure with the same interface as std::vector and std::deque, with access by position like the vectors. All the operations (read, write, modification) are O(log N). You can find a concurrent and a non-concurrent version of the data structure.
You have random access iterators with access time O(log N) (you can run a std::sort over the data structure with good results).
You also have STL-compatible set, multiset, map and multimap, with concurrent versions, with a new type of mutex designed to improve the concurrency, and, associated with this, parallel algorithms over these data structures.
The library has been pending final approval for more than a year.
Currently, I am busy with the parallel sorting algorithms. When I finish with them, I will continue with the Countertree.
I have designed the main part of the new version, designed for very big trees, with new parallel algorithms. For example: to create a std::set with 30000000 random uint64_t elements, my computer needs 52 seconds; with the new algorithms it needs 6 seconds with 1 thread and 3 seconds with 4 threads.
This can be important for in-memory databases and many search applications (a single server permits millions of operations per second).
When I finish the new version, I will propose it to the Boost community for the acceptance test.
I benchmarked your vector_tree implementation against mine and came up with the following results: 4x slower for random access insert and erase of 64-bit data, 16x slower for iteration of 64-bit data, 10x slower for random access insert and erase of 8-bit data, and 110x slower for iteration of 8-bit data (not to mention the extreme memory overhead of an allocation per byte). Also, I had to fix 1 compile error, and I get tons of warnings spewed to my terminal because you appear to return a signed type from the size() member function. I also wonder how you get constant time push_back/push_front when such an operation requires updating log(n) branch nodes in a typical counted tree implementation.
On 14-09-2015 13:20, Francisco José Tapia wrote:
You have these characteristics in the Countertree library ( https://github.com/fjtapia/countertree_2.0)
You have Random access iterators with access time O(logN) ( you can run a std::sort over the data structure with good results).
Does this imply an O( N lg N lg N ) sort running time? -Thorsten
You have Random access iterators with access time O(logN) ( you can run a std::sort over the data structure with good results).
Traversal operations are mandated to be Amortized O(1) for random-access iterators. If the traversal is O(log N) it isn't a random-access iterator. http://en.cppreference.com/w/cpp/concept/RandomAccessIterator Neil
The iterators are not like the vector's iterators, but they are not like the list iterators either. They permit you to go directly to a position without traversing all the previous nodes. It's a different type. You can do a std::sort over the data structure with good results. Really, the ++ and -- operators are O(1), because they involve a small number of nodes and don't depend on the tree size. The complexity of any algorithm must be multiplied by log N; then quicksort is O(N·(logN)²).
This new type of iterator was not defined because, until now, it was not necessary. But now it appears and we must name it. I name them random iterators with access O(logN); if we agree on any other name, I will use it. It's a semantic problem only.
In the same way, the interface of the STL set, map, multiset and multimap is dangerous in a concurrent environment, because you obtain an iterator to a node and, before you access it, another thread deletes that node and your program crashes. We can wait until the C++ standard defines a new interface, or we can propose an interface, begin to use it, correct the failures, and learn from the experience in order to obtain the best solution. When the standard defines something, change and adapt.
About the speed mentioned by Chris: I examined your code briefly, and I didn't understand what some of the benchmarks do. I will check. If your code is better, you will have all my support; I will applaud the winner. But I need to examine it. Give me a few days and I will tell you my opinion.
2015-09-14 15:19 GMT+02:00 Neil Groves
You have Random access iterators with access time O(logN) ( you can run a std::sort over the data structure with good results).
Traversal operations are mandated to be Amortized O(1) for random-access iterators. If the traversal is O(log N) it isn't a random-access iterator. http://en.cppreference.com/w/cpp/concept/RandomAccessIterator
Neil
On 14 September 2015 at 15:11, Francisco José Tapia
The iterators are not like the vector's iterators, but they are not like the list iterators either. They permit you to go directly to a position without traversing all the previous nodes. It's a different type. You can do a std::sort over the data structure with good results. Really, the ++ and -- operators are O(1), because they involve a small number of nodes and don't depend on the tree size. The complexity of any algorithm must be multiplied by log N; then quicksort is O(N·(logN)²).
This new type of iterator was not defined because, until now, it was not necessary. But now it appears and we must name it. I name them random iterators with access O(logN); if we agree on any other name, I will use it. It's a semantic problem only.
I understand the problem. I completely agree that the standard categories for Iterators are sub-optimal. The separation between orthogonal concepts such as traversal, dereference type etc. is frequently problematic. You can define a new Concept, and interfaces. There is no rational reason to reduce your potential set of actions to waiting or violating a Concept. You can't sensibly abuse an existing Concept, weakening the guarantees. The standard algorithms such as std::sort provide complexity guarantees that they simply would not be able to adhere to with your iterators, since the supplied iterators would be outside of the pre-conditions of calling the algorithm. Often this would 'work' without breaking. However, if code has a side-effect during dereference operations it is possible that code could go wrong due to an excessive number of dereference invocations. Even if your internal iterators were robust against this, if they were wrapped in iterator adaptors that made an assumption and forwarded the random_access guarantee, it can go wrong. Violating Concept conditions is not going to be popular with Boost developers. I strongly recommend doing what you are doing, but with a clear new Concept, and you will need to make clear that calling standard algorithms for RandomAccessIterators with these is technically undefined behaviour. You might even be able to provide some equivalent non-member functions. I could work with you to make some Boost.Range extension points so that optimal algorithms are selected, but with very clear, precise and correct complexity guarantees.
In the same way, the interface of the STL set, map, multiset and multimap is dangerous in a concurrent environment, because you obtain an iterator to a node and, before you access it, another thread deletes that node and your program crashes. We can wait until the C++ standard defines a new interface, or we can propose an interface, begin to use it, correct the failures, and learn from the experience in order to obtain the best solution. When the standard defines something, change and adapt.
This example is not at all the same since the standard guarantees nothing about concurrent access in the Concept definition. It absolutely does guarantee O(1) access. Not specifying something is very different from specifying a tighter condition you wish to ignore. Adapt by all means, but please don't violate design contracts for anything we are to include in Boost. Please don't misunderstand my intent. I'm trying to help you reach a proposal that'll be acceptable. I've frequently been tempted, while developing Boost.Range to do things that work on an implementation level, loosening Concepts. It has uniformly been an error. Now the code has extension points, but these are largely done using non-member functions and overloads and selecting tuned algorithms while retaining strict conformance for other cases. Neil Groves
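One way to act on this advice without lying to the standard algorithms is sketched below: the iterator advertises only bidirectional traversal through its iterator_category, so no RandomAccessIterator guarantee is ever claimed, while a separate tag lets algorithms written for the new concept detect and use the O(log N) jump. The tag and function names here are invented for illustration; nothing like this exists in the standard or in Boost today.

// Sketch of a separate "logarithmic-access" traversal concept.  Generic code
// sees only a bidirectional iterator, so std::sort's O(1) traversal
// precondition is never claimed; concept-aware code dispatches on the extra
// tag and uses the container-provided operator+= jump.  Names are invented.
#include <iterator>

struct logarithmic_access_tag : std::bidirectional_iterator_tag {};

template <class It, class Dist>
void advance_impl(It& it, Dist n, logarithmic_access_tag) {
    it += n;                 // O(log distance) tree-walk jump supplied by the container
}

template <class It, class Dist>
void advance_impl(It& it, Dist n, std::input_iterator_tag) {
    std::advance(it, n);     // generic fallback for ordinary iterators
}

template <class It, class Dist>
void advance_fast(It& it, Dist n) {
    advance_impl(it, n, typename std::iterator_traits<It>::iterator_category{});
}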
Thanks for your interest. I think it would be a good idea to create a Concept.
The iterators are similar to the iterators of std::deque, as the description on cppreference.com says: “As opposed to std::vector (http://en.cppreference.com/w/cpp/container/vector), the elements of a deque are not stored contiguously: typical implementations use a sequence of individually allocated fixed-size arrays.”
You have a valid iterator before begin(): you can use begin() - 1, and if you increment this iterator you get the begin iterator. In the same way, if you decrement the end() iterator, the iterator obtained points to the last element.
The iterators can obtain their position in the data structure. The iterator begin() - 1 returns the position -1. This is the reason for the signed integers in the positions of the elements, and the explanation of the “tons of warnings” mentioned by Mr. Clearwater.
You can add or subtract any value to an iterator and obtain a new iterator in a time proportional to the logarithm of the distance.
Incrementing or decrementing an iterator is an O(1) operation, because the average number of jumps to reach the previous or next node is less than 4, independent of the size of the tree.
I hadn't examined the Concepts in depth before now, but after reading the documentation, I think it is an excellent idea.
Thanks for your information
2015-09-14 16:26 GMT+02:00 Neil Groves
On 14 September 2015 at 15:11, Francisco José Tapia wrote:
The iterators are not like the vector's iterators, but they are not like the list iterators either. They permit you to go directly to a position without traversing all the previous nodes. It's a different type. You can do a std::sort over the data structure with good results. Really, the ++ and -- operators are O(1), because they involve a small number of nodes and don't depend on the tree size. The complexity of any algorithm must be multiplied by log N; then quicksort is O(N·(logN)²).
This new type of iterator was not defined because, until now, it was not necessary. But now it appears and we must name it. I name them random iterators with access O(logN); if we agree on any other name, I will use it. It's a semantic problem only.
I understand the problem. I completely agree that the standard categories for Iterators are sub-optimal. The separation between orthogonal concepts such as traversal, dereference type etc. is frequently problematic. You can define a new Concept, and interfaces. There is no rational reason to reduce your potential set of actions to waiting or violating a Concept. You can't sensibly abuse an existing Concept weakening the guarantees. The standard algorithms such as std::sort provide complexity guarantees that it simply would not be able to adhere to with your iterators since the supplied iterators would be outside of the pre-conditions of calling the algorithm. Often this would 'work' without breaking. However if code has a side-effect during dereference operations it is possible that code could go wrong due to an excessive number of dereference invocations. Even if your internal iterators were robust against this, if they were wrapped in iterator adaptors that made an assumption, and forwarded the random_access guarantee it can go wrong.
Violating Concept conditions is not going to be popular with Boost developers. I strongly recommend doing what you are doing, but with a clear new Concept, and you will need to make clear that calling standard algorithms for RandomAccessIterators with these is technically undefined behaviour. You might even be able to provide some equivalent non-member functions. I could work with you to make some Boost.Range extension points so that optimal algorithms are selected, but with very clear, precise and correct complexity guarantees.
In the same way, the interface of the STL set, map, multiset and multimap is dangerous in a concurrent environment, because you obtain an iterator to a node and, before you access it, another thread deletes that node and your program crashes. We can wait until the C++ standard defines a new interface, or we can propose an interface, begin to use it, correct the failures, and learn from the experience in order to obtain the best solution. When the standard defines something, change and adapt.
This example is not at all the same since the standard guarantees nothing about concurrent access in the Concept definition. It absolutely does guarantee O(1) access. Not specifying something is very different from specifying a tighter condition you wish to ignore.
Adapt by all means, but please don't violate design contracts for anything we are to include in Boost.
Please don't misunderstand my intent. I'm trying to help you reach a proposal that'll be acceptable. I've frequently been tempted, while developing Boost.Range to do things that work on an implementation level, loosening Concepts. It has uniformly been an error. Now the code has extension points, but these are largely done using non-member functions and overloads and selecting tuned algorithms while retaining strict conformance for other cases.
Neil Groves
Without having looked in any detail at the new container in question, the discussion does remind me of a similar container proposed by Aleksandr Kupriianov some months ago: http://lists.boost.org/Archives/boost/2014/07/215224.php
It might be worth justifying why your container is better than that one, whether we can abstract aspects of both, etc. We can ask whether it is OK to add these augmented containers one by one as they are offered, or whether it might be better to consider a few at once in order to design some consistent interface.
Cheers.
Jeremy
On 16 September 2015 at 08:52, Francisco José Tapia
Thanks for your interest.
I think it would be a good idea to create a Concept.
The iterators are similar to the iterators of std::deque, as the description on cppreference.com says: “As opposed to std::vector (http://en.cppreference.com/w/cpp/container/vector), the elements of a deque are not stored contiguously: typical implementations use a sequence of individually allocated fixed-size arrays.”
You have a valid iterator before begin(): you can use begin() - 1, and if you increment this iterator you get the begin iterator. In the same way, if you decrement the end() iterator, the iterator obtained points to the last element.
The iterators can obtain their position in the data structure. The iterator begin() - 1 returns the position -1. This is the reason for the signed integers in the positions of the elements, and the explanation of the “tons of warnings” mentioned by Mr. Clearwater.
You can add or subtract any value to an iterator and obtain a new iterator in a time proportional to the logarithm of the distance.
Incrementing or decrementing an iterator is an O(1) operation, because the average number of jumps to reach the previous or next node is less than 4, independent of the size of the tree.
I hadn't examined the Concepts in depth before now, but after reading the documentation, I think it is an excellent idea.
Thanks for your information
2015-09-14 16:26 GMT+02:00 Neil Groves
On 14 September 2015 at 15:11, Francisco José Tapia wrote:
The iterators are not like the vector's iterators, but they are not like the list iterators either. They permit you to go directly to a position without traversing all the previous nodes. It's a different type. You can do a std::sort over the data structure with good results. Really, the ++ and -- operators are O(1), because they involve a small number of nodes and don't depend on the tree size. The complexity of any algorithm must be multiplied by log N; then quicksort is O(N·(logN)²).
This new type of iterator was not defined because, until now, it was not necessary. But now it appears and we must name it. I name them random iterators with access O(logN); if we agree on any other name, I will use it. It's a semantic problem only.
I understand the problem. I completely agree that the standard categories for Iterators are sub-optimal. The separation between orthogonal concepts such as traversal, dereference type etc. is frequently problematic. You can define a new Concept, and interfaces. There is no rational reason to reduce your potential set of actions to waiting or violating a Concept. You can't sensibly abuse an existing Concept weakening the guarantees. The standard algorithms such as std::sort provide complexity guarantees that it simply would not be able to adhere to with your iterators since the supplied iterators would be outside of the pre-conditions of calling the algorithm. Often this would 'work' without breaking. However if code has a side-effect during dereference operations it is possible that code could go wrong due to an excessive number of dereference invocations. Even if your internal iterators were robust against this, if they were wrapped in iterator adaptors that made an assumption, and forwarded the random_access guarantee it can go wrong.
Violating Concept conditions is not going to be popular with Boost developers. I strongly recommend doing what you are doing, but with a clear new Concept, and you will need to make clear that calling standard algorithms for RandomAccessIterators with these is technically undefined behaviour. You might even be able to provide some equivalent non-member functions. I could work with you to make some Boost.Range extension points so that optimal algorithms are selected, but with very clear, precise and correct complexity guarantees.
In the same way, the interface of the STL set, map, multiset and multimap is dangerous in a concurrent environment, because you obtain an iterator to a node and, before you access it, another thread deletes that node and your program crashes. We can wait until the C++ standard defines a new interface, or we can propose an interface, begin to use it, correct the failures, and learn from the experience in order to obtain the best solution. When the standard defines something, change and adapt.
This example is not at all the same since the standard guarantees nothing about concurrent access in the Concept definition. It absolutely does guarantee O(1) access. Not specifying something is very different from specifying a tighter condition you wish to ignore.
Adapt by all means, but please don't violate design contracts for anything we are to include in Boost.
Please don't misunderstand my intent. I'm trying to help you reach a proposal that'll be acceptable. I've frequently been tempted, while developing Boost.Range to do things that work on an implementation level, loosening Concepts. It has uniformly been an error. Now the code has extension points, but these are largely done using non-member functions and overloads and selecting tuned algorithms while retaining strict conformance for other cases.
Neil Groves
participants (14)
- Andrey Semashev
- Andy Thomason
- Chris Clearwater
- Christian N.
- Francisco José Tapia
- Glen Fernandes
- Ion Gaztañaga
- Jeremy Murphy
- Joaquín M López Muñoz
- Neil Groves
- Phil Endecott
- Rene Rivera
- Thorsten Ottosen
- Vadim Stadnik