[container] Draft development plan for Boost.Container
Hi to all,

Boost.Container was accepted in 2011 when container classes were spun off from Boost.Interprocess to form a new library. At that time it offered new features like emulated move semantics for "classic" STL-like containers and some new classes like flat_map and stable_vector.

During these years more containers/main features were added:

- The scoped allocator Model was added in Boost 1.50 (2012)
- static_vector was added in Boost 1.54 (2013)
- small_vector was added in Boost 1.58 (2015)
- Polymorphic Memory Resources were added in Boost 1.60 (2015)
- devector was added in Boost 1.75 (2020)

Lately I've been collecting some ideas about how to improve the library and keep it interesting/useful both for other Boost libraries and Boost users.

I've formalized some of those ideas a bit and I'd like to share them here in order to obtain feedback from the developers. Here's the document:

https://docs.google.com/document/d/1VdMTheFUczC946dP3VKPseOoUq3sVKjU73oMvK8F...

Note that this document does not outline a schedule for all those ideas, it's an initial description of possible developments that should be evaluated/prioritized, as developing a substantial subset of the proposals would be a huge task.

Thanks in advance for your attention and comments.

Best,

Ion
On 5/21/24 08:03, Ion Gaztañaga via Boost wrote:
Hi to all,
Boost.Container was accepted in 2011 when container classes were spun off from Boost.Interprocess to form a new library. At that time it offered new features like emulated move semantics for "classic" STL-like containers and some new classes like flat_map and stable_vector.
During these years more containers/main features were added:
- The scoped allocator Model was added in Boost 1.50 (2012)
- static_vector was added in Boost 1.54 (2013)
- small_vector was added in Boost 1.58 (2015)
- Polymorphic Memory Resources were added in Boost 1.60 (2015)
- devector was added in Boost 1.75 (2020)
Lately I've been collecting some ideas about how to improve the library and keep it interesting/useful both for other Boost libraries and Boost users.
I've formalized some of those ideas a bit and I'd like to share them here in order to obtain feedback from the developers. Here's the document:
https://docs.google.com/document/d/1VdMTheFUczC946dP3VKPseOoUq3sVKjU73oMvK8F...
Note that this document does not outline a schedule for all those ideas, it's an initial description of possible developments that should be evaluated/prioritized, as developing a substantial subset of the proposals would be a huge task.
Thanks in advance for your attention and comments.
One issue I had with Boost.Container deque is that it only allows customizing allocation block size for a given value type and not the container itself. This prevents having multiple deques for a given value type that use different block sizes. It would be useful to be able to specify the block size in the deque template parameters. https://github.com/boostorg/container/issues/278
On 21/05/2024 at 9:41, Andrey Semashev via Boost wrote:
One issue I had with Boost.Container deque is that it only allows customizing allocation block size for a given value type and not the container itself. This prevents having multiple deques for a given value type that use different block sizes.
It would be useful to be able to specify the block size in the deque template parameters.
I don't understand your point. You can specify the block size as an option, it's not tied to a value_type, AFAIK: https://www.boost.org/doc/libs/master/doc/html/container/configurable_contai... Best, Ion
On 5/21/24 12:26, Ion Gaztañaga via Boost wrote:
On 21/05/2024 at 9:41, Andrey Semashev via Boost wrote:
One issue I had with Boost.Container deque is that it only allows customizing allocation block size for a given value type and not the container itself. This prevents having multiple deques for a given value type that use different block sizes.
It would be useful to be able to specify the block size in the deque template parameters.
I don't understand your point. You can specify the block size as an option, it's not tied to a value_type, AFAIK:
https://www.boost.org/doc/libs/master/doc/html/container/configurable_contai...
Oh, I didn't know about that option. Though it would make more sense if the block size was specified in terms of container elements rather than bytes.

Oh #2, I can see there is block_size, but it is not documented as an acceptable option for deque_options. Is it actually supported? I thought the user was supposed to specialize deque_block_size for their value type.

Given that there are container-specific options for this purpose, you can consider my request fulfilled, and sorry for the noise.
On 5/21/24 14:14, Andrey Semashev wrote:
On 5/21/24 12:26, Ion Gaztañaga via Boost wrote:
On 21/05/2024 at 9:41, Andrey Semashev via Boost wrote:
One issue I had with Boost.Container deque is that it only allows customizing allocation block size for a given value type and not the container itself. This prevents having multiple deques for a given value type that use different block sizes.
It would be useful to be able to specify the block size in the deque template parameters.
I don't understand your point. You can specify the block size as an option, it's not tied to a value_type, AFAIK:
https://www.boost.org/doc/libs/master/doc/html/container/configurable_contai...
Oh, I didn't know about that option. Though it would make more sense if the block size was specified in terms of container elements rather than bytes.
Oh #2, I can see there is block_size, but it is not documented as an acceptable option for deque_options. Is it actually supported?
To clarify, it is documented on the page you linked, but not in the Reference: https://www.boost.org/doc/libs/master/doc/html/boost/container/deque_options...
I thought the user was supposed to specialize deque_block_size for their value type. Given that there are container-specific options for this purpose, you can consider my request fulfilled, and sorry for the noise.
To clarify, it is documented on the page you linked, but not in the Reference:
https://www.boost.org/doc/libs/master/doc/html/boost/container/deque_options...
Thanks for the report, it's a documentation bug. Fixed now: https://github.com/boostorg/container/commit/8d179787482f389d9dc564273e81b65... Best, Ion
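For reference, a minimal sketch of the option-based configuration discussed above, assuming the block_size/block_bytes options and the three-parameter deque described in the linked documentation page (block_size counts elements, block_bytes counts bytes):

#include <boost/container/deque.hpp>
#include <boost/container/new_allocator.hpp>
#include <boost/container/options.hpp>

namespace bc = boost::container;

// Two deques with the same value_type but different block geometry.
typedef bc::deque_options< bc::block_size<128u> >::type elem_opts; // 128 elements per block
typedef bc::deque_options< bc::block_bytes<4096u> >::type byte_opts; // 4096-byte blocks

typedef bc::deque<int, bc::new_allocator<int>, elem_opts> deque_128_elems;
typedef bc::deque<int, bc::new_allocator<int>, byte_opts> deque_4k_blocks;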
Sent: Tuesday, 21 May 2024 at 07:03 From: "Ion Gaztañaga via Boost"
Lately I've been collecting some ideas about how to improve the library and keep it interesting/useful both for other Boost libraries and Boost users.
Maybe you could also consider: "boost small_vector cannot go back to use stack space" https://stackoverflow.com/questions/54429138/boost-small-vector-cannot-go-ba... Helmut
On 21/05/2024 at 17:16, Helmut Zeisel via Boost wrote:
Sent: Tuesday, 21 May 2024 at 07:03 From: "Ion Gaztañaga via Boost"
Lately I've been collecting some ideas about how to improve the library and keep it interesting/useful both for other Boost libraries and Boost users.
Maybe you could also consider:
"boost small_vector cannot go back to use stack space"
https://stackoverflow.com/questions/54429138/boost-small-vector-cannot-go-ba...
Thanks for the ping. I can't remember the technical reason for the current behavior, but it's explicitly documented. Maybe shrink_to_fit can be fixed. Anyway, I've opened an issue so that this is not forgotten: https://github.com/boostorg/container/issues/279 Best, Ion
On 22/05/2024 at 6:37, Helmut Zeisel via Boost wrote:
Sent: Tuesday, 21 May 2024 at 23:18 From: "Ion Gaztañaga via Boost"
but it's explicitly documented.
Where?
https://www.boost.org/doc/libs/master/doc/html/boost/container/small_vector.... "Any change to the capacity of the vector, including decreasing its size such as with the shrink_to_fit method, will cause the vector to permanently switch to dynamically allocated storage"
Maybe shrink_to_fit can be fixed.
Also clear and swap. Maybe also the move assignment.
There are several downsides:

A) "clear" in vectors does not reduce the capacity, I think that would be surprising because many times a user does not expect new allocations after clear() if the previous capacity was big enough.

B) Moving to internal storage when move assigning could make the operation throwing, whereas moving the dynamic buffer can be faster and nothrow. LLVM and Folly take the dynamic buffer of the source vector if available:

+ Folly: https://github.com/facebook/folly/blob/main/folly/container/small_vector.h#L...
+ LLVM: https://llvm.org/doxygen/SmallVector_8h_source.html#l01069

C) Swap is similar to a double move assignment.

So it's a trade-off. I think the current boost::container::small_vector follows the "existing practice" of "least surprising behavior"...

Best,

Ion
Sent: Wednesday, 22 May 2024 at 08:45
From: "Ion Gaztañaga via Boost"
A) "clear" in vectors does not reduce the capacity, I think that would be surprising because many times a user does not expect new allocations after clear() if the previous capacity was big enough.
OK
B) Moving to internal storage when move assigning could make the operation throwing, whereas moving the dynamic buffer can be faster and nothrow. LLVM and Folly take the dynamic buffer of the source vector if available:
This is true when you move from a dynamic buffer. If you move from the internal buffer, however, you must move the individual elements anyway.
C) Swap is similar to a double move assignment.
The interesting situation is when one vector uses a dynamic buffer and the other one the internal buffer. Currently, swap also allocates a dynamic buffer for the vector with the internal buffer. Helmut
Hi,
C) Swap is similar to a double move assignment.
The interesting situation is when one vector uses a dynamic buffer and the other one the internal buffer. Currently, swap also allocates a dynamic buffer for the vector with the internal buffer.
Thanks for pointing this out. If the memory from the vector using the dynamic buffer can be transferred (e.g. equal allocators or propagate_on_container_swap is true), then we should avoid a new allocation and move the elements into the internal buffer instead. I've updated the issue: https://github.com/boostorg/container/issues/279 Best, Ion
On 22/05/2024 at 22:16, Ion Gaztañaga via Boost wrote:
Hi,
C) Swap is similar to a double move assignment.
The interesting situation is when one vector uses a dynamic buffer and the other one the internal buffer. Currently, swap also allocates a dynamic buffer for the vector with the internal buffer.
Thanks for pointing this out. If the memory from the vector using the dynamic buffer can be transferred (e.g. equal allocators or propagate_on_container_swap is true), then we should avoid a new allocation and move the elements into the internal buffer instead. I've updated the issue:
The issue has been resolved. Now shrink_to_fit and swap should use the internal storage when appropriate ;-) Best, Ion
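A quick sketch of the user-visible effect, assuming the fixed behavior described above (the capacity of a default-constructed small_vector is taken as its internal capacity):

#include <boost/container/small_vector.hpp>
#include <cassert>
#include <cstddef>

int main()
{
   boost::container::small_vector<int, 8> sv;
   const std::size_t internal_cap = sv.capacity(); // capacity of the internal buffer

   for (int i = 0; i != 100; ++i) // grow past the internal buffer
      sv.push_back(i);
   assert(sv.capacity() > internal_cap); // now using dynamic storage

   sv.resize(4); // contents fit internally again
   sv.shrink_to_fit(); // with the fix, storage should return to the internal buffer
   assert(sv.capacity() == internal_cap);
   return 0;
}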
Sent: Wednesday, 22 May 2024 at 08:45
From: "Ion Gaztañaga via Boost"
A) "clear" in vectors does not reduce the capacity,
Maybe this could be explicitly mentioned in the documentation https://www.boost.org/doc/libs/master/doc/html/boost/container/vector.html#i... e.g. like in https://en.cppreference.com/w/cpp/container/vector/clear Helmut
On 23/05/2024 at 5:47, Helmut Zeisel via Boost wrote:
Sent: Wednesday, 22 May 2024 at 08:45 From: "Ion Gaztañaga via Boost"
A) "clear" in vectors does not reduce the capacity,
Maybe this could be explicitly mentioned in the documentation
https://www.boost.org/doc/libs/master/doc/html/boost/container/vector.html#i...
e.g. like in https://en.cppreference.com/w/cpp/container/vector/clear
Thanks for the suggestion. Applied: https://github.com/boostorg/container/commit/24c978b550dcdfcca7a07efc4ce3323... Best, Ion
On 21/05/2024 06:03, Ion Gaztañaga via Boost wrote:
Lately I've been collecting some ideas about how to improve the library and keep it interesting/useful both for other Boost libraries and Boost users.
Separating the algorithm from the container, or ability to use the algorithm with a third party container implementation, would be a win. The STL allocator design imposes a lot of unfortunate design choices on container implementation, and removing both constraints frees up potential designs.

I wish we had a rope of string (views) in Boost. I know proposed Boost.Text has them, so maybe not so urgent. Still, they enable transformations of strings in a particularly efficient way.

Niall
On 5/21/24 19:22, Niall Douglas via Boost wrote:
On 21/05/2024 06:03, Ion Gaztañaga via Boost wrote:
Lately I've been collecting some ideas about how to improve the library and keep it interesting/useful both for other Boost libraries and Boost users.
Separating the algorithm from the container, or ability to use the algorithm with a third party container implementation, would be a win.
Boost.Intrusive fits this niche.
On 21/05/2024 17:24, Andrey Semashev via Boost wrote:
On 5/21/24 19:22, Niall Douglas via Boost wrote:
On 21/05/2024 06:03, Ion Gaztañaga via Boost wrote:
Lately I've been collecting some ideas about how to improve the library and keep it interesting/useful both for other Boost libraries and Boost users.
Separating the algorithm from the container, or ability to use the algorithm with a third party container implementation, would be a win.
Boost.Intrusive fits this niche.
Not exactly.
Boost.Intrusive requires you to inject stuff into your values.
What I'm thinking of is customisation points so you could override how
Boost.Container implements containers and allocators. No need to modify
your code or your values, you would just inject customisation points for
your types.
In case the value add isn't obvious, Boost.Intrusive generally insists
that you need to insert specific items and types into your values. The
customisation points injection technique can, in some circumstances, let
you take advantage of the value type's specifics to more tightly encode
the information the container needs to store per value.
For example:
struct value_type
{
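// each 40-bit field leaves 24 spare bits; the idea described above is that
// a customisation point could let the container reuse that slack to encode
// its own per-value bookkeeping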
uint64_t field1: 40;
uint64_t unused1: 24;
uint64_t field2: 40;
uint64_t unused2: 24;
uint64_t field3: 40;
uint64_t unused3: 24;
};
template<> struct boost::container::read_left
On 21/05/2024 at 21:31, Niall Douglas via Boost wrote:
Boost.Intrusive requires you to inject stuff into your values.
Intrusive also offers pure algorithms that don't require hooks: https://www.boost.org/doc/libs/1_85_0/doc/html/intrusive/node_algorithms.htm... Or, if your type can afford it, you can specify the relationship between your value type and the node (and both can be the same type): https://www.boost.org/doc/libs/1_85_0/doc/html/intrusive/value_traits.html#i... Not sure if this is what you are looking for. Ion
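A condensed sketch of the value_traits approach, adapted from the linked documentation (assuming a doubly linked legacy type and normal_link mode; the type and member names are illustrative):

#include <boost/intrusive/list.hpp>

// A legacy type that already carries its own prev/next pointers.
struct legacy_value
{
   legacy_value *prev_;
   legacy_value *next_;
   int id_;
};

// Describe how to navigate the "node" part of the value.
struct legacy_node_traits
{
   typedef legacy_value node;
   typedef legacy_value * node_ptr;
   typedef const legacy_value * const_node_ptr;

   static node_ptr get_next(const_node_ptr n) { return n->next_; }
   static void set_next(node_ptr n, node_ptr next) { n->next_ = next; }
   static node_ptr get_previous(const_node_ptr n) { return n->prev_; }
   static void set_previous(node_ptr n, node_ptr prev) { n->prev_ = prev; }
};

// Value and node are the same object here, so the conversions are trivial.
struct legacy_value_traits
{
   typedef legacy_node_traits node_traits;
   typedef node_traits::node_ptr node_ptr;
   typedef node_traits::const_node_ptr const_node_ptr;
   typedef legacy_value value_type;
   typedef legacy_value * pointer;
   typedef const legacy_value * const_pointer;
   static const boost::intrusive::link_mode_type link_mode = boost::intrusive::normal_link;

   static node_ptr to_node_ptr(value_type &value) { return &value; }
   static const_node_ptr to_node_ptr(const value_type &value) { return &value; }
   static pointer to_value_ptr(node_ptr n) { return n; }
   static const_pointer to_value_ptr(const_node_ptr n) { return n; }
};

// A list that uses legacy_value without any additional hook.
typedef boost::intrusive::list
   < legacy_value
   , boost::intrusive::value_traits<legacy_value_traits>
   > legacy_list;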
On Tue, 21 May 2024, Ion Gaztañaga via Boost wrote:
Lately I've been collecting some ideas about how to improve the library and keep it interesting/useful both for other Boost libraries and Boost users.
I've formalized some of those ideas a bit and I'd like to share them here in order to obtain feedback from the developers. Here's the document:
https://docs.google.com/document/d/1VdMTheFUczC946dP3VKPseOoUq3sVKjU73oMvK8F...
Note that this document does not outline a schedule for all those ideas, it's an initial description of possible developments that should be evaluated/prioritized, as developing a substantial subset of the proposals would be a huge task.
Thanks in advance for your attention and comments.
Hello,
since you are asking, here are some comments as a user.
Performance:
You already have #248 about a performance regression in flat_map. It can
currently be twice as fast to use a struct { int key; mutable int value; }
in a flat_set than a flat_map
On 24/05/2024 at 19:54, Marc Glisse via Boost wrote:
On Tue, 21 May 2024, Ion Gaztañaga via Boost wrote:
since you are asking, here are some comments as a user.
Performance:
You already have #248 about a performance regression in flat_map. It can currently be twice as fast to use a struct { int key; mutable int value; } in a flat_set than a flat_map. I would prioritize fixing known performance issues before looking for others, but I guess it is a worthy goal.
Yes, I was previously using type-punning to be able to use memcpy in flat_map when key and mapped_type were trivial, but compiler warnings and some suspicious aliasing issues were reported, so I had to roll it back. I'll need to revisit this; I'm not sure if other flat_map implementations (chromium, folly...) perform some type of type punning to achieve the speedup.
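For context, a sketch of the "union of pairs" storage trick used by some hash map implementations (mentioned later in the thread for Abseil); this is illustration only, not Boost.Container code, and its strict conformance still hinges on the usual aliasing/launder considerations:

#include <utility>

template<class Key, class T>
union map_slot
{
   std::pair<const Key, T> value;         // what users see through iterators
   std::pair<Key, T>       mutable_value; // what the container moves/assigns/memcpys

   // The container constructs and destroys the active member explicitly
   // with placement new and destructor calls.
   map_slot() {}
   ~map_slot() {}
};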
- Among what you propose, I may be interested in the B/T-trees and the skip lists, if their performance turns out to be worth it.

For hive, if the existing implementation can be included in boost with only minor tweaks, why not. Otherwise, if hive gets standardized and standard library vendors don't mess up their implementation too much, having a version in boost doesn't seem worth the trouble, or at least it is less interesting than having structures not present in the standard.

segtor: for the longest time I thought deque used blocks of varying size... It looks interesting, but I am not sure I would actually use it.
I appreciate the feedback.
- https://en.wikipedia.org/wiki/Order_statistic_tree (it could be a new option in an existing container or at least share a lot of code with other trees)
To achieve this we should add augmented trees to Boost.Intrusive (it's been proposed, but not implemented).
- vector optimized for the empty case: I was going to suggest it, but I now see you have added it to the document since the initial post :-)
Thanks. Do you know any public reference implementation of this idea or some talk/paper that explains the advantages, some benchmark etc.? I tried to find some information when writing the plan, but I couldn't find anything relevant.
Improvements:
For performance, it is often useful to provide an allocator to node-based structures like list and set. A well-tuned allocator may want to know the size of the nodes in advance (preferably at compile-time, not wait for the first call to allocate()). However, this information can be very inconvenient to access. If you could provide a simple, reliable way to extract this information (one that still works if a user defines recursive maps or other complicated cases with incomplete types), that would be great.
Thanks, this seems like a good idea.
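As an illustration of how inconvenient this currently is, here is a sketch of the usual run-time workaround: a probing allocator that records the size of the node type it gets rebound to. The names are mine, not a Boost API, and this is exactly the "wait for the first call to allocate()" approach the request above wants to avoid:

#include <boost/container/list.hpp>
#include <cstddef>
#include <new>

struct node_size_probe_base
{
   static std::size_t observed_node_size;
};
std::size_t node_size_probe_base::observed_node_size = 0;

template<class T>
struct node_size_probe : node_size_probe_base
{
   typedef T value_type;

   node_size_probe() {}
   template<class U> node_size_probe(const node_size_probe<U> &) {}

   T *allocate(std::size_t n)
   {
      observed_node_size = sizeof(T); // T is the container's internal node type here
      return static_cast<T*>(::operator new(n * sizeof(T)));
   }
   void deallocate(T *p, std::size_t) { ::operator delete(p); }

   friend bool operator==(const node_size_probe &, const node_size_probe &) { return true; }
   friend bool operator!=(const node_size_probe &, const node_size_probe &) { return false; }
};

// Usage: only after the first insertion does observed_node_size hold the
// per-node allocation size of boost::container::list<int>.
//   boost::container::list<int, node_size_probe<int> > l;
//   l.push_back(1);
//   std::size_t node_size = node_size_probe_base::observed_node_size;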
Document a way to go from a pointer to an iterator in set and other datastructures. Maybe we only needed that because we didn't know about the advanced options of intrusive lists at the time, but for a map whose values are connected through intrusive linked lists, we had to write fragile code (https://github.com/GUDHI/gudhi-devel/blob/affcfac39bf479a9ac64dcb41e7650d0f8...) to get from a linked list hook to a map iterator.
This sounds similar to Boost.Intrusive iterator_to: https://www.boost.org/doc/libs/1_85_0/doc/html/intrusive/obtaining_iterators... but for Intrusive this is easy because "value_type" is the whole node, whereas in boost::container::set/list/map the value_type is embedded in a node. Maybe some pointer arithmetic magic could be used to offer an "iterator_to". Boost.MultiIndex already offers a similar utility: https://www.boost.org/doc/libs/1_69_0/libs/multi_index/doc/tutorial/indices....
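For reference, a minimal sketch of the Boost.Intrusive facility mentioned above:

#include <boost/intrusive/list.hpp>

struct item : boost::intrusive::list_base_hook<>
{
   int value;
};

typedef boost::intrusive::list<item> item_list;

// Given a reference to an element known to be linked in "l", iterator_to
// recovers an iterator in constant time:
//
//   item_list l;
//   item an_item;
//   l.push_back(an_item);
//   item_list::iterator it = l.iterator_to(an_item);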
Maintenance:
It seems important to follow what is going on in the standard. I see you already have is_transparent, the node stuff, that's good. You are missing at least erase_if though.
Yes, Boost.Container should implement at least the most useful operations.
Thank you for maintaining this Boost library,
I appreciate your feedback and thanks for using it. Best, Ion
On Sat, 25 May 2024, Ion Gaztañaga via Boost wrote:
- vector optimized for the empty case: I was going to suggest it, but I now see you have added it to the document since the initial post :-)
Thanks. Do you know any public reference implementation of this idea or some talk/paper that explains the advantages, some benchmark etc.? I tried to find some information when writing the plan, but I couldn't find anything relevant.
I didn't have much success with a quick google search either, so I'll have to go from memory. I seem to remember from previous discussions on the subject that there were several partial implementations in the wild, with one in a famous web browser in particular, but that was a long time ago.

On the other hand, the reference-counted version of std::string that libstdc++ can still use in compatibility mode does use this strategy of a pointer to a region that contains both a header and the actual elements. https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/include/bits/co...

My use-case (though we did something else in the end) would have been for a tree where nodes can have an arbitrary number of children, and leaves are more numerous than internal nodes. The vector would have been hidden in a flat_map.

-- Marc Glisse
On 26/05/2024 at 22:53, Marc Glisse via Boost wrote:
I didn't have much success with a quick google search either, so I'll have to go from memory. I seem to remember from previous discussions on the subject that there were several partial implementations in the wild, with one in a famous web browser in particular, but that was a long time ago.
On the other hand, the reference-counted version of std::string that libstdc++ can still use in compatibility mode does use this strategy of a pointer to a region that contains both a header and the actual elements. https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=libstdc%2B%2B-v3/include/bits/co...
Good, I'll check it.
My use-case (though we did something else in the end) would have been for a tree where nodes can have an arbitrary number of children, and leaves are more numerous than internal nodes. The vector would have been hidden in a flat_map.
Understood. Thanks for the use-case. Best, Ion
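A layout-only sketch of the idea (not a full implementation and not Boost code): the container is a single pointer, an empty vector is a null pointer, and a non-empty one points just past a heap-allocated header holding size and capacity, followed by the elements:

#include <cstddef>

template<class T>
class one_pointer_vector
{
   struct header
   {
      std::size_t size;
      std::size_t capacity;
   };

   header *h_; // null when empty, so sizeof(one_pointer_vector) == sizeof(void*)

   T *elements() { return reinterpret_cast<T*>(h_ + 1); } // assumes alignof(T) <= alignof(header)

public:
   one_pointer_vector() : h_(0) {}

   std::size_t size() const     { return h_ ? h_->size : 0; }
   std::size_t capacity() const { return h_ ? h_->capacity : 0; }
   bool empty() const           { return !h_ || h_->size == 0; }

   T *begin() { return h_ ? elements() : 0; }
   T *end()   { return h_ ? elements() + h_->size : 0; }

   // push_back/reserve would allocate sizeof(header) + capacity * sizeof(T)
   // bytes in a single block and update the header in place.
};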
Performance:
You already have #248 about a performance regression in flat_map. It can currently be twice as fast to use a struct { int key; mutable int value; } in a flat_set than a flat_map. I would prioritize fixing known performance issues before looking for others, but I guess it is a worthy goal.
Yes, I was previously using type-punning to be able to use memcpy in flat_map when key and mapped_type were trivial, but compiler warnings and some suspicious aliasing issues were reported, so I had to roll it back. I'll need to revisit this; I'm not sure if other flat_map implementations (chromium, folly...) perform some type of type punning to achieve the speedup.
I have seen Abseil using a union of `pair
Tue, 21 May 2024 at 08:03, Ion Gaztañaga via Boost
Note that this document does not outline a schedule for all those ideas, it's an initial description of possible developments that should be evaluated/prioritized, as developing a substantial subset of the proposals would be a huge task.
Thanks in advance for your attention and comments.
There's one container that I find really useful and that is not in a
list. It's a dynamically allocated array that knows its size and
allows initialization of elements with the initial value or via a
lambda.
It fills the gap between vector and unique_ptr
On 11/06/2024 at 0:46, Antony Polukhin via Boost wrote:
Tue, 21 May 2024 at 08:03, Ion Gaztañaga via Boost
: <...> Note that this document does not outline a schedule for all those ideas, it's an initial description of possible developments that should be evaluated/prioritized, as developing a substantial subset of the proposals would be a huge task.
Thanks in advance for your attention and comments.
There's one container that I find really useful and that is not in a list. It's a dynamically allocated array that knows its size and allows initialization of elements with the initial value or via a lambda.
It fills the gap between vector and unique_ptr:
* works with non-movable and non-copyable types (unlike vector)
* works with non-default-constructible types (unlike unique_ptr)
* has contiguous layout
* usable with range-based for (unlike unique_ptr)
* smaller than vector (does not need to store capacity)
Hi,
So something like the proposed std::dynarray with additional features. It's an interesting use case, thanks.
Here's a prototype https://github.com/userver-framework/userver/blob/develop/universal/include/... We called that container FixedArray, because it is an array of a fixed size. And because it "fixes" the issues of dynamically allocated arrays.
Thanks for the prototype link. I'll add this information to the draft development plan. Best, Ion
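A hypothetical sketch of such a container (the name and interface are illustrative, not an existing Boost.Container API; exception safety during construction is omitted for brevity):

#include <cstddef>
#include <new>

template<class T>
class fixed_array
{
   std::size_t size_;
   T *data_;

public:
   // Construct n elements in place by calling gen(i) for each index, so
   // non-movable, non-copyable and non-default-constructible types work.
   template<class Generator>
   fixed_array(std::size_t n, Generator gen)
      : size_(n), data_(static_cast<T*>(::operator new(n * sizeof(T))))
   {
      for (std::size_t i = 0; i != n; ++i)
         ::new (static_cast<void*>(data_ + i)) T(gen(i));
   }

   ~fixed_array()
   {
      for (std::size_t i = size_; i != 0; --i)
         data_[i - 1].~T();
      ::operator delete(data_);
   }

   fixed_array(const fixed_array &) = delete;
   fixed_array &operator=(const fixed_array &) = delete;

   // Contiguous layout, known size, no capacity stored.
   std::size_t size() const { return size_; }
   T *begin()               { return data_; }
   T *end()                 { return data_ + size_; }
   const T *begin() const   { return data_; }
   const T *end() const     { return data_ + size_; }
};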
participants (9)
- Alexander Grund
- Andrey Semashev
- Antony Polukhin
- Helmut Zeisel
- Ion Gaztañaga
- Marc Glisse
- Niall Douglas
- Ruben Perez
- Vinnie Falco