I've been using boost::container::flat_set with quite some success until recently, when I profiled my app and saw that 30-50% of the run-time was spent in insert() for a container of around 50,000 in size. So, I had to replace boost::container::flat_set with a vector variation which inserts as a vector does (no sorting). However, when it is time to call begin(), end(), find(), it sorts once. That shaved the mentioned overhead off; say, one run went down from 17s to 10s, and the like. I wonder if that should be the boost::container::flat_set default behavior, rather than sorting the underlying container with every insert(). After all, as a user I do not care if the underlying container is sorted... until I start using/traversing it, right? Or maybe I am missing something in boost::container::flat_set which already allows me to do something of that sort?
On 03/26/17 06:02, Vladimir Batov via Boost wrote:
I've been using boost::container::flat_set with quite a success until recently when I profiled my app to see that 30-50% of the run-time was spent in insert() for a container of around 50,000 in size.
So, I had to replace boost::container::flat_set with a vector variation which inserts as a vector (no sorting). However, when it is the time to call begin(), end(), find(), it sorts once. That shaved the mentioned overhead off. Say, one run is down from 17s to 10s and the likes.
AFAIU, flat_set should not sort on insert. It should do lower_bound to find the insert position, which is O(log(N)). Sorting instead would give O(N*log(N)). What you're probably seeing is the overhead that comes from inserting in the middle of the vector that is used by flat_set internally. This especially matters if elements have inefficient move/copy. Flat containers are not well suited for frequent modifications, so that is kind of expected.
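The per-element behavior described above can be sketched with standard algorithms. This is a simplified, illustrative model of what a flat-set-style insert does, not Boost's actual implementation; flat_insert_unique is an assumed name:

```cpp
#include <algorithm>
#include <vector>

// One flat_set-style insert, in outline: binary-search the position
// (O(log N) comparisons via lower_bound), then shift the tail of the
// vector to make room (O(N) element moves). No sorting happens.
void flat_insert_unique(std::vector<int>& v, int x) {
    auto pos = std::lower_bound(v.begin(), v.end(), x);
    if (pos == v.end() || *pos != x)   // keep elements unique
        v.insert(pos, x);              // O(N) moves in the worst case
}
```

The O(N) tail shift in vector::insert, repeated for every element, is exactly the overhead being discussed.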
I wonder if that should be the boost::container::flat_set default behavior rather than sorting the underlying container with every insert(). After all, as a user I do not care if the underlying container is sorted... until I start using/traversing it, right?
No, this won't work in general because these methods are not supposed to modify the container. In particular, they don't invalidate iterators or references, and calling these methods concurrently is thread-safe.
Or maybe I am missing something in boost::container::flat_set which already allows me to do something of that sort?
You can try using the methods and constructors that accept the ordered_unique_range tag. If you can prepare a sequence of ordered and unique elements, you can call insert(ordered_unique_range, s.begin(), s.end()) or similarly construct the flat_set to avoid ordering the sorted elements.
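As a sketch of the preparation step, the ordered, duplicate-free sequence can be produced with standard algorithms; make_ordered_unique is an illustrative name, and the Boost calls are shown only in comments since they need the Boost headers:

```cpp
#include <algorithm>
#include <vector>
// #include <boost/container/flat_set.hpp>  // when Boost is available

// Prepare a sorted, duplicate-free sequence suitable for the
// ordered_unique_range overloads of flat_set.
std::vector<int> make_ordered_unique(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
    return v;
}

// With Boost, the prepared sequence s would then be handed over as:
//   boost::container::flat_set<int> fs(
//       boost::container::ordered_unique_range, s.begin(), s.end());
// avoiding any per-element search-and-shift work.
```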
On 2017-03-26 21:28, Andrey Semashev via Boost wrote:
On 03/26/17 06:02, Vladimir Batov via Boost wrote:
I've been using boost::container::flat_set with quite a success until recently when I profiled my app to see that 30-50% of the run-time was spent in insert() for a container of around 50,000 in size.
So, I had to replace boost::container::flat_set with a vector variation which inserts as a vector (no sorting). However, when it is the time to call begin(), end(), find(), it sorts once. That shaved the mentioned overhead off. Say, one run is down from 17s to 10s and the likes.
AFAIU, flat_set should not sort on insert. It should do lower_bound to find the insert position, which is O(log(N)). Sorting instead would give O(N*log(N)).
Yes, indeed. I expressed myself incorrectly. I did not mean sorting as such. My apologies.
What you're probably seeing is the overhead that comes from inserting in the middle of the vector that is used by flat_set internally. This especially matters if elements have inefficient move/copy.
Yes, I initially thought so too... but the value_type is merely a pair of pointers. The comparison operation is expensive. So, it appears it plays quite a role during initial construction. When I eliminated that as described and sorted once... huge boost.
Flat containers are not well suited for frequent modifications, so that is kind of expected.
Understood. The described use-case constructs the flat_set only once (but a big one, and one insert at a time). So, on the surface it all appeared to be by the book -- constructed once, traversed/searched often. Still, the observed construction overhead really stunned me.
I wonder if that should be the boost::container::flat_set default behavior rather than sorting the underlying container with every insert(). After all, as a user I do not care if the underlying container is sorted... until I start using/traversing it, right?
No, this won't work in general because these methods are not supposed to modify the container. In particular, they don't invalidate iterators or references, and calling these methods concurrently is thread-safe.
I do not feel those begin() et al will invalidate anything because the first call to, say, begin() will sort the underlying container once. After that it all works as usual, right?
Or maybe I am missing something in boost::container::flat_set which already allows me to do something of that sort?
You can try using the methods and constructors that accept the ordered_unique_range tag. If you can prepare a sequence of ordered and unique elements, you can call insert(ordered_unique_range, s.begin(), s.end()) or similarly construct the flat_set to avoid ordering the sorted elements.
Yes, I am aware of the ordered_unique_range tag. Unfortunately, (as you described) it is getting messier and messier -- populate another sorted container, then transfer to flat_set. My suggestion seems like an improvement.
On 03/26/17 14:00, Vladimir Batov via Boost wrote:
On 2017-03-26 21:28, Andrey Semashev via Boost wrote:
What you're probably seeing is the overhead that comes from inserting in the middle of the vector that is used by flat_set internally. This especially matters if elements have inefficient move/copy.
Yes, I initially thought so too... but the value_type is merely a pair of pointers. The comparison operation is expensive. So, it appears it plays quite a role during initial construction. When I eliminated that as described and sorted once... huge boost.
Well, if the ordering predicate plays the defining role, I don't quite see where your gain is. Inserting N elements in an empty flat_set or inserting N elements in a vector and then sorting would give you the same number of calls to the predicate, unless I'm missing something. I still think that moving around lots of pointers on every insertion (you said the end size was ~50000 elements and I'm assuming normal distribution of insert positions) is the culprit. E.g. towards the end of construction inserting an element in the middle would have to move 50000 * 8 = ~390 KiB of data, which is quite a lot. Did you try profiling the code?
No, this won't work in general because these methods are not supposed to modify the container. In particular, they don't invalidate iterators or references, and calling these methods concurrently is thread-safe.
I do not feel those begin() et al will invalidate anything because the first call to, say, begin() will sort the underlying container once. After that it all works as usual, right?
No, not quite.

flat_set< int > fs;
fs.reserve(5);
auto it1 = fs.begin();
auto it2 = fs.insert(42).first;
auto it3 = fs.begin(); // invalidates it1 and it2

Also, as I mentioned, multiple threads may be calling begin() or find() concurrently. These methods must also have const-qualified versions to be usable on non-mutable containers.
On 03/26/2017 10:35 PM, Andrey Semashev via Boost wrote:
On 03/26/17 14:00, Vladimir Batov via Boost wrote:
On 2017-03-26 21:28, Andrey Semashev via Boost wrote:
What you're probably seeing is the overhead that comes from inserting in the middle of the vector that is used by flat_set internally. This especially matters if elements have inefficient move/copy.
Yes, I initially thought so too... but the value_type is merely a pair of pointers. The comparison operation is expensive. So, it appears it plays quite a role during initial construction. When I eliminated that as described and sorted once... huge boost.
Well, if the ordering predicate plays the defining role, I don't quite see where your gain is. Inserting N elements in an empty flat_set or inserting N elements in a vector and then sorting would give you the same number of calls to the predicate, unless I'm missing something.
I must have failed to describe my use-case, as it is not "inserting N elements in an empty flat_set". In fact, it is rather inserting quite a number (about 50,000 to 200,000 so far) of value_type instances on a one-by-one basis.
I still think that moving around lots of pointers on every insertion (you said the end size was ~50000 elements and I'm assuming normal distribution of insert positions) is the culprit. E.g. towards the end of construction inserting an element in the middle would have to move 50000 * 8 = ~390 KiB of data, which is quite a lot.
It might well be. I did not profile what exactly contributes so much during flat_set construction. I simply chopped the whole construction off (replaced with a straight emplace_back) to see a huge 30-50% improvement.
Did you try profiling the code?
:-) You are just teasing me, right? :-)
No, this won't work in general because these methods are not supposed to modify the container. In particular, they don't invalidate iterators or references, and calling these methods concurrently is thread-safe.
I do not feel those begin() et al will invalidate anything because the first call to, say, begin() will sort the underlying container once. After that it all works as usual, right?
No, not quite.
flat_set< int > fs;
fs.reserve(5);
auto it1 = fs.begin();
auto it2 = fs.insert(42).first;
auto it3 = fs.begin(); // invalidates it1 and it2
Well, I do understand that insertion into a std::vector invalidates the iterators. :-) I thought you argued (when you said "No, this won't work in general because...") that "my" approach (of only sorting on a when-needed basis) behaved differently from flat_set. I argued that it was the same. That is, no changes in behavior, but a huge construction performance boost. That fact should probably not be simply ignored.
Also, as I mentioned, multiple threads may be calling begin() or find() concurrently. These methods must also have const-qualified versions to be usable on non-mutable containers.
I can't see any of it as a fundamental problem. They are all addressed in the implementation:

template<...>
struct aux::sorted_vector
{
    ...
    iterator begin () { sort_(); return all_.begin(); }
    iterator end   () { sort_(); return all_.end(); }
    const_iterator begin () const { sort_(); return all_.begin(); }
    const_iterator end   () const { sort_(); return all_.end(); }
    ...
};

... the sort_() takes care of sorting (if/when needed) and MT safety.
On Mon, Mar 27, 2017 at 12:01 AM, Vladimir Batov via Boost
On 03/26/2017 10:35 PM, Andrey Semashev via Boost wrote:
Well, if the ordering predicate plays the defining role, I don't quite see where your gain is. Inserting N elements in an empty flat_set or inserting N elements in a vector and then sorting would give you the same number of calls to the predicate, unless I'm missing something.
I must have failed to describe my use-case, as it is not "inserting N elements in an empty flat_set". In fact, it is rather inserting quite a number (about 50,000 to 200,000 so far) of value_type instances on a one-by-one basis.
I don't see how that changes anything.
No, this won't work in general because these methods are not supposed to modify the container. In particular, they don't invalidate iterators or references, and calling these methods concurrently is thread-safe.
I do not feel those begin() et al will invalidate anything because the first call to, say, begin() will sort the underlying container once. After that it all works as usual, right?
No, not quite.
flat_set< int > fs; fs.reserve(5);
I've made a mistake here in my example, the following line is missing: fs.insert(10);
auto it1 = fs.begin();
auto it2 = fs.insert(42).first;
auto it3 = fs.begin(); // invalidates it1 and it2
Well, I do understand that insertion into a std::vector invalidates the iterators. I thought you argued (when you said "No, this won't work in general because...") that "my" approach (of only sorting on a when-needed basis) behaved differently from flat_set.
I did, and the example above shows where your proposed change breaks the previously valid code.
I argued that it was the same. That is, no changes in behavior, but a huge construction performance boost. That fact should probably not be simply ignored.
I'm not opposed to the idea of optimizing your use case. I just don't think flat_set is the right tool for it. The concept of self-organizing containers (including a staging/scratchbook area) is not new, there is a precedent in Boost.Intrusive at least. But those tricks typically don't work well with certain common properties of traditional containers, such as the ones I mentioned. For this reason it is best to provide these new features under new names. IOW, propose a new container instead of subverting behavior of an existing one.
Also, as I mentioned, multiple threads may me calling begin() or find() concurrently. These methods must also have const-qualified versions to be able to be used on non-mutable containers.
I can't see any of it as a fundamental problem. They are all addressed in the implementation:

template<...>
struct aux::sorted_vector
{
    ...
    iterator begin () { sort_(); return all_.begin(); }
    iterator end   () { sort_(); return all_.end(); }
    const_iterator begin () const { sort_(); return all_.begin(); }
    const_iterator end   () const { sort_(); return all_.end(); }
    ...
};

... the sort_() takes care of sorting (if/when needed) and MT safety.
How does sort_() handle thread safety? Are you proposing to add a mutex to the container?
On 03/27/2017 09:20 AM, Andrey Semashev wrote:
On Mon, Mar 27, 2017 at 12:01 AM, Vladimir Batov via Boost
wrote:

No, this won't work in general because these methods are not supposed to modify the container. In particular, they don't invalidate iterators or references, and calling these methods concurrently is thread-safe.

I do not feel those begin() et al will invalidate anything because the first call to, say, begin() will sort the underlying container once. After that it all works as usual, right?

No, not quite.
flat_set< int > fs;
fs.reserve(5);
fs.insert(10);
auto it1 = fs.begin();
auto it2 = fs.insert(42).first;
auto it3 = fs.begin(); // invalidates it1 and it2

Well, I do understand that insertion into a std::vector invalidates the iterators. I thought you argued (when you said "No, this won't work in general because...") that "my" approach (of only sorting on a when-needed basis) behaved differently from flat_set.

I did, and the example above shows where your proposed change breaks the previously valid code.
I have to disagree with your "proposed change breaks the previously valid code" statement. Indeed, given we know the implementation of flat_set, we can say that the above code does work with the current flat_set implementation. However, the code is not guaranteed to work with the current flat_set. I just re-read flat_set documentation and it does not provide such guarantees for emplace() and insert(). IMO there is a difference between happens-to-work and guaranteed/valid code. Don't you agree?
...
Also, as I mentioned, multiple threads may be calling begin() or find() concurrently. These methods must also have const-qualified versions to be usable on non-mutable containers. I can't see any of it as a fundamental problem. They are all addressed in the implementation:
template<...>
struct aux::sorted_vector
{
    ...
    iterator begin () { sort_(); return all_.begin(); }
    iterator end   () { sort_(); return all_.end(); }
    const_iterator begin () const { sort_(); return all_.begin(); }
    const_iterator end   () const { sort_(); return all_.end(); }
    ...
};

... the sort_() takes care of sorting (if/when needed) and MT safety.

How does sort_() handle thread safety? Are you proposing to add a mutex to the container?
Yes, I am/was... Is it a problem? I can't see it being a bottleneck, as it all will be optimized inside sort_() (no unnecessary locks, etc.).

In the end I am not trying to say flat_set is broken or anything. My point is really simple. IMO I was using flat_set in a fairly conventional way and incurred an unexpected and considerable construction overhead. So much so that I had to replace flat_set. I personally find it unfortunate. I have quite a few more places where I use flat_set in a similar setting -- a lengthy, elaborate construction followed by a much lengthier deployment. My use-case seems fairly straightforward... but I had to replace flat_set. If so, then I am questioning where I might need flat_set at all. It feels hugely unfortunate that a whole chunk of Boost functionality/code (seemingly designed for this very purpose) can't be used. Isn't it a concern? Don't you agree?
On Mon, Mar 27, 2017 at 2:03 AM, Vladimir Batov via Boost
On 03/27/2017 09:20 AM, Andrey Semashev wrote:
On Mon, Mar 27, 2017 at 12:01 AM, Vladimir Batov via Boost
wrote: I do not feel those begin() et al will invalidate anything because the first call to, say, begin() will sort the underlying container once. After that it all works as usual, right?
No, not quite.
flat_set< int > fs;
fs.reserve(5);
fs.insert(10);
auto it1 = fs.begin();
auto it2 = fs.insert(42).first;
auto it3 = fs.begin(); // invalidates it1 and it2
Well, I do understand that insertion into a std::vector invalidates the iterators. I thought you argued (when you said "No, this won't work in general because...") that "my" approach (of only sorting on when-needed basis) had different to the flat_set behavior.
I did, and the example above shows where your proposed change breaks the previously valid code.
I have to disagree with your "proposed change breaks the previously valid code" statement. Indeed, given we know the implementation of flat_set, we can say that the above code does work with the current flat_set implementation. However, the code is not guaranteed to work with the current flat_set. I just re-read flat_set documentation and it does not provide such guarantees for emplace() and insert(). IMO there is a difference between happens-to-work and guaranteed/valid code. Don't you agree?
Hmm, I've just checked the flat_set reference [1], and this is what it says: <quote> flat_set is similar to std::set but it's implemented like an ordered vector. This means that inserting a new element into a flat_set invalidates previous iterators and references </quote> It does say it invalidates iterators (BTW, the word "previous" here is misleading; at first I interpreted it as "iterators pointing to previous elements") but it also says the container works like a vector. For vector the standard guarantees that if no reallocation happens on insert then iterators pointing to the elements before the insertion position are not invalidated ([vector.modifiers]/1). I think the flat_set documentation is inaccurate and needs to be corrected.
How does sort_() handle thread safety? Are you proposing to add a mutex to the container?
Yes, I am/was... Is it a problem? I can't see it being a bottleneck as it all will be optimized inside sort_() (no unnecessary locks, etc.).
I'm sorry, but that is a no-go. Too often containers are used with external locking, and internal locking would just waste performance, especially in such a hot spot as begin(). BTW, I don't believe any compiler is able to optimize away unnecessary locking (thank god!)
In the end I am not trying to say flat_set is broken or anything. My point is really simple. IMO I was using flat_set in a fairly conventional way and incurred an unexpected and considerable construction overhead. So much so that I had to replace flat_set. I personally find it unfortunate. I have quite a few more places where I use flat_set in a similar setting -- a lengthy elaborate construction followed by much lengthier deployment. My use-case seems fairly straightforward... but I had to replace flat_set. If so, then I am questioning now where I might need flat_sets? That feels hugely unfortunate that a whole chunk of Boost functionality/code (seemingly designed for this very purpose) can't be used. Isn't it a concern? Don't you agree?
I think flat_set and similar containers work quite well in cases they are targeted for. The fact that you started to care about the container initialization simply means that your case is not the one flat_set is tailored for. That's fine. Just propose a better alternative for your case. [1]: http://www.boost.org/doc/libs/1_63_0/doc/html/boost/container/flat_set.html
On 2017-03-27 10:54, Andrey Semashev wrote:
How does sort_() handle thread safety? Are you proposing to add a mutex to the container?
Yes, I am/was... Is it a problem? I can't see it being a bottleneck as it all will be optimized inside sort_() (no unnecessary locks, etc.).
I'm sorry, but that is a no-go. Too often containers are used with external locking, and internal locking would just waste performance, especially in such a hot spot as begin(). BTW, I don't believe any compiler is able to optimize away unnecessary locking (thank god!)
1. There does not have to be any "internal locking would just waste performance":

void sort_()
{
    if (sorted) return;
    lock;
    if (sorted) return;
    ... do actual sorting
}

2. Still, I am not sure how we got to this point -- std::vector and other containers do not provide any MT guarantees. I would not expect sorted_vector to be that different.
I think flat_set and similar containers work quite well in cases they are targeted for.
I am not sure I see my use-case as so special. Quite the opposite. That's why from the get-go I was wondering if flat_set behavior was correct -- it was not performing in my (seemingly standard) use-case.
The fact that you started to care about the container initialization simply means that your case is not the one flat_set is tailored for. That's fine.
Again, I honestly do not understand what so special about my use-case. My container's main usage is traversal and search. That's why I deployed flat_set in the first place. I only "started to care about the container initialization" after it popped up in the profiler list... Very much unexpectedly so. :-)
Just propose a better alternative for your case.
Indeed, the more I think of it, the less I am comfortable with "flat_set". The intention was probably good -- to mimic a set. However, the vector's ears still keep sticking out. So, it seems that sorted_vector might be a more honest and less misleading name... And given the benefits it offers over a set, it seems quite needed. However, from experience, "proposing" is a daunting, one-sided battle. Everyone knows how it should be done... but it's you implementing it... and your implementation always seems to be done incorrectly. :-) I am not sure I am up for such a battle any more.
On 2017-03-27 07:40, Vladimir Batov via Boost wrote:
On 2017-03-27 10:54, Andrey Semashev wrote:
How does sort_() handle thread safety? Are you proposing to add a mutex to the container?
Yes, I am/was... Is it a problem? I can't see it being a bottleneck as it all will be optimized inside sort_() (no unnecessary locks, etc.).
I'm sorry, but that is a no-go. Too often containers are used with external locking, and internal locking would just waste performance, especially in such a hot spot as begin(). BTW, I don't believe any compiler is able to optimize away unnecessary locking (thank god!)
1. There does not have to be any "internal locking would just waste performance":

void sort_()
{
    if (sorted) return;
    lock;
    if (sorted) return;
    ... do actual sorting
}
Double Checked Locking is notorious for failing randomly. sorted must at least be atomic. One of many articles about it: http://preshing.com/20130930/double-checked-locking-is-fixed-in-cpp11/
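For illustration, a C++11-correct version of this pattern, along the lines the linked article recommends, keeps the flag in a std::atomic with acquire/release ordering and re-checks it under the mutex. This is a sketch only; lazy_sorted is an assumed name, and mutation (push) is still not thread-safe, matching the usual container contract:

```cpp
#include <algorithm>
#include <atomic>
#include <mutex>
#include <vector>

// Lazy "sort once" with correct double-checked locking: the fast path
// reads an atomic flag; the slow path re-checks it under the mutex.
struct lazy_sorted {
    mutable std::vector<int>  all_;
    mutable std::atomic<bool> sorted_{true};
    mutable std::mutex        m_;

    // Mutation is NOT thread-safe (as with ordinary containers).
    void push(int x) {
        all_.push_back(x);
        sorted_.store(false, std::memory_order_release);
    }

    void sort_() const {
        if (sorted_.load(std::memory_order_acquire)) return;  // fast path, no lock
        std::lock_guard<std::mutex> lk(m_);
        if (sorted_.load(std::memory_order_relaxed)) return;  // re-check under lock
        std::sort(all_.begin(), all_.end());
        sorted_.store(true, std::memory_order_release);
    }

    const std::vector<int>& sorted() const { sort_(); return all_; }
};
```

A plain bool flag, as in the pseudocode above, would make the unlocked fast-path read a data race.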
AMDG On 03/26/2017 05:03 PM, Vladimir Batov via Boost wrote:
On 03/27/2017 09:20 AM, Andrey Semashev wrote:
On Mon, Mar 27, 2017 at 12:01 AM, Vladimir Batov via Boost
flat_set< int > fs;
fs.reserve(5);
fs.insert(10);
auto it1 = fs.begin();
auto it2 = fs.insert(42).first;
auto it3 = fs.begin(); // invalidates it1 and it2

Well, I do understand that insertion into a std::vector invalidates the iterators. I thought you argued (when you said "No, this won't work in general because...") that "my" approach (of only sorting on a when-needed basis) behaved differently from flat_set.

I did, and the example above shows where your proposed change breaks the previously valid code.
I have to disagree with your "proposed change breaks the previously valid code" statement. Indeed, given we know the implementation of flat_set, we can say that the above code does work with the current flat_set implementation. However, the code is not guaranteed to work with the current flat_set.
Invalidating it1 may be fine. Invalidating it2 is definitely not okay. In Christ, Steven Watanabe
On 03/28/2017 03:15 AM, Steven Watanabe via Boost wrote:
AMDG
On 03/26/2017 05:03 PM, Vladimir Batov via Boost wrote:
On 03/27/2017 09:20 AM, Andrey Semashev wrote:
On Mon, Mar 27, 2017 at 12:01 AM, Vladimir Batov via Boost
flat_set< int > fs;
fs.reserve(5);
fs.insert(10);
auto it1 = fs.begin();
auto it2 = fs.insert(42).first;
auto it3 = fs.begin(); // invalidates it1 and it2

... Invalidating it1 may be fine. Invalidating it2 is definitely not okay.
Indeed, nits like the above convince me that taking an ordered vector and calling it a set was probably a good-intentions-driven mistake. That set-oriented mindset then influenced the set-like API and behavior so much (IMO) that it in turn affected the performance... the original goal of flat_set.
On 2017-03-27 22:27, Vladimir Batov via Boost wrote:
On 03/28/2017 03:15 AM, Steven Watanabe via Boost wrote:
AMDG
On 03/26/2017 05:03 PM, Vladimir Batov via Boost wrote:
On 03/27/2017 09:20 AM, Andrey Semashev wrote:
On Mon, Mar 27, 2017 at 12:01 AM, Vladimir Batov via Boost
flat_set< int > fs;
fs.reserve(5);
fs.insert(10);
auto it1 = fs.begin();
auto it2 = fs.insert(42).first;
auto it3 = fs.begin(); // invalidates it1 and it2

... Invalidating it1 may be fine. Invalidating it2 is definitely not okay.

Indeed, nits like the above convince me that taking an ordered vector and calling it a set was probably a good-intentions-driven mistake. That set-oriented mindset then influenced the set-like API and behavior so much (IMO) that it in turn affected the performance... the original goal of flat_set.
Could you elaborate? From my point of view the semantics and runtime characteristics are exactly what I would expect from a flattened binary tree (okay, personally I could do with a different, more cache-friendly ordering of elements, but my use cases are special). I would find any other behaviour surprising.
On Mon, Mar 27, 2017 at 4:08 PM, Oswin Krause via Boost <boost@lists.boost.org> wrote:
Could you elaborate? From my point of view the semantics and runtime characteristics are exactly what I would expect from a flattened binary tree (okay, personally I could do with a different, more cache-friendly ordering of elements, but my use cases are special). I would find any other behaviour surprising.
Sorry to jump into the thread, but I was just looking at flat_[set|map] yesterday, so it is on my mind. Is it not a better solution to create an algorithm library that operates on sorted ranges (ordered_insert, binary_search, etc.)? That seems to alleviate all concerns, because managing the container is left to the user. (That is the approach I am taking, but for a more specialized case.) THK http://www.keittlab.org/
On 27 March 2017 at 15:29, Tim Keitt via Boost
Is it not a better solution to create an algorithm library that operates on sorted ranges (ordered_insert, binary_search, etc)?
You're in luck, it's already done: a library a.k.a. the STL, http://en.cppreference.com/w/cpp/algorithm.
On Mon, Mar 27, 2017 at 4:41 PM, degski via Boost
You're in luck
It did sound familiar ;-) (I'm working on the multi-dimensional generalization, so unfortunately those won't do for what I need.) THK http://www.keittlab.org/
On 03/28/2017 08:08 AM, Oswin Krause via Boost wrote:
On 2017-03-27 22:27, Vladimir Batov via Boost wrote:
On 03/28/2017 03:15 AM, Steven Watanabe via Boost wrote:
AMDG
On 03/26/2017 05:03 PM, Vladimir Batov via Boost wrote:
On 03/27/2017 09:20 AM, Andrey Semashev wrote:
On Mon, Mar 27, 2017 at 12:01 AM, Vladimir Batov via Boost
> flat_set< int > fs;
> fs.reserve(5);
> fs.insert(10);
> auto it1 = fs.begin();
> auto it2 = fs.insert(42).first;
> auto it3 = fs.begin(); // invalidates it1 and it2

... Invalidating it1 may be fine. Invalidating it2 is definitely not okay.
Indeed, nits like the above convince me that taking an ordered vector and calling it a set was probably a good-intentions-driven mistake. That set-oriented mindset then influenced the set-like API and behavior so much (IMO) that it in turn affected the performance... the original goal of flat_set.

Could you elaborate? From my point of view the semantics and runtime characteristics are exactly what I would expect from a flattened binary tree (okay, personally I could do with a different, more cache-friendly ordering of elements, but my use cases are special). I would find any other behaviour surprising.
Well, indeed. :-) Given the prominence of the "set" word in the name, I do agree that flat_set behavior is what one "would expect from a binary tree"... flattened or not (that's an implementation detail IMO).

My comments are heavily influenced by my current situation, which I probably got into due to my laziness. I needed a sorted container, so I initially plugged in std::set to get things ticking. I had many searches/traversals, and flat_set promised to do that much better, and it did... I just did not know I had to pay so much for the initialization. So, in the end I had to discard flat_set in favor of a sorted_vector container which still does excellent searches/traversals... but does not pretend to be a binary tree, and as a result saves me heaps of run-time. I have a few more flat_set deployments, and I am reviewing them all and, in fact, replacing them with sorted_vector (aware of the discussed behavioral differences).

That raised a question -- if flat_set is no good for my (seemingly standard, by-the-book) use-cases, then what is flat_set good for? That's not a sneer, irony, sarcasm or criticism; it's an honest question. We have a component in Boost, and I presume a lot of effort went into producing it. We want it useful and used... but I as a user do not seem to be able to find a use for it. I find that disappointing and disconcerting.
We have a component in Boost and I presume a lot of effort went into producing it. We want it useful and used... but I as a user do not seem to be able to find use for it. I found it disappointing and disconcerting.
I personally use flat_map and flat_set all the time. Most of the use cases I have involve the set or map being essentially immutable once constructed, and thereafter being used for many lookups. When I build them, I do so by buffering into a std::vector, sorting the vector, and then inserting into the map or set using the ordered_unique_range overload. This has always been fast enough and is still O(N log N) in total. The only optimization left is to remove the need to copy the contents into the flat_map/flat_set and instead have the map or set assume ownership of the underlying container. But that is not the bottleneck -- sorting is -- so it would be a small gain. -- chris
Hi,
Flat containers are not well suited for frequent modifications, so that is kind of expected.
Understood. The described use-case constructs the flat_set only once (but a big one, and one insert at a time). So, on the surface it all appeared to be by the book -- constructed once, traversed/searched often. Still, the observed construction overhead really stunned me.
I think the issue is that you use the wrong insert method. A single insert is O(N) in the size of the container (worst case: inserting sorted from largest to smallest element), so inserting K elements into an empty flat_set has runtime O(K^2). You are essentially doing insertion sort on your data. flat_set has range-insert:

template<typename InputIterator>
void insert(InputIterator, InputIterator);

template<typename InputIterator>
void insert(ordered_unique_range_t, InputIterator, InputIterator);

Inserting a range of size K into a flat_set of size N should have complexity O(K log(K) + K + N) in the first case (implementable as: copy all new elements to the end, sort the new elements, and perform an inplace merge). In the second call, only a simple merge is needed, runtime O(K + N). Best, Oswin
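The first strategy described above (copy to the end, sort only the new elements, then merge in place) can be sketched with standard algorithms on a plain sorted vector; range_insert_sorted is an illustrative name, not a Boost API:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Range insert into a sorted, unique vector in O(K log K + K + N):
// append the K new elements, sort just the new tail, inplace_merge,
// then drop duplicates to keep set semantics.
void range_insert_sorted(std::vector<int>& v, const std::vector<int>& add) {
    auto mid = static_cast<std::ptrdiff_t>(v.size());
    v.insert(v.end(), add.begin(), add.end());       // O(K) appends
    std::sort(v.begin() + mid, v.end());             // O(K log K), tail only
    std::inplace_merge(v.begin(), v.begin() + mid, v.end());  // O(K + N)
    v.erase(std::unique(v.begin(), v.end()), v.end());
}
```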
Oswin Krause wrote: ...
I think the issue is that you use the wrong insert method. a single insert is O(N)...
flat_set has range-insert: ... inserting a range of size K in a flat_set of size N should have complexity O(K log(K) + K+N) in the first case (implementable as: copy all new elements to the end, sort the new elements and perform an inplace-merge).
In the use case the elements may come one by one. To use range insert, you'll have to buffer them until lookup is needed, then insert them all at once. This is exactly what Vladimir has been suggesting flat_set ought to do itself: insert elements at the end and, when lookup is needed, merge them into place. There is another variation of this approach, one I'm sure we've discussed before: the flat_set keeps an unsorted range at the end and on lookup tries both, first the unsorted part, then the sorted one. The unsorted part is merged into place when it exceeds some size. This doesn't do non-const operations on logically const lookups, but iteration becomes problematic if iterators have to give you a sorted view. (Although you could keep the suffix part sorted as well and merge on demand in iter::op++ if so inclined.)
Hi,
There is another variation of this approach, one I'm sure we've discussed before: the flat_set keeps the unsorted range at the end and on lookup tries both, first the unsorted part, then the sorted one. The unsorted part is merged into place when it exceeds some size. This doesn't do non-const operations on logically const lookups, but iteration becomes problematic if iterators have to give you a sorted view. (Although you could keep the suffix part sorted as well and merge on demand in iter::op++ if so inclined.)
This would have consequences both for asymptotic run time (lookups are O(log(N) + K), where K is the maximum allowed size of the unsorted region) and for the constants (more complex code, more branches, etc.). If K is independent of N, we would still have overall quadratic time when inserting many elements using insert(single_element). So this does not help.

What I meant with my previous comment: it is probably easiest to buffer all elements to be inserted in an intermediate storage area and then insert them at once. Of course we might not want to use the current insert for this, as that would require storing all new elements. Here is my approach to this:

flat_set maintains an array and three iterators (set_begin, set_end and vector_end). The range [set_begin, set_end) contains all inserted elements; size(), begin() and end() only refer to this range. The range [set_end, vector_end) contains all newly inserted elements; however, they are still "staging" and will not show up unless the flat_set is told so. So we would add two new methods:

staging_insert(single_element) // pushes the element to the back of the staging area
sort_staging() // sorts the staging range and merges it in; afterwards set_end = vector_end
On 03/27/2017 06:03 AM, Oswin Krause via Boost wrote:
There is another variation of this approach, one I'm sure we've discussed before: the flat_set keeps the unsorted range at the end and on lookup tries both, first the unsorted part, then the sorted one. The unsorted part is merged into place when it exceeds some size. This doesn't do non-const operations on logically const lookups, but iteration becomes problematic if iterators have to give you a sorted view. (Although you could keep the suffix part sorted as well and merge on demand in iter::op++ if so inclined.)
This would have consequences both in asymptotic run time (lookups are O(log(N)+K) where K is the maximum allowed size of the unsorted region) as well as constants (more complex code, more ifs etc). If K is independent of N, we would still have overall quadratic time of inserting a lot of elements using insert(single_element). So this does not help.
What I meant with my previous comment: it is probably the easiest way just to buffer all elements to be inserted in an intermediate storage area and then insert them at once.
I am sorry but I have to disagree on this one. I can't possibly see it as the "easiest way". My use-case (which I do not believe is exotic by any stretch of imagination) goes around and cherry-picks the elements essentially one-by-one. So, having another container to populate feels like a huge hassle. I thought the whole idea of flat_set was to get the benefits without the hard work. :-) If I have to have such a container to begin with, I might not need flat_set at all as I'd be then transferring straight into std::vector, right?
Of course we might not want to use the current insert for this as this would require storing all new elements. Here is my approach to this:
flat_set maintains an array and three iterators (set_begin, set_end and vector_end). The range [set_begin, set_end) contains all inserted elements; size(), begin() and end() only refer to this range. The range [set_end, vector_end) contains all newly inserted elements; however, they are still "staging" and will not show up unless the flat_set is told so.
so we would add two new methods
staging_insert(single_element) // pushes the element to the back of the staging area
sort_staging() // sorts the staging range and merges it in; afterwards set_end = vector_end
That feels quite complex... what is wrong with my (seemingly) simple suggestion of sorting on a when-needed basis?
On 26 March 2017 at 15:49, Vladimir Batov via Boost
That feels quite complex... what is wrong with my (seemingly) simple suggestion of sorting on when-needed basis?
Use one that does instead: https://github.com/yruslan/flat_map

degski
On 03/27/2017 11:27 AM, degski via Boost wrote:
On 26 March 2017 at 15:49, Vladimir Batov wrote:
That feels quite complex... what is wrong with my (seemingly) simple suggestion of sorting on a when-needed basis?

Use one that does instead: https://github.com/yruslan/flat_map
Yes. That's the idea. Thank you for the pointer... At my place, deploying Boost versus something from http://github.com/whatever makes quite a difference. That (still, for me) raises the question of whether the approach taken in boost::container::flat_set is the most practical one.
Of course we might not want to use the current insert for this as this would require storing all new elements. Here is my approach to this:
flat_set maintains an array and three iterators (set_begin, set_end and vector_end). The range [set_begin, set_end) contains all inserted elements; size(), begin() and end() only refer to this range. The range [set_end, vector_end) contains all newly inserted elements; however, they are still "staging" and will not show up unless the flat_set is told so.
so we would add two new methods
staging_insert(single_element) // pushes the element to the back of the staging area
sort_staging() // sorts the staging range and merges it in; afterwards set_end = vector_end
That feels quite complex... what is wrong with my (seemingly) simple suggestion of sorting on when-needed basis?
I think the solution is much less complex than one that requires locks and atomics. It looks more complex from the outside, though, I agree. Moreover, we never end up modifying the data structure in a const context.
El 26/03/2017 a las 5:02, Vladimir Batov via Boost escribió:
I've been using boost::container::flat_set with quite a success until recently when I profiled my app to see that 30-50% of the run-time was spent in insert() for a container of around 50,000 in size. [...]
For what it's worth, I wrote something about this very problem at https://bannalia.blogspot.com/2015/06/design-patterns-for-invariant-suspensi...

The solution I like the most is the one dubbed "external invariant suspension". The idea is that the container (flat_set in this case) transfers its underlying buffer out for fast insertion and sorts it once when it is moved back in:

flat_set<int> s;
...
std::vector<int> buf = s.get_data(); // move, s becomes empty
for (...) buf.push_back(...);        // populate
s.accept_data(buf);                  // move and sort

This is maximally efficient as the data is moved rather than copied, and flat_set's invariant (namely, that its elements are sorted) is kept all the time, so there's no need to sort on begin() etc.

Just my two cents,

Joaquín M López Muñoz
std::vector<int> buf = s.get_data(); // move, s becomes empty
for (...) buf.push_back(...);        // populate
s.accept_data(buf);                  // move and sort
I like this idea -- but I don't think accept_data can do the sort, because sort might not be enough; the contents might need to be made unique too. We could add the call to unique inside accept_data, but that pessimises the case where I know my data is unique. So I think it would need to be more like:

std::vector<int> buf = move(s).get_data();
for (...) buf.push_back(...); // populate
sort(buf);
unique(buf); // If my contents might not be unique.
s.adopt_ordered_unique_data(move(buf)); // Asserts data is in fact ordered and unique?

-- chris
On 28/03/2017 22:04, Chris Glover via Boost wrote:
std::vector<int> buf = s.get_data(); // move, s becomes empty
for (...) buf.push_back(...);        // populate
s.accept_data(buf);                  // move and sort
I like this idea -- but I don't think accept_data can do the sort because sort might not be enough; the contents might need to be made unique too. We could add the call to unique inside accept_data, but that pessimises the case where I know my data is unique. So I think it would need to be more like;
std::vector<int> buf = move(s).get_data();
for (...) buf.push_back(...); // populate
sort(buf);
unique(buf); // If my contents might not be unique.
s.adopt_ordered_unique_data(move(buf)); // Asserts data is in fact ordered and unique?
We can have different "adopt" variants depending on the guarantees the user provides, just like we have different "insert" versions (ordered_unique_range_t, etc.). The "safe" one can assume the passed vector is unordered and just sorts and calls unique if needed. Other variants might skip sorting if the user so requires.

We have asymptotically optimal in-place sorting algorithms in Boost.Move now that can even avoid any huge temporary allocation (stable_sort needs O(N) auxiliary memory), and they can take advantage of the unused capacity() - size() memory of the vector.

In any case, this does not help people who insert one by one but want to do lookups between insertions. In that case, a new hybrid container (like those suggested in this thread, mixing an unordered vector and an ordered one, maybe using the same buffer) is needed. But that container is not flat_map, which has strict invariants and requires ordered traversal. For people who do not care about traversal order, a new container, flat_unordered_map, is needed. Something like an open-addressing hash table, I guess.

Best, Ion
We can have different "adopt" variants depending on the guarantees the user provides, just like we have different "insert" versions (ordered_unique_range_t, etc.). The "safe" one can assume the passed vector is unordered and just sorts and calls unique if needed. Other variants might skip sorting if the user so requires.

Sure.

For people who do not care about traversal order, a new container, flat_unordered_map, is needed. Something like an open-addressing hash table, I guess.

I've been wanting this for some time -- enough that I should probably volunteer to write it. I know it's been discussed before and nothing has come of it. Does anyone recall why? Because it just hasn't been done?

-- chris
On 29/03/2017 4:13, Chris Glover via Boost wrote:
We can have different "adopt" variants depending on the guarantees the user provides, just like we have different "insert" versions (ordered_unique_t, etc.). The "safe" one can assume passed vector is unordered and just sorts and calls unique if needed. Other variants might just skip sorting if the user requires so.
Sure.
Committed to develop new experimental functions:

class flat_map/set {
   sequence_type extract_sequence();
   void adopt_sequence(sequence_type &&seq);
   void adopt_sequence(ordered_unique_range_t, sequence_type &&seq);
};

class flat_multimap/multiset {
   sequence_type extract_sequence();
   void adopt_sequence(sequence_type &&seq);
   void adopt_sequence(ordered_range_t, sequence_type &&seq);
};

Let's see if they are useful. Any feedback welcome.

Ion
On 2017-04-08 00:28, Ion Gaztañaga via Boost wrote:
On 29/03/2017 4:13, Chris Glover via Boost wrote:
We can have different "adopt" variants depending on the guarantees the user provides, just like we have different "insert" versions (ordered_unique_t, etc.). The "safe" one can assume passed vector is unordered and just sorts and calls unique if needed. Other variants might just skip sorting if the user requires so.
Sure.
Committed to develop new experimental functions:
class flat_map/set {
   sequence_type extract_sequence();
   void adopt_sequence(sequence_type &&seq);
   void adopt_sequence(ordered_unique_range_t, sequence_type &&seq);
};
class flat_multimap/multiset {
   sequence_type extract_sequence();
   void adopt_sequence(sequence_type &&seq);
   void adopt_sequence(ordered_range_t, sequence_type &&seq);
};
Let's see if they are useful. Any feedback welcome.
The returned sequence should be an rvalue reference, to reduce the cost of extract/modify/adopt patterns.
Ion
AMDG On 04/07/2017 05:38 PM, Oswin Krause via Boost wrote:
On 2017-04-08 00:28, Ion Gaztañaga via Boost wrote:
class flat_multimap/multiset {
   sequence_type extract_sequence();
   void adopt_sequence(sequence_type &&seq);
   void adopt_sequence(ordered_range_t, sequence_type &&seq);
};
Let's see if they are useful. Any feedback welcome.
The returned sequence should be an rvalue reference, to reduce the cost of extract/modify/adopt patterns.
Returning by value is correct, because it is not safe to modify the internal sequence in place. The performance difference should be insignificant, since it moves instead of copying. In Christ, Steven Watanabe
On Sat, 8 Apr 2017, Ion Gaztañaga via Boost wrote:
On 29/03/2017 4:13, Chris Glover via Boost wrote:
We can have different "adopt" variants depending on the guarantees the user provides, just like we have different "insert" versions (ordered_unique_t, etc.). The "safe" one can assume passed vector is unordered and just sorts and calls unique if needed. Other variants might just skip sorting if the user requires so.
Sure.
Committed to develop new experimental functions:
class flat_map/set {
   sequence_type extract_sequence();
   void adopt_sequence(sequence_type &&seq);
   void adopt_sequence(ordered_unique_range_t, sequence_type &&seq);
};
class flat_multimap/multiset {
   sequence_type extract_sequence();
   void adopt_sequence(sequence_type &&seq);
   void adopt_sequence(ordered_range_t, sequence_type &&seq);
};
Let's see if they are useful. Any feedback welcome.
That seems more complicated than simply giving (in-place) access to the internal sequence_type. You already had to document the invariants for adopt_sequence(ordered_unique_range_t,*) and more or less promised that extract_sequence takes constant time, so the abstraction doesn't buy you much. -- Marc Glisse
On 08/04/2017 9:34, Marc Glisse via Boost wrote:
That seems more complicated than simply giving (in-place) access to the internal sequence_type. You already had to document the invariants for adopt_sequence(ordered_unique_range_t,*) and more or less promised that extract_sequence takes constant time, so the abstraction doesn't buy you much.
adopt/extract maintain class invariants, which can avoid a lot of headaches. Even adopt_sequence(ordered[_unique]_range_t, ...) checks its preconditions (ordered [and unique]) in debug mode.

Could you please elaborate on what "giving access to the internal sequence_type" means? Something like:

class flat_xxx {
   sequence_type &sequence_ref() { return m_seq; }
};

?

I'm open to improvements/new functions ;-)

Ion
On Sat, 8 Apr 2017, Ion Gaztañaga via Boost wrote:
On 08/04/2017 9:34, Marc Glisse via Boost wrote:
That seems more complicated than simply giving (in-place) access to the internal sequence_type. You already had to document the invariants for adopt_sequence(ordered_unique_range_t,*) and more or less promised that extract_sequence takes constant time, so the abstraction doesn't buy you much.
adopt/extract maintain class invariants, which can avoid a lot of headaches. Even adopt_sequence(ordered[_unique]_range_t, ...) checks its preconditions (ordered [and unique]) in debug mode.
The usual safety vs power. I tend to consider that debug mode should not influence design (much), but I understand your wish to maintain invariants and isolate the points where "unsafe" operations can be performed, and in this case the cost (2 vector moves) is not so bad.
Could you please elaborate what does "giving access to the internal sequence_type" mean? Something like:
class flat_xxx {
   sequence_type &sequence_ref() { return m_seq; }
};
?
Yes. -- Marc Glisse
El 08/04/2017 a las 0:28, Ion Gaztañaga via Boost escribió:
Committed to develop new experimental functions:
class flat_map/set {
   sequence_type extract_sequence();
   void adopt_sequence(sequence_type &&seq);
   void adopt_sequence(ordered_unique_range_t, sequence_type &&seq);
};
Shouldn't you also provide

void adopt_sequence(unique_range_t, sequence_type &&seq);
void adopt_sequence(ordered_range_t, sequence_type &&seq);

for flat_map/set?

Joaquín M López Muñoz
On 08/04/2017 10:54, Joaquin M López Muñoz via Boost wrote:
El 08/04/2017 a las 0:28, Ion Gaztañaga via Boost escribió:
Committed to develop new experimental functions:
class flat_map/set {
   sequence_type extract_sequence();
   void adopt_sequence(sequence_type &&seq);
   void adopt_sequence(ordered_unique_range_t, sequence_type &&seq);
};
Shouldn't you also provide
void adopt_sequence(unique_range_t, sequence_type &&seq);
void adopt_sequence(ordered_range_t, sequence_type &&seq);
for flat_map/set?
It looks a bit asymmetric against the multi[set/map] containers, which only take the tag for their own invariant. Is it realistic that a user who orders the sequence externally does not want to / can't erase duplicates before calling the "risky" version of adopt_sequence?

I've modeled "adopt_sequence" on the constructors and "insert" members taking unique ranges. In those cases flat_set/map only supports ordered_unique_range_t. We could add a new overload for all of them, but I just want to make sure they are needed.

Best, Ion
El 08/04/2017 a las 12:59, Ion Gaztañaga via Boost escribió:
On 08/04/2017 10:54, Joaquin M López Muñoz via Boost wrote:
Shouldn't you also provide
void adopt_sequence(unique_range_t, sequence_type &&seq);
void adopt_sequence(ordered_range_t, sequence_type &&seq);
for flat_map/set?
It looks a bit asymmetric against the multi[set/map] containers, which only take the tag for their own invariant. Is it realistic that a user who orders the sequence externally does not want to / can't erase duplicates before calling the "risky" version of adopt_sequence?
The situation looks perfectly logical to me: multi[set/map] keeps one invariant, namely sortedness, whereas set/map keeps two, sortedness and uniqueness, and it is in principle thinkable that data be given back to the container with none, one or both invariants preserved, hence the four different overloads:

* adopt_sequence(seq) --> sort and eliminate duplicates
* adopt_sequence(unique_range_t, seq) --> sort
* adopt_sequence(ordered_range_t, seq) --> eliminate duplicates
* adopt_sequence(ordered_unique_range_t, seq) --> do nothing

One could potentially collapse these into two by neglecting the information that one of the two invariants is preserved, but the solution would not be optimal performance-wise.

Joaquín M López Muñoz
participants (11)
- Andrey Semashev
- Chris Glover
- degski
- Ion Gaztañaga
- Joaquin M López Muñoz
- Marc Glisse
- Oswin Krause
- Peter Dimov
- Steven Watanabe
- Tim Keitt
- Vladimir Batov