[container] small_flat_set and small_flat_map

Sam Kellett

15 Oct 2016 15 Oct '16

8:27 a.m.

Given that Boost.Container has a small_vector now, would a piece of work that changed flat_set and flat_map to use small_vector<0> internally be feasible? I ask because I feel like if this was the case it would be almost trivial to expose flat_tree<N> through small_flat_set<N> and small_flat_map<N> types to provide the small size optimisation for these two containers.

Show replies by date

Ion Gaztañaga

16 Oct 16 Oct

8:09 p.m.

On 15/10/2016 10:27, Sam Kellett wrote:

...

Given that Boost.Container has a small_vector now, would a piece of work that changed flat_set and flat_map to use small_vector<0> internally be feasible? I ask because I feel like if this was the case it would be almost trivial to expose flat_tree<N> through small_flat_set<N> and small_flat_map<N> types to provide the small size optimisation for these two containers.

It wouldn't be hard. Some have suggested that flat containers could be adaptors so that the underlying vector-like container could be customized. In any case, the idea of making flat containers compatible with small_vector seems interesting. Please fill a ticket with the request so that it is not forgotten. Best, Ion

Sam Kellett

19 Oct 19 Oct

5:21 p.m.

done: https://svn.boost.org/trac/boost/ticket/12536 i'm happy to take a stab at implementing this if you're happy to review it? On 16 October 2016 at 21:09, Ion Gaztañaga <igaztanaga@gmail.com> wrote:

...

On 15/10/2016 10:27, Sam Kellett wrote:

...
Given that Boost.Container has a small_vector now, would a piece of work that changed flat_set and flat_map to use small_vector<0> internally be feasible? I ask because I feel like if this was the case it would be almost trivial to expose flat_tree<N> through small_flat_set<N> and small_flat_map<N> types to provide the small size optimisation for these two containers.

It wouldn't be hard. Some have suggested that flat containers could be adaptors so that the underlying vector-like container could be customized. In any case, the idea of making flat containers compatible with small_vector seems interesting. Please fill a ticket with the request so that it is not forgotten.

Best,

Ion

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman /listinfo.cgi/boost

Chris Glover

5:51 p.m.

On Wed, 19 Oct 2016 at 13:21 Sam Kellett <samkellett@gmail.com> wrote:

...

done: https://svn.boost.org/trac/boost/ticket/12536

i'm happy to take a stab at implementing this if you're happy to review it?

While I hesitate to assume I know the best solution to anyone else's particular problem, I have to question the real usefulness of this in terms of flat_set and flat_map. Typically, such a small space optimisation is applicable when 90% of one's use cases fall within the small size. For strings and vectors, this makes sense because there could be anywhere from a few items, to many items. But, typically set or map is chosen precisely because there will be many items in the container, otherwise you would choose something else, like an unsorted array like structure. So, to me the case of optimising for the small size makes little sense. -- chris

Andrey Semashev

6 p.m.

On 10/19/16 20:51, Chris Glover wrote:

...

But, typically set or map is chosen precisely because there will be many items in the container, otherwise you would choose something else, like an unsorted array like structure. So, to me the case of optimising for the small size makes little sense.

I would say that it's likely the opposite - larger flat containers will have more expensive insert and erase. Flat containers are especially useful with small number of (small) elements because they are cache-friendly, and small size optimization makes perfect sense to improve that use case even further.

Chris Glover

6:42 p.m.

...

I would say that it's likely the opposite - larger flat containers will have more expensive insert and erase.

That's true, but often the data is relatively stable, in which case the flat containers are a good choice.

...

Flat containers are especially useful with small number of (small) elements because they are cache-friendly, and small size optimization makes perfect sense to improve that use case even further.

It is my experience that the flat containers *always* outperform the node based containers on searching. So when the data is stable, I always use a flat container regardless of how many elements are in the container. One of the problems with that approach is that, because insert is slow, I often have to build up a std::vector of items, sort it and then copy them into the flat container using the ordered_unique_t overload. If this change was done in a way that allowed me to pass ownership of that sorted vector to the flat_* container, that would be an additional optimisation I would find useful. -- chris

Sam Kellett

7:34 p.m.

On 19 October 2016 at 19:42, Chris Glover <c.d.glover@gmail.com> wrote:

...

[snip]

...

Flat containers are especially

...
useful with small number of (small) elements because they are cache-friendly, and small size optimization makes perfect sense to improve that use case even further.

It is my experience that the flat containers *always* outperform the node based containers on searching. So when the data is stable, I always use a flat container regardless of how many elements are in the container.

One of the problems with that approach is that, because insert is slow, I often have to build up a std::vector of items, sort it and then copy them into the flat container using the ordered_unique_t overload. If this change was done in a way that allowed me to pass ownership of that sorted vector to the flat_* container, that would be an additional optimisation I would find useful.

yeah i agree, although that's a different piece of work, no? i've mentioned this before[1] and i'm pretty sure it's on the roadmap (i've seen it mentioned since). [1] http://lists.boost.org/Archives/boost/2015/09/225568.php

Seth

8:44 p.m.

On 19-10-16 19:51, Chris Glover wrote:

...

But, typically set or map is chosen precisely because there will be many items in the container, otherwise you would choose something else, like an unsorted array like structure. So, to me the case of optimising for the small size makes little sense.

This is anecdotal only. You have to ask WHY is "set or map [typically] chosen precisely because...". The main reason might well be "because the cost of node-based data structures is too high for small volumes" not because they're not a good match. Quite often I need small collections of characters (say, characters to be escaped) which are logically best expressed as a set. It's just a lot more readably to use a set than to use e.g. a your small_vector<char> and manually maintain sort order and deduplication. The only reason NOT to use a set when logically it is a set was performance. Tiny local maps are frequent occurrence too. I find myself writing linear searches across local vector<pair<Id, reference_wrapper<T const>

...

. Imagine how much better it would be with `small_flat_map<Id, reference_wrapper<T const> >`.

Summarizing: I find myself focusing on expressive code and often find myself hindered by the lack of low-cost small associative/ordered containers. My $0.02 Seth

Seth

8:46 p.m.

On 16-10-16 22:09, Ion Gaztañaga wrote:

...

Some have suggested that flat containers could be adaptors so that the underlying vector-like container could be customized

Measurements often show that ordering keys for lookup doesn't yield benefit at very small n (because linear prefetch/branch reduction favours linear search). I'd love to see a flat_map/flat_set that - besides being configurable for their storage strategy, also can optionally be configured to not be sorted.

3188

Age (days ago)

3192

Last active (days ago)

List overview

Download

8 comments

5 participants

participants (5)

Andrey Semashev
Chris Glover
Ion Gaztañaga
Sam Kellett
Seth