[Container] interest in a flat_unordered_map
I'm hoping to port a flat hash map that I've written to Boost. Since Boost has a flat_map, I think it'd be good to have a flat_unordered_map as well, especially since there's an opportunity to be much more efficient. The one I would like to use is based on the hash map used in Lua, which is a mix of Coalesced and Robin Hood hashing. I don't think there's a name that describes it yet, unfortunately, but I describe it near the end of this post.

*Performance (primary reason for use):* It uses memory and cache very effectively and ends up being *very fast*. It's hard to benchmark real-world situations, but in inserting, finding, erasing, and iterating over a dictionary it was up to six orders of magnitude faster than VS2013's unordered_map. This isn't a totally fair test, because the cache efficiency really shines.

*Standards compliance:* Differences from std::unordered_map are largely the same as the differences for flat_map listed here: http://www.boost.org/doc/libs/1_59_0/doc/html/container/non_standard_contain.... Also, bucket-related methods no longer apply, so they will be removed. *That said, there's one major standards compliance issue that I'm struggling with:*

The optimal implementation of iterators for this container *does not have constant-time traversal*. The implementation is a vector with elements scattered across it. To increment an iterator, I keep incrementing the internal vector iterator until I hit the next element. In practice elements should be scattered evenly and will be close to one another, but in the worst case this is N-1 steps. Additional memory or logic could be added to make this O(1), but I don't think that trade is worth it. In my applications of unordered_map I rarely iterate, and I would not want to penalize insert, find, or erase to make iteration faster.

I could make a new concept for this, pay the cost for making it truly O(1), or, *since the common case is not O(N), would it be okay to consider these bidirectional iterators*?
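The skip-scan increment described above can be sketched as follows; this is a minimal illustration (the `next_occupied` name and the `std::optional` slot representation are my own assumptions, not the actual implementation):

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Sketch only: elements sit scattered across a backing vector, with empty
// slots in between. One increment scans forward to the next occupied slot,
// so a single step costs O(gap) rather than O(1) -- the non-constant
// traversal discussed in the post.
std::size_t next_occupied(const std::vector<std::optional<int>>& slots,
                          std::size_t i) {
    ++i;
    while (i < slots.size() && !slots[i].has_value())
        ++i;                         // skip empty slots
    return i;                        // slots.size() plays the role of end()
}
```

For example, with slots holding {5, empty, empty, 7}, stepping from index 0 scans past two empty slots before landing on index 3.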
*Implementation details:* You can see Lua's implementation here: http://www.lua.org/source/5.3/ltable.c.html. The applicable logic is in and below *luaH_newkey*; the comments are useful. It's primarily a Coalesced hash table, but it differs in collision behavior. When a new key collides with an old key, we check to see if the old key is in its *"main position"*. The main position is the index that the key hashes to. If it's not in its main position, that means it previously collided and was displaced. Since this is the new key's main position, we displace the current key again and place the new key there. This drastically decreases chain length and improves the load factor that the hash map can optimally reach. Sorry about the long post; I can go into additional details upon request. Thanks, Treb
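A heavily simplified sketch of that collision rule (all names are hypothetical, and the chain links that make the real table a coalesced hash are omitted, leaving only the main-position/displacement decision):

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical sketch, not the actual implementation. On inserting `key`
// whose main position is occupied, we ask whether the resident key sits in
// its *own* main position; if not, the resident was displaced before and
// gets displaced again, so the new key claims its main position.
struct MiniTable {
    std::vector<std::optional<int>> slots;
    explicit MiniTable(std::size_t n) : slots(n) {}

    std::size_t main_position(int key) const {
        return std::hash<int>{}(key) % slots.size();
    }
    std::size_t find_free() const {      // linear scan for a free slot
        for (std::size_t i = 0; i < slots.size(); ++i)
            if (!slots[i]) return i;
        return slots.size();             // table full (not handled here)
    }
    void insert(int key) {
        std::size_t mp = main_position(key);
        if (!slots[mp]) { slots[mp] = key; return; }
        int resident = *slots[mp];
        if (main_position(resident) != mp) {
            // Resident was itself displaced earlier: evict it to a free
            // slot and take over the new key's main position.
            slots[find_free()] = resident;
            slots[mp] = key;
        } else {
            // Resident owns this slot: the new key goes to a free slot
            // (the real table would then link the two into a chain).
            slots[find_free()] = key;
        }
    }
};
```

Either branch of the collision case leaves both keys stored; the difference is which of the two ends up at the contested main position.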
On Fri, Sep 18, 2015 at 4:09 PM, Treb Connell
I'm hoping to port a flat hash map that I've written to boost. Since boost has a flat_map I think it'd be good to have a flat_unordered_map as well, especially since there's an opportunity to be much more efficient. The one I would like to use is based on the hash map used in Lua, which is a mix of Coalesced and Robin Hood hashes. I don't think there's a name that describes it yet unfortunately, but I describe it near the end of this post.
I would welcome such a data structure into Boost. I just wonder if flat_unordered_map is a good name.
*Performance (primary reason for use):* It uses memory and cache very effectively and ends up being *very fast*. It's hard to benchmark real world situations, but in inserting, finding, erasing, and iterating over a dictionary it was up to six orders of magnitude faster than VS2013's unordered_map. This isn't a totally fair test because the cache efficiency really shines.
*Standards compliance:* Differences from std::unordered_map are largely the same as the differences for flat_map listed here: http://www.boost.org/doc/libs/1_59_0/doc/html/container/non_standard_contain.... Also, bucket-related methods no longer apply, so they will be removed. *That said, there's one major standards compliance issue that I'm struggling with:*
The most optimal implementation of iterators for this container *does not have constant time traversal*. The implementation is a vector with elements scattered across it. To increment an iterator I continue to increment the internal vector iterator until I hit a new element. In practice elements should be scattered evenly, and will be close to one another, but in the worst case this is N-1. Additional memory or logic could be added to make this O(1), but I don't think that trade is worth it. In my applications of unordered_map I rarely iterate, and would not want to penalize insert, find, or erase to make iteration faster.
I could make a new concept for this, pay the cost for making it truly O(1), or *since the common case is not O(N) would it be okay to consider these bidirectional iterators*?
How about letting the user decide: make the iterator behavior a template boolean argument and select the appropriate implementation based on that (e.g., like many container options are done in Boost.Intrusive)? <snip> -- François
On Fri, Sep 18, 2015 at 2:24 PM, Francois Duranleau < xiao.bai.xiong@gmail.com> wrote:
On Fri, Sep 18, 2015 at 4:09 PM, Treb Connell
wrote: I could make a new concept for this, pay the cost for making it truly O(1), or *since the common case is not O(N) would it be okay to consider these bidirectional iterators*?
How about letting the user decide and making the iterator behavior choice a template boolean argument, and select the appropriate implementation based on that (e.g like many container options are done in Boost.Intrusive)?
Leaving the option up to the user is nice, but I agree that users are unlikely to want to pay a performance penalty just to avoid iteration being slow in the case that the capacity is much larger than the size. Users are likely to already be careful about how they use such a container, given that they have specifically chosen a flat_ container rather than the default. Making it a compile-time option might make the code significantly more complex and also hurt compile times.

Unrelated to the iterator issue: since there are various trade-offs among the different open addressing/flat hashing approaches, it might be better to give the container a more specific name, or ideally, if you have the energy, implement multiple approaches.

I'd say just document it as bidirectional iterators but explain how they work and the performance considerations. It is already well established that the standard iterator categories are not defined in a sufficiently flexible way. Big-O notation is also pretty meaningless when applied to actual software.
On 18/09/2015 22:09, Treb Connell wrote:
I'm hoping to port a flat hash map that I've written to boost. Since boost has a flat_map I think it'd be good to have a flat_unordered_map as well, especially since there's an opportunity to be much more efficient. The one I would like to use is based on the hash map used in Lua, which is a mix of Coalesced and Robin Hood hashes. I don't think there's a name that describes it yet unfortunately, but I describe it near the end of this post.
Definitely interested. These days I was trying to think about an unordered flat container design for Boost.Container: whether it should be open addressed (like the Google hash implementations) or not. I was trying to use a modified separate chaining implementation, especially due to the low load factors that open addressing implementations need.

The easiest implementation would be to put all preallocated elements of the array in a free node list. That would not help cache efficiency, but at least it would be easy to implement and maintain, and it would support load factors near 1.0f. Typically this would require two additional pointers plus the hash value stored per node (one for the bucket, one to form the singly linked list of elements), plus alignment, but with modifications some space could be saved. Surely cache efficiency could be improved with additional tricks. It would support the standard unordered interface, including local buckets.

Joaquín's novel structure could also be "flattened": http://bannalia.blogspot.com.es/2014/01/a-better-hash-table.html
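One plausible reading of that per-node cost, as a sketch (the struct and its field names are my guesses at the layout being described, not Boost.Container code):

```cpp
#include <cstddef>

// Guessed layout: each preallocated array slot carries the element, its
// cached hash, and the two links mentioned above -- one letting the slot
// double as a bucket head, one tying the element into its bucket's singly
// linked chain. Unused slots could reuse `next` for the free node list.
template <class T>
struct ChainedSlot {
    T            value;
    std::size_t  hash;         // stored hash value, avoids recomputation
    ChainedSlot* bucket_first; // first element of the bucket at this index
    ChainedSlot* next;         // next element in the chain / next free slot
};
```

Replacing the two pointers with 32-bit indices into the flat array is one way such a design could save some of the space mentioned.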
*Performance (primary reason for use):* It uses memory and cache very effectively and ends up being *very fast*. It's hard to benchmark real world situations, but in inserting, finding, erasing, and iterating over a dictionary it was up to six orders of magnitude faster than VS2013's unordered_map. This isn't a totally fair test because the cache efficiency really shines.
Good!
*Standards compliance:* Differences from std::unordered_map are largely the same as the differences for flat_map listed here: http://www.boost.org/doc/libs/1_59_0/doc/html/container/non_standard_contain.... Also, bucket-related methods no longer apply, so they will be removed. *That said, there's one major standards compliance issue that I'm struggling with:*
Why don't bucket-related methods apply? How do you handle equal values in a flat_unordered_multimap container?
The most optimal implementation of iterators for this container *does not have constant time traversal*. The implementation is a vector with elements scattered across it. To increment an iterator I continue to increment the internal vector iterator until I hit a new element. In practice elements should be scattered evenly, and will be close to one another, but in the worst case this is N-1. Additional memory or logic could be added to make this O(1), but I don't think that trade is worth it. In my applications of unordered_map I rarely iterate, and would not want to penalize insert, find, or erase to make iteration faster.
Not a big problem IMHO; the main use of a hash map is search and insertion, not iteration. It's really a pity that O(1) iteration and erasure returning an iterator are required by the standard. It severely limits implementation alternatives. I'm not sure if flat_unordered should respect that (after all, flat_xx does not guarantee iterator validity or insertion complexities).
*Implementation details:* You can see Lua's implementation here: http://www.lua.org/source/5.3/ltable.c.html. The applicable logic is in and below *luaH_newkey*; the comments are useful.
It's primarily a Coalesced hash table, but it differs in collision behavior. When a new key collides with an old key, we check to see if the old key is in its *"main position"*. The main position is the index that the key hashes to. If it's not in its main position, that means it previously collided and was displaced. Since this is the new key's main position, we displace the current key again and place the new key there.
Umm. Typical open-addressing-based implementations guarantee iterator and reference stability until rehashing.
This drastically decreases chain length and improves the load factor that the hash map can optimally reach.
It also loses strong exception safety as the new element might need to move an old one.
Sorry about long post, can go into additional details upon request.
Yes, please. Best, Ion
Definitely interested. These days I was trying to think about an unordered flat container design for Boost.Container: whether it should be open addressed (like the Google hash implementations) or not. I was trying to use a modified separate chaining implementation, especially due to the low load factors that open addressing implementations need.
Other implementations would likely be a good idea even if we do pursue this. I originally built this for game code that didn't use exceptions and didn't modify the container while iterating, so it was a very good fit. Of course, for a standard library like Boost, the iteration stability, exception guarantees, and implementation difficulty are pretty big problems with this container. I definitely think C++ users need a faster standard hash map available to them, but I imagine we'll have to think more about exactly which implementation is best. As Jeremy points out, it might be best to give this a very specific name and then include other types of flat unordered maps as well. If you want to consider this, I can do additional benchmarking to see if it can make up in speed what it lacks in other ways.

Why don't bucket-related methods apply? How do you handle equal values in a flat_unordered_multimap container?
Sorry, I guess bucket methods can be made to apply. I don't typically think of open addressing schemes as using buckets, but they use them enough to implement those methods.
Umm. Typical open-addressing-based implementations guarantee iterator and reference stability until rehashing.
Yes, that's a downside of this implementation. Iterators are invalidated on every insert and erase. The benefit of this is more speed.

It also loses strong exception safety as the new element might need to move an old one.
Another downside of this implementation. In the end there's no way to make it strong when you're moving existing values.

More details: This has an overhead of one pointer per slot. In my implementation I do not save the hash and instead recompute it, but saving it would add the hash per slot as well. The pointer is for a singly linked, circular list between collided keys. It is circular so that we can compute the previous node by traversing the list until we find ourselves again.

The container itself stores begin, end, and a free pointer. The free pointer is an optimization to help us find free slots: it points to the 'last free spot', and we increment up from there to try to find a new free spot. This packs elements close together. There are other ways we could do this, such as keeping a free list between them using the pointer that's already there, but then we'd have to add a bool to each slot to know whether it's free, because the container iterator has to jump over free nodes.

When we erase, we may also have to move a value: if we erase a node at the main position and it has a chain, then we must move a node from that chain into the main position. I try to insert values that collide near each other by checking the two adjacent slots before going to the free pointer.

I also was able to reduce the iterator size to a single pointer by having a dummy slot at the end of the container that acts as a guard. It looks like a non-free slot, so the iterator stops on it and keeps us from having to store or check an end pointer.

I think those are all the implementation details that matter. -Treb
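The circular-chain trick above can be sketched like this (types and names are illustrative only; the real slots hold key/value pairs, and the end-sentinel trick is not shown):

```cpp
#include <cstddef>

// Sketch: each slot carries one extra pointer forming a singly linked,
// *circular* chain among keys that collided into the same main position.
// There is no back-pointer; "previous" is recovered by walking the circle
// until the next hop would bring us back to the starting slot.
struct Slot {
    int   key;    // payload simplified to a bare int
    Slot* next;   // circular chain of collided slots
};

Slot* prev_in_chain(Slot* s) {
    Slot* p = s;
    while (p->next != s)
        p = p->next;   // walk the circle until we come back around
    return p;
}
```

For a chain of length k this walk costs O(k), which is cheap precisely because the displacement scheme keeps chains short.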
On Fri, Sep 18, 2015 at 11:09 PM, Treb Connell
I'm hoping to port a flat hash map that I've written to boost.
*Performance (primary reason for use):* It uses memory and cache very effectively and ends up being *very fast*. It's hard to benchmark real world situations, but in inserting, finding, erasing, and iterating over a dictionary it was up to six orders of magnitude faster than VS2013's unordered_map. This isn't a totally fair test because the cache efficiency really shines.
I believe much of the benefit depends on the nature of the keys and values (e.g. ints vs. strings)? Is there a more detailed description of the tests and results, at different container sizes?
The most optimal implementation of iterators for this container *does not have constant time traversal*. The implementation is a vector with elements scattered across it. To increment an iterator I continue to increment the internal vector iterator until I hit a new element. In practice elements should be scattered evenly, and will be close to one another, but in the worst case this is N-1. Additional memory or logic could be added to make this O(1), but I don't think that trade is worth it. In my applications of unordered_map I rarely iterate, and would not want to penalize insert, find, or erase to make iteration faster.
Can the iterator tradeoff be made a compile-time option? One frequently used operation that generally amounts to iterating through elements is clear(). Do I understand correctly that its complexity can be O(N), where N > size()? I believe the same applies to range erase(). Also, is size() constant time?
I could make a new concept for this, pay the cost for making it truly O(1), or *since the common case is not O(N) would it be okay to consider these bidirectional iterators*?
I think the iterators should have the bidirectional traversal category regardless of the choice on complexity. It's the closest standard category that fits anyway.
Funny, I just came across this. Nearly two years ago I mentioned to Ion that I had flat unordered set/map/multiset/multimap containers and was under the impression he wasn't interested. I have a working (albeit a bit old; looking at it now there are a few spots I could clean up, but it works) implementation here: https://sourceforge.net/projects/unorderedflatse/files/?source=navbar, which I've used for some time now. I've found them to be wonderful, and they are among my most used containers. I used open addressing with linear probing, so if you need an implementation to compare Treb Connell's against, perhaps this will come in handy.
Date: Sat, 19 Sep 2015 03:31:12 +0300 From: andrey.semashev@gmail.com To: boost@lists.boost.org Subject: Re: [boost] [Container] interest in a flat_unordered_map
I believe much of the benefit depends on the nature of the keys and values (e.g. ints vs. strings)? Is there a more detailed description of the tests and results, at different container sizes?
I'm planning to port this enough to be able to run the Boost.Container perf tests. After I do that I'll post numbers from there. Wanted to get an initial sense of interest before starting work.
Can the iterator tradeoff be made a compile-time option?
One frequently used operation that generally amounts to iterating through elements is clear(). Do I understand correctly that its complexity can be O(N) where N>size()? I believe the same applies to range erase().
Also, is size() constant time?
size() is constant time, which I pay for in memory and a bit of logic to keep track of the count. clear() is O(N), where N is bucket_count, though no work is done on buckets with no elements.
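The complexities described above can be sketched with a toy stand-in (not the proposed container; names and representation are illustrative):

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Toy stand-in: size() is O(1) because a counter is maintained on insert
// and erase, while clear() visits every slot -- O(bucket_count) -- but only
// does destruction work for the occupied ones.
struct MiniFlatSet {
    std::vector<std::optional<int>> slots;  // the bucket array
    std::size_t count = 0;                  // maintained on insert/erase

    std::size_t size() const { return count; }  // constant time

    void clear() {
        for (auto& s : slots)   // touches all bucket_count slots...
            if (s)              // ...but only occupied ones do real work
                s.reset();
        count = 0;
    }
};
```

This is why clear() (and a full-range erase()) can cost more than size() steps whenever the bucket array is larger than the element count.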
participants (6)
- Andrey Semashev
- Francois Duranleau
- Ion Gaztañaga
- Jeremy Maitin-Shepard
- Ryan Boghean
- Treb Connell