On 18/09/2015 22:09, Treb Connell wrote:
I'm hoping to port a flat hash map that I've written to Boost. Since Boost already has a flat_map, I think it would be good to have a flat_unordered_map as well, especially since there's an opportunity to be much more efficient. The one I would like to use is based on the hash table used in Lua, which is a mix of coalesced hashing and Robin Hood hashing. I don't think there's a name that describes it yet, unfortunately, but I describe it near the end of this post.
Definitely interested. These days I was trying to think about an unordered flat container design for Boost.Container: whether it should be open-addressed (like the Google hash implementations) or not. I was considering a modified separate-chaining implementation, especially because of the low load factors that open-addressing implementations need.

The easiest implementation would be to put all preallocated elements of the array into a free node list. That would not help cache efficiency, but at least it would be easy to implement and maintain, and it would support load factors near 1.0. Typically this would require two additional pointers plus the stored hash value per node (one for the bucket, one to form the singly linked list of elements), plus alignment, but with modifications some space could be saved. Surely cache efficiency could be improved with additional tricks. It would support the standard unordered interface, including the local bucket operations.

Joaquín's novel structure could also be "flattened": http://bannalia.blogspot.com.es/2014/01/a-better-hash-table.html
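A rough sketch of the kind of layout I have in mind (illustrative names only, using indices instead of raw pointers; not actual Boost.Container code):

#include <cstddef>
#include <utility>
#include <vector>

// Illustrative layout: roughly one bucket-head index per element, plus a
// next link and the cached hash per node. Unused slots are threaded
// through the same next field to form the free list.
template <class Key, class T>
struct flat_chained_node {
    std::size_t        hash;  // cached hash of the key
    std::size_t        next;  // next node in the bucket chain, or next free slot
    std::pair<Key, T>  kv;    // the stored element
};

template <class Key, class T>
struct flat_chained_storage {
    std::vector<std::size_t>                buckets;   // head index per bucket
    std::vector<flat_chained_node<Key, T> > nodes;     // flat, preallocated element array
    std::size_t                             free_head; // head of the free-slot list
};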
*Performance (primary reason for use):* It uses memory and cache very effectively and ends up being *very fast*. It's hard to benchmark real-world situations, but in inserting, finding, erasing, and iterating over a dictionary it was up to six orders of magnitude faster than VS2013's unordered_map. This isn't a totally fair test, because it's exactly the kind of workload where the cache efficiency really shines.
Good!
*Standards compliance:* The differences from std::unordered_map are largely the same as the flat_map differences listed here: http://www.boost.org/doc/libs/1_59_0/doc/html/container/non_standard_contain.... Also, bucket-related methods no longer apply, so they will be removed. *That said, there's one major standards-compliance issue that I'm struggling with:*
Why don't bucket-related methods apply? And how do you handle equal values in a flat_unordered_multimap container?
The most optimal implementation of iterators for this container *does not have constant-time traversal*. The implementation is a vector with elements scattered across it. To increment an iterator, I keep incrementing the internal vector iterator until I hit the next occupied element. In practice elements should be scattered evenly and will be close to one another, but in the worst case this is N-1 steps. Additional memory or logic could be added to make this O(1), but I don't think that trade-off is worth it. In my applications of unordered_map I rarely iterate, and I would not want to penalize insert, find, or erase to make iteration faster.
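A rough sketch of what that increment looks like (illustrative types, assuming a flat array of slots with an occupied flag; not the actual code):

#include <cstddef>
#include <vector>

// Illustrative iterator over a flat vector of slots with elements
// scattered across it.
template <class Slot>
struct scan_iterator {
    const std::vector<Slot>* slots;
    std::size_t              pos;

    scan_iterator& operator++() {
        // Skip empty slots until the next occupied one (or the end).
        // Usually only a few steps, but O(N) in the worst case.
        do {
            ++pos;
        } while (pos < slots->size() && !(*slots)[pos].occupied);
        return *this;
    }

    const Slot& operator*() const { return (*slots)[pos]; }
};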
Not a big problem IMHO; the main use of a hash map is search and insertion, not iteration. It's really a pity that O(1) iteration and erasure returning an iterator are required by the standard. It severely limits the implementation alternatives. I'm not sure flat_unordered should respect that (after all, flat_xx does not guarantee iterator validity or the standard insertion complexities).
*Implementation details:* You can see Lua's implementation here: http://www.lua.org/source/5.3/ltable.c.html. The applicable logic is in and below *luaH_newkey*; the comments there are useful.
It's primarily a coalesced hash table, but differs in collision behavior. When a new key collides with an old key, we check whether the old key is in its *main position*. The main position is the index that the key hashes to. If the old key is not in its main position, that means it previously collided and was displaced. Since this slot is the new key's main position, we displace the old key again and place the new key there.
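Here's a rough sketch of that rule in the spirit of luaH_newkey (simplified to int keys and values, a linear scan instead of a real free list, and no duplicate-key or growth handling; illustrative only, not the actual code):

#include <cstddef>
#include <functional>
#include <vector>

class lua_style_table {
    static const std::size_t npos = static_cast<std::size_t>(-1);

    struct Slot {
        int         key;
        int         value;
        std::size_t next;      // next slot in this key's chain, npos = end
        bool        occupied;
    };

    std::vector<Slot> slots_;

    std::size_t main_position(int key) const {
        return std::hash<int>()(key) % slots_.size();
    }

    // Find any unoccupied slot (a real implementation keeps a free list or
    // a "last free" pointer instead of scanning).
    std::size_t find_free_slot() const {
        for (std::size_t i = 0; i < slots_.size(); ++i)
            if (!slots_[i].occupied) return i;
        return npos;
    }

public:
    explicit lua_style_table(std::size_t capacity)
        : slots_(capacity, Slot{0, 0, npos, false}) {}

    void insert(int key, int value) {
        std::size_t mp = main_position(key);
        Slot& s = slots_[mp];
        if (!s.occupied) {                      // main position is free: take it
            s = Slot{key, value, npos, true};
            return;
        }
        std::size_t f = find_free_slot();
        if (f == npos) return;                  // full: a real table would grow/rehash
        if (main_position(s.key) != mp) {
            // The occupant was displaced here earlier: this is *our* main
            // position. Move the occupant to the free slot, repoint the
            // chain link that led to it, and take its place.
            std::size_t prev = main_position(s.key);
            while (slots_[prev].next != mp) prev = slots_[prev].next;
            slots_[f] = s;
            slots_[prev].next = f;
            s = Slot{key, value, npos, true};
        } else {
            // The occupant owns this slot: chain the new key from the free
            // slot, coalesced-hashing style.
            slots_[f] = Slot{key, value, s.next, true};
            s.next = f;
        }
    }
};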
Umm. Typical open-addressing-based implementations guarantee iterator and reference stability until rehashing.
This drastically decreases chain length and improves the load factor that the hash map can reach while remaining efficient.
It also loses strong exception safety, as inserting a new element might need to move an old one.
Sorry about the long post; I can go into additional details upon request.
Yes, please.

Best,
Ion