unordered_map implementation
I've been helping Christian Mazakas maintain Unordered, and we've been looking into trying to improve its performance on the synthetic benchmarks we have for it (https://github.com/boostorg/unordered/tree/develop/benchmark).

One of the things that are hurting us is that we need to do a double indirection in order to find an element, because Unordered uses a singly-linked list for all its nodes, with the bucket array pointing at the element before the first one (in order to be able to support erasing). There's also the option of using a doubly-linked list, which however increases the memory footprint.

I'm looking at the iterator invalidation rules for std::unordered_map, and iterators are invalidated on every rehash. This makes me wonder why it is necessary to have all the elements in a single list, as every implementation seems to do.

It seems to me that it's possible to just have singly-linked lists for each bucket, and then have a "fat" iterator that contains the bucket index and the pointer to the element. On increment, when the end of the bucket list is reached, the iterator can just go to the next non-empty bucket.

Is there something I'm missing in the requirements that renders this implementation non-conforming?
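Roughly, what I have in mind is something like the following. This is a minimal sketch with made-up names, not the actual Unordered data structures:

```cpp
// Sketch of the per-bucket-list + "fat iterator" idea.
// All names here are illustrative; this is not Boost.Unordered code.
#include <cstddef>
#include <vector>

struct node
{
    node* next;
    int value; // stands in for the real value_type
};

struct table
{
    std::vector<node*> buckets; // buckets[i] is the head of bucket i's list, or nullptr
};

struct fat_iterator
{
    table* t;
    std::size_t bucket; // index of the current bucket
    node* p;            // current node within that bucket

    fat_iterator& operator++()
    {
        p = p->next;

        // Reached the end of this bucket's list: skip ahead to the
        // next non-empty bucket (or become the end iterator, with
        // bucket == buckets.size() and p == nullptr).
        while( p == nullptr && ++bucket < t->buckets.size() )
        {
            p = t->buckets[ bucket ];
        }

        return *this;
    }

    int& operator*() const { return p->value; }
};
```

Compared to the current arrangement, this would avoid the extra bucket-array indirection when looking up an element, at the cost of a bigger iterator.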
On 19 Feb 2022, at 19:23, Peter Dimov via Boost wrote:
It seems to me that it's possible to just have singly-linked lists for each bucket, and then have a "fat" iterator that contains the bucket index and the pointer to the element. On increment, when the end of the bucket list is reached, the iterator can just go to the next non-empty bucket.
Is there something I'm missing in the requirements that renders this implementation non-conforming?
You may find this relevant:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2023.pdf

Joaquín M López Muñoz
Joaquín M López Muñoz wrote:
It seems to me that it's possible to just have singly-linked lists for each bucket, and then have a "fat" iterator that contains the bucket index and the pointer to the element. On increment, when the end of the bucket list is reached, the iterator can just go to the next non-empty bucket.
Is there something I'm missing in the requirements that renders this implementation non-conforming?
You may find this relevant:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2023.pdf
Oh, I've rediscovered the still water. So this has been discussed extensively in LWG:

http://lwg.github.io/issues/lwg-closed.html#579

and the proposal to make erase return void has been closed as NAD, because
The issue was lengthy discussed and implementation experience was demonstrated that a non-void return type is implementable for both single-linked and double-linked lists without loss of efficiency.
except it's not quite like that, because traversing the bucket array is considered equally complex to traversing a linked list, whereas in practice there are things called caches. So we're taking quite a hit from a single extra indirection in `find` in order to make `erase` perform well.

Unordered actually used to work as I suggest above, and was changed in Boost 1.48 to the current arrangement:

https://www.boost.org/doc/libs/1_78_0/doc/html/unordered/changes.html#unorde...

in response to bug reports

https://svn.boost.org/trac10/ticket/3966
https://svn.boost.org/trac10/ticket/3693
https://lists.boost.org/Archives/boost/2009/11/159116.php

that erase is O(number of buckets) in the worst case, which causes pathological behavior when erasing in reverse order.

Not a satisfactory state of affairs.
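For illustration, here is roughly what erase would have to do under the per-bucket-list scheme sketched earlier, reusing the hypothetical node/table/fat_iterator types from that sketch. Again, the names are made up; this is not the actual pre-1.48 Unordered code:

```cpp
// Rough sketch of erase() with one singly-linked list per bucket.
fat_iterator erase( table& t, fat_iterator pos )
{
    // Unlink pos.p from its bucket's singly-linked list by walking to its
    // predecessor (a real implementation would avoid this scan, e.g. by
    // caching the predecessor).
    node** pp = &t.buckets[ pos.bucket ];
    while( *pp != pos.p ) pp = &(*pp)->next;
    *pp = pos.p->next;

    fat_iterator next = { &t, pos.bucket, pos.p->next };
    delete pos.p; // assume nodes are individually allocated

    // The problematic part: erase must return an iterator to the next
    // element, so if we just erased the last element of its bucket we have
    // to scan forward over empty buckets -- O(bucket_count) in the worst
    // case. Erasing the elements in reverse order hits this scan every time.
    while( next.p == nullptr && ++next.bucket < t.buckets.size() )
    {
        next.p = t.buckets[ next.bucket ];
    }

    return next;
}
```

That next-element scan is what the current single-list arrangement avoids: with all nodes on one list, the next iterator is simply the next node, at the cost of the extra indirection in `find` mentioned above.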