[unordered] Merging unordered maps
I have a performance hotspot when merging a few unordered_maps of the same type into a master unordered_map (also of the same type) Currently I am doing Foreach( auto const& submap ,submaps ) Master.insert( submap.begin() , submap.end() ); This seems to be somewhat wasteful because the insert is (I think) computing hashes that the submaps already "know". Can I improve on this - without changing Boost? - without changing the boost interface, but changing the implementation (eg specialising the insert for certain iterator types)? - some more intrusive change? If the second or third option, would the maintainer be willing to consider merging the change into Boost proper? Thanks, Pete
On 29 May 2014 17:24, Pete bartlett
I have a performance hotspot when merging a few unordered_maps of the same type into a master unordered_map (also of the same type)
Currently I am doing
Foreach( auto const& submap ,submaps ) Master.insert( submap.begin() , submap.end() );
This seems to be somewhat wasteful because the insert is (I think) computing hashes that the submaps already "know".
Can I improve on this - without changing Boost?
Not as far as I know.
- without changing the boost interface, but changing the implementation (eg specialising the insert for certain iterator types)?
I don't think so. In your example the problem is that the iterator only points to the nodes, which happen to contain the hash values, but there's no way of telling if they were generated by the same hash function.
- some more intrusive change?
Maybe. Could write a function that inserts from another container, rather than using iterators. It is now possible to tell if the same hash function type was used, but it isn't possible to tell that they're equal (eg. when using a randomized hash function, the hash values might not be the same for two different instances). It might be possible to create a type trait to detect if the hash function has an equality operator - and assume they're not equal if it doesn't.
If the second or third option, would the maintainer be willing to consider merging the change into Boost proper?
I would have to think about it.
Daniel James wrote:
Pete Bartlett wrote:
I have a performance hotspot when merging a few unordered_maps of the same type into a master unordered_map (also of the same type)
Currently I am doing
foreach( auto const& submap ,submaps ) master.insert( submap.begin() , submap.end() );
This seems to be somewhat wasteful because the insert is (I think) computing hashes that the >>submaps already "know".
Can I improve on this - without changing Boost?
Not as far as I know.
Thank you very much for your very helpful comments.
I realise now that in my case, where the hash calculation is expensive
relative to node insertion cost, it is worthwhile doing a sort of
memoization technique: If my original key type is K and hash is h, I can
change the key type to pair
- some more intrusive change?
Maybe. Could write a function that inserts from another container, rather than using >iterators. It is now possible to tell if the same hash function type was used, but it isn't >possible to tell that they're equal (eg. when using a randomized hash function, the hash >values might not be the same for two different instances). It might be possible to create a >type trait to detect if the hash function has an equality operator - and assume they're not >equal if it doesn't.
Having said the above, I fancy a crack at this just for my own education. I'll let you know if I get anywhere. Thanks again, Pete
On 29 May 2014 22:24, Pete Bartlett
I realise now that in my case, where the hash calculation is expensive relative to node insertion cost, it is worthwhile doing a sort of memoization technique: If my original key type is K and hash is h, I can change the key type to pair
where the keys are now (k,h(k)) and then give the unordered_map an adapted hash that simply returns pair.second. I now only pay the hash calculation cost when first inserting into the submaps. As you point out below, this is only safe for non-random hashes but works for me.
Certainly, I was talking about the fully general case. You can add extra requirements in your own code.
- some more intrusive change?
Maybe. Could write a function that inserts from another container, rather than using >iterators. It is now possible to tell if the same hash function type was used, but it isn't >possible to tell that they're equal (eg. when using a randomized hash function, the hash >values might not be the same for two different instances). It might be possible to create a >type trait to detect if the hash function has an equality operator - and assume they're not >equal if it doesn't.
Having said the above, I fancy a crack at this just for my own education. I'll let you know if I get anywhere.
To be honest, the code's more complicated than it really should be. Although this case might not be too bad. The implementation of insert_range in unique.hpp code is pretty weird - it deals with some edge cases that you don't need to think about, so you could base insertion on something like this (untested): template <class InputIt> void simpler_insert_range(InputIt i, InputIt j) { node_constructor a(this->node_alloc()); for (; i != j; ++i) { insert_range_impl2(a, this->get_key(*i), i, j); } } Hopefully the implementation of insert_range in equivalent.hpp is easier to deal with.
participants (3)
-
Daniel James
-
Pete bartlett
-
Pete Bartlett