Daniel James wrote:
On 16 February 2014 17:25, Joaquin M Lopez Munoz wrote:
I think this is not the story the benchmarks tell. 15 different scenarios were tested:
[...]
and results do not show any definite winner (< means "better than"):
[...]
I think yours are the first benchmarks I've seen to deal with elements with equivalent keys. But what I'm not sure about is how closely the benchmarks reflect actual use.
Neither am I :-/
A minor dilemma that I've got at the moment is that on 64-bit machines the newer versions use a different bucket placement strategy to avoid the costly prime modulus. It's typically faster, but in certain cases (such as inserting consecutive integers) the old version is much faster, as there are no collisions at all. The problem is that I have no idea how often this happens in real-world code.
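For illustration, a minimal sketch of the two placement strategies being contrasted. The mask-based variant assumes a power-of-two bucket count preceded by a multiplicative mixing step; the particular mix shown (a golden-ratio multiplier) is a stand-in, not necessarily what Boost.Unordered actually does:

    #include <cstddef>
    #include <cstdint>

    // Prime-modulus placement: consecutive integer keys (under an
    // identity hash) land in consecutive buckets, so they never
    // collide -- but the modulus costs an integer division.
    std::size_t prime_bucket(std::size_t h, std::size_t prime_count) {
        return h % prime_count; // e.g. prime_count = 97
    }

    // Mask-based placement: with a power-of-two bucket count the
    // modulus becomes a cheap bitwise AND, but a mixing step is
    // needed first or keys differing only in high bits all collide.
    // Mixing also scatters consecutive integers, so occasional
    // collisions appear where the prime scheme had none.
    std::size_t mask_bucket(std::size_t h, std::size_t pow2_count) {
        h *= 0x9E3779B97F4A7C15ull;  // Fibonacci-style mix (2^64/phi)
        return h & (pow2_count - 1); // pow2_count must be a power of 2
    }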
My gut feeling is that inserting more or less consecutive ranges of integers is a fairly common scenario.

To speed up modulo calculations, I use this trick: instead of having

    size_t hash_to_bucket(size_t h)
    {
      return h % bucket_size;
    }

I write the following:

    size_t hash_to_bucket(size_t h)
    {
      switch(bucket_size_index){
        case 0:  return h % 53;
        case 1:  return h % 97;
        case 2:  return h % 193;
        case 3:  return h % 389;
        ...
        case 59: return h % 18446744073709551557ul;
      }
    }

The compiler is able to heavily optimize each h % constant operation by emulating the integer division with precalculated multiplications (there's a section on this in Hacker's Delight, I believe). I took a look at the generated assembly code and the integer divisions were in fact gone, and the thing is considerably faster despite the huge number of cases (a switch over [0, ..., n) can be coded very efficiently, and I think CPU branch prediction plays a role here also). See http://tinyurl.com/okou28x for the actual implementation. You might want to try something similar in Boost.Unordered.

Joaquín Mª López Muñoz
Telefónica Digital
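As an aside on the precalculated-multiplications remark above, a minimal sketch, for 32-bit operands, of how h % d can be recovered from a precomputed reciprocal and two multiplications. This is the "fastmod" formulation; a compiler typically emits a related Granlund-Montgomery multiply-and-shift sequence rather than exactly this, and the helper names here are hypothetical:

    #include <cassert>
    #include <cstdint>

    // Precompute c = ceil(2^64 / d) once per divisor d (1 < d < 2^32).
    uint64_t reciprocal(uint32_t d) {
        return UINT64_MAX / d + 1;
    }

    // h % d without a division instruction: the fractional part of
    // h/d lives in the low 64 bits of c*h; multiplying that back by
    // d and keeping the high 64 bits recovers the remainder.
    // (Needs a 64x64 -> 128-bit multiply, e.g. GCC/Clang __uint128_t.)
    uint32_t fast_mod(uint32_t h, uint32_t d, uint64_t c) {
        uint64_t lowbits = c * h; // wraps mod 2^64 on purpose
        return (uint32_t)(((__uint128_t)lowbits * d) >> 64);
    }

    int main() {
        uint64_t c97 = reciprocal(97);
        for (uint32_t h = 0; h < 1000000; ++h)
            assert(fast_mod(h, 97, c97) == h % 97);
    }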