On 2 Dec 2014 at 23:26, Stephan T. Lavavej wrote:
I don't see why eg. std::hash
(or any smaller type) doesn't just return the bitwise equivalent of the original value by default, as this has zero collision probability and the highest performance. Hashes are typically masked to get bucket indices, so hashing integers with the identity function allows collisions to be trivially generated, either intentionally or unintentionally. For example, 0x1000, 0x2000, 0x3000, 0x4000 are highly likely to collide, if the lower bits are being used to index buckets.
There are other big problems with trivial hashes too in that certain unfortunately common choices of bucket size happen to generate resonances with unfortunately common trivial hashes. I was tempted in concurrent_unordered_map to detect trivial hashes and deliberately subvert known terrible combinations of bucket size and trivial hashes by twiddling the bucket count away from what was requested, but then I remembered I am not a statistician. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/