[hash] Extract module from functional + std::hash_combine

older
[date_time] Eliminate pre-1.33.0...

Daniel James

19 Dec 2017 19 Dec '17

1:31 p.m.

Hi, I'm planning to extract hash from functional into its own module, as that will apparently break the circular dependencies that functional is part of. Any objections? Also, any opinions on what the module should be called? 'hash' might suggest a more general purpose hash function, so maybe it should be called something more specific. Daniel

Show replies by date

Richard Hodges

19 Dec 19 Dec

2:12 p.m.

IMHO boost::hash is perfect, in every respect. Follow that (including all the missing functionality in std::hash - particularly ADL-friendly hash_value()) and you won't go wrong. #include <hash> is fine by me. If we ever want any other kind of hash, surely is could go into #include <md5> or #include <sha256> etc. until std::hash >= boost::hash, it's not useful. In addition, I would propose (I am sure I'm not the first!): namespace std { template<class T = void> struct hash { ... }; template<> struct hash<void> { template<class T> auto operator()(T&& arg) const { return hash_value(arg); // ADL to the rescue } }; } and then: template<class K, class V, class H = std::hash<>, class E = std::equal<>> struct unordered_map; // std::equal<V1 = void, V2 = void> should of course compare any 2 comparable things Which finally makes unordered_map declarations sane. Whoever decided to downgrade boost hash before solidifying its uselessness in the c++11 standard? Anticipating: but but, backward compatibility... Answer: nostalgia is the death of innovation. Look forward my son... On 19 December 2017 at 14:31, Daniel James via Boost <boost@lists.boost.org> wrote:

...

Hi,

I'm planning to extract hash from functional into its own module, as that will apparently break the circular dependencies that functional is part of. Any objections? Also, any opinions on what the module should be called? 'hash' might suggest a more general purpose hash function, so maybe it should be called something more specific.

Daniel

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/ mailman/listinfo.cgi/boost

Daniel James

2:51 p.m.

On 19 December 2017 at 14:12, Richard Hodges via Boost <boost@lists.boost.org> wrote:

...

Whoever decided to downgrade boost hash before solidifying its uselessness in the c++11 standard?

The C++11 std::hash was specified before boost::hash was created, so it wasn't downgraded. There have been several proposals for improving std::hash, some of which do have advantages over `boost::hash`, but the standards committee hasn't settled on any of them. But it's possible that the next standard will include 'hash_combine', as well as functions using variadic arguments: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3876.pdf This is a bit of a compromise that makes it easier to implement custom hash functions but doesn't prevent any of the other proposal from going forward. I was thinking about implementing the variadic functions in that paper for 1.67.0, but I think I'll wait to see if it's added to the standard.

Richard Hodges

3:23 p.m.

...

The C++11 std::hash was specified before boost::hash was created, so...

source : https://github.com/boostorg/functional/blob/develop/include/boost/functional... // Copyright 2005-2014 Daniel James. // Distributed under the Boost Software License, Version 1.0. (See accompanying // file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) You're saying that the c++11 standard codified std::hash in 2005? This seems to me to be an extraordinary claim. More likely is that std::hash is the retarded cousin of boost::hash, and the c++11 committee's gravest error. On 19 December 2017 at 15:51, Daniel James via Boost <boost@lists.boost.org> wrote:

...

On 19 December 2017 at 14:12, Richard Hodges via Boost <boost@lists.boost.org> wrote:

...
Whoever decided to downgrade boost hash before solidifying its

uselessness

...
in the c++11 standard?

The C++11 std::hash was specified before boost::hash was created, so it wasn't downgraded. There have been several proposals for improving std::hash, some of which do have advantages over `boost::hash`, but the standards committee hasn't settled on any of them. But it's possible that the next standard will include 'hash_combine', as well as functions using variadic arguments:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3876.pdf

This is a bit of a compromise that makes it easier to implement custom hash functions but doesn't prevent any of the other proposal from going forward. I was thinking about implementing the variadic functions in that paper for 1.67.0, but I think I'll wait to see if it's added to the standard.

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/ mailman/listinfo.cgi/boost

Steven Watanabe

3:36 p.m.

AMDG On 12/19/2017 08:23 AM, Richard Hodges via Boost wrote:

...

...
The C++11 std::hash was specified before boost::hash was created, so...

source : https://github.com/boostorg/functional/blob/develop/include/boost/functional...

// Copyright 2005-2014 Daniel James. // Distributed under the Boost Software License, Version 1.0. (See accompanying // file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

You're saying that the c++11 standard codified std::hash in 2005? This seems to me to be an extraordinary claim.

hash was part of TR1

...

More likely is that std::hash is the retarded cousin of boost::hash, and the c++11 committee's gravest error.

In Christ, Steven Watanabe

Richard Hodges

3:59 p.m.

...

hash was part of TR1

TR1 was circulated in 2005. std::hash had 6 years to up its game before 2011. Boost managed it. I have read google's proposal on improving std::hash. It's not as if the committee is unaware of the general uselessness of std::hash. Unless std::hash is raised to the level of competency enjoyed by boost hash, there is no point do any further work on it. It will remain un-useable. It is simply laughably inconvenient to have to inject hash specialisations into the std namespace. Please don't waste your valuable time. Just adopt boost::hash into std. On 19 December 2017 at 16:36, Steven Watanabe via Boost < boost@lists.boost.org> wrote:

...

AMDG

On 12/19/2017 08:23 AM, Richard Hodges via Boost wrote:

...
...
The C++11 std::hash was specified before boost::hash was created, so...

source : https://github.com/boostorg/functional/blob/develop/ include/boost/functional/hash/hash.hpp

// Copyright 2005-2014 Daniel James. // Distributed under the Boost Software License, Version 1.0. (See accompanying // file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

You're saying that the c++11 standard codified std::hash in 2005? This seems to me to be an extraordinary claim.

hash was part of TR1

...
More likely is that std::hash is the retarded cousin of boost::hash, and the c++11 committee's gravest error.

In Christ, Steven Watanabe

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/ mailman/listinfo.cgi/boost

Peter Dimov

4:14 p.m.

Daniel James wrote:

...

But it's possible that the next standard will include 'hash_combine', as well as functions using variadic arguments:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3876.pdf

Nico Josuttis has been pushing for std::hash_combine - his latest is http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0814r0.pdf and I vaguely remember it being accepted by LEWG at the last meeting, but at the moment I'm not convinced that this is the right way forward. The alternatives proposed by Howard Hinnant and by Google are better. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0029r0.html http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html The problem with our hash_combine is twofold. First, it doesn't allow intermediate state bigger than size_t. Second, it necessitates that the value of the object is collapsed into a single size_t before being mixed into the intermediate state (which is also the final state in our implementation, but it doesn't need to be.) So if we have struct X { Y y; Z z; }; struct Y { U u; V v; }; and we evaluate hash<X>()(x), we get size_t state = 0; size_t v1 = hash_value(x.y); size_t st2 = 0; size_t w1 = hash_value(x.y.u); hash_combine(st2,w1); size_t w2 = hash_value(x.y.v); hash_combine(st2,w2); return st2; hash_combine(state, v1); size_t v2 = hash_value(x.z); hash_combine(state, v2); return state; // or f(state) whereas we'd rather like the following: State state{}; // hash algorithm dependent hash_combine(state, x.y); hash_combine(state, x.y.u); hash_combine(state, x.y.v); hash_combine(state, x.z); return f(state); The interesting thing is that our hash_combine signature is actually compatible with the second approach; we just need to make the state (currently fixed to size_t) a template parameter, and instruct people to write their custom overloads as: template<class State> void hash_combine(State& state, X const& x) { hash_combine(state, x.y); hash_combine(state, x.z); } instead of providing hash_value. Hash algorithms, in turn, will (may(*)) provide overloads of the form void hash_combine(md5& state, int const& x) { // mix x into state } and the hash<> object will create an instance of the algorithm, call hash_combine, and return the final state. This is basically what N3980 does, except with hash_append. (*) May, because we can provide a generic overload for 'int' that hashes it at the byte level, as is the standard nowadays.

Daniel James

4:47 p.m.

On 19 December 2017 at 16:14, Peter Dimov via Boost <boost@lists.boost.org> wrote:

...

Daniel James wrote:

...
But it's possible that the next standard will include 'hash_combine', as well as functions using variadic arguments:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3876.pdf

Nico Josuttis has been pushing for std::hash_combine - his latest is

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0814r0.pdf

Ah, I got the wrong one. The problem with this paper is that hash_combine is defined differently to ours, so if it's added to the standard, it's going to make it awkward for me to follow it.

...

and I vaguely remember it being accepted by LEWG at the last meeting, but at the moment I'm not convinced that this is the right way forward.

I heard that on Slack after the last meeting. That's why I was picking up on this paper, and not the others.

...

The alternatives proposed by Howard Hinnant and by Google are better.

IIUC the Josuttis paper is intended as a stop-gap that won't get in the way of a better proposal. But it's a bit half-baked as there are a lot of use cases it doesn't have good support for, such as hashing ranges or conditionally hashing a member.

Peter Dimov

5:01 p.m.

Daniel James wrote:

...

...
The alternatives proposed by Howard Hinnant and by Google are better.

IIUC the Josuttis paper is intended as a stop-gap that won't get in the way of a better proposal.

But it does get in the way, because the Google proposal has hash_combine with different semantics.

Howard Hinnant

5:10 p.m.

On Dec 19, 2017, at 12:01 PM, Peter Dimov via Boost <boost@lists.boost.org> wrote:

...

Daniel James wrote:

...
...
The alternatives proposed by Howard Hinnant and by Google are better.

IIUC the Josuttis paper is intended as a stop-gap that won't get in the way of a better proposal.

But it does get in the way, because the Google proposal has hash_combine with different semantics.

The name “hash_combine” was discussed in the LEWG. When I asked, I was told the functionality was the same or essentially the same as boost::hash_combine. Therefore I recommended that the standard keep the name hash_combine for this proposal as it is already well-known, and would serve as a clearer warning about what this functionality will (and will not) do. If new functionality is to be provided which allows any algorithm to be incrementally updated with data, and then finalized to compute a hash, in such a way that the integrity of the hash algorithm is not compromised simply because it isn’t fed all the data at one time, I believe we need a new name (other than hash_combine). Otherwise there will be confusion. I chose hash_append for that new name. There may be other better names. But hash_combine is already taken and should not be reused for this purpose. Howard

Peter Dimov

5:18 p.m.

Howard Hinnant wrote:

...

When I asked, I was told the functionality was the same or essentially the same as boost::hash_combine.

It's not. Well the functionality is the same, but the interface of boost::hash_combine is incompatible with the proposed std::hash_combine. boost::hash_combine is what's _hash_combine in P0814R0. std::hash_combine(t1, ..., tN) in P0814R0 is a pure function that returns the combined hash value of its arguments. boost::hash_combine(v, t) mixes the hash value of t into the state v. boost::hash_combine is incremental and is naturally extensible into using a different state than size_t for the first argument. The proposed std::hash_combine is neither.

Peter Dimov

3:48 p.m.

Daniel James wrote:

...

Hi,

I'm planning to extract hash from functional into its own module, as that will apparently break the circular dependencies that functional is part of. Any objections? Also, any opinions on what the module should be called? 'hash' might suggest a more general purpose hash function, so maybe it should be called something more specific.

If we name the module 'hash' it's not hard to see that in the future it will be seen as the obvious home for implementations of MD5, SHA-*, and other hash algorithms (FNV-1a, cityhash...) which we do miss in Boost. So there will be pressure to add them, pull requests implementing them, and so on. The question is do we see this as a bad thing? The alternative would be to go with a name like 'hash_core', but that's not necessarily an improvement.

Daniel James

3:59 p.m.

On 19 December 2017 at 15:48, Peter Dimov via Boost <boost@lists.boost.org> wrote:

...

If we name the module 'hash' it's not hard to see that in the future it will be seen as the obvious home for implementations of MD5, SHA-*, and other hash algorithms (FNV-1a, cityhash...) which we do miss in Boost. So there will be pressure to add them, pull requests implementing them, and so on.

The question is do we see this as a bad thing?

Well I don't want this to create extra pressure on me, so I do. I'd rather they were separate, unless we were expanding the boost::hash implementation to support multiple hash algorithms.

Peter Dimov

20 Dec 20 Dec

3:08 p.m.

Daniel James wrote:

...

On 19 December 2017 at 15:48, Peter Dimov via Boost <boost@lists.boost.org> wrote:

...
If we name the module 'hash' it's not hard to see that in the future it will be seen as the obvious home for implementations of MD5, SHA-*, and other hash algorithms (FNV-1a, cityhash...) which we do miss in Boost. So there will be pressure to add them, pull requests implementing them, and so on.

The question is do we see this as a bad thing?

Well I don't want this to create extra pressure on me, so I do. I'd rather they were separate, unless we were expanding the boost::hash implementation to support multiple hash algorithms.

After looking into this a bit, I think that it would be hard to expand our current boost::hash in a compatible manner, due to the way it's specified - boost::hash<X>()(x) is required to call hash_value(x), which is probably implemented by the user in terms of hash_combine, which in turn is required to call back to boost::hash<T>, and boost::hash can be specialized, so everything is detectable and has to work in precisely this manner. It would probably be best to proceed as planned with the extraction of functional/hash as-is into its own 'hash' module and leave the future extensible framework separate (we could imaginatively name its module 'hash2'.)

Peter Dimov

6 Jan 6 Jan

4:26 p.m.

...

... and leave the future extensible framework separate (we could imaginatively name its module 'hash2'.)

https://github.com/pdimov/hash2

Bjorn Reese

5:42 p.m.

On 01/06/18 17:26, Peter Dimov via Boost wrote:

...

https://github.com/pdimov/hash2

Splendid. Would it be possible to add configurable hash functions (or make some of the existing ones configurable?) Some algorithms, such as Bloom filters or Count-Min sketches, require a a family of related hash functions.

Vinnie Falco

5:45 p.m.

On Sat, Jan 6, 2018 at 9:42 AM, Bjorn Reese via Boost <boost@lists.boost.org> wrote:

...

Would it be possible to add configurable hash functions (or make some of the existing ones configurable?)

Some algorithms, such as Bloom filters or Count-Min sketches, require a a family of related hash functions.

What do you mean configurable? Is that a term of art?

Bjorn Reese

5:54 p.m.

On 01/06/18 18:45, Vinnie Falco via Boost wrote:

...

What do you mean configurable? Is that a term of art?

I mean being able to sets its parameters, like offset and modulo. While this is not the usual way of handling the situation, it is pragmatic alternative to than having to deal with tabulation and universal hashing. That said, I would be happy with any solution of obtaining a family of related hash functions.

Vinnie Falco

5:59 p.m.

On Sat, Jan 6, 2018 at 9:54 AM, Bjorn Reese via Boost <boost@lists.boost.org> wrote:

...

I mean being able to sets its parameters, like offset and modulo. While this is not the usual way of handling the situation, it is pragmatic alternative to than having to deal with tabulation and universal hashing. That said, I would be happy with any solution of obtaining a family of related hash functions.

I am not familiar with the terms offset and modulo as applied to popular non-cryptographic hash algorithms such as xxHash or SpookyHash. Perhaps you are thinking of a linear congruential generator? Usually the constants chosen for the hash functions are picked for their numeric properties and cannot be changed. Some of these algorithms allow the user to provide an integral "seed" which is used to permute the result. This can be used to protect a container from algorithmic complexity attacks when used with possibly adversarial inputs. In the absence of an algorithm which allows for a seed, a less efficient but equally effective method is to prepend a unique value, specific to the instance of the hash function, to the input data. This can be turned into a generic wrapper (I believe Peter will eventually add such a thing if he has not done so already). The standard approach to supplying such parameters to the hash function is to provide the values upon construction in the argument list. Thanks

Peter Dimov

6:06 p.m.

Vinnie Falco wrote:

...

In the absence of an algorithm which allows for a seed, a less efficient but equally effective method is to prepend a unique value, specific to the instance of the hash function, to theinput data. This can be turned into a generic wrapper (I believe Peter will eventually add such a thing if he has not done so already).

I decided to require algorithms to support seeding as part of the concept requirements. If the best an algorithm can do with that seed is to prepend it to the value (as-if by default constructing and calling update(seed, n)), it can do so, but usually, the algorithm can do better than that.

Bjorn Reese

6:23 p.m.

On 01/06/18 18:59, Vinnie Falco via Boost wrote:

...

I am not familiar with the terms offset and modulo as applied to popular non-cryptographic hash algorithms such as xxHash or

For FNV: s/offset/basis/ s/modulo/prime/ In other words, the constants used withing the hash function.

...

Some of these algorithms allow the user to provide an integral "seed" which is used to permute the result. This can be used to protect a container from algorithmic complexity attacks when used with possibly adversarial inputs. In the absence of an algorithm which allows for a seed, a less efficient but equally effective method is to prepend a unique value, specific to the instance of the hash function, to the input data. This can be turned into a generic wrapper (I believe Peter will eventually add such a thing if he has not done so already).

Is different seeding sufficient to ensure that they are pairwise independent?

Peter Dimov

6 p.m.

Bjorn Reese wrote:

...

On 01/06/18 17:26, Peter Dimov via Boost wrote:

...
https://github.com/pdimov/hash2

Splendid.

Would it be possible to add configurable hash functions (or make some of the existing ones configurable?)

If by configurable you mean hash functions that can take a seed, yes, the provided hash functions do have constructors taking seeds, matching their "canonical" implementations. spooky2_128's constructor, for example, is explicit spooky2_128( boost::uint64_t seed1 = 0, boost::uint64_t seed2 = 0 ); and murmur3_32 has explicit murmur3_32( boost::uint32_t seed = 0 ); In addition, all algorithms support H( byte_type const * seed, ptrdiff_t n ); as a generic way to supply a seed.

James E. King, III

19 Dec 19 Dec

6:27 p.m.

On Tue, Dec 19, 2017 at 10:48 AM, Peter Dimov via Boost < boost@lists.boost.org> wrote:

...

Daniel James wrote:

...
Hi,

I'm planning to extract hash from functional into its own module, as that will apparently break the circular dependencies that functional is part of. Any objections? Also, any opinions on what the module should be called? 'hash' might suggest a more general purpose hash function, so maybe it should be called something more specific.

If we name the module 'hash' it's not hard to see that in the future it will be seen as the obvious home for implementations of MD5, SHA-*, and other hash algorithms (FNV-1a, cityhash...) which we do miss in Boost. So there will be pressure to add them, pull requests implementing them, and so on.

The question is do we see this as a bad thing?

I asked the same question in this forum a couple months ago and there seemed to be overwhelming response of "heck no we don't want to maintain hash algorithms" for various reasons. Boost.Uuid already has an MD5 and a SHA1 hash algorithm in it, and other libraries in boost ALSO have a SHA1 implementation in them (Boost.Beast, Boost.Compute). Seems silly to maintain 3 copies of the same algorithm. So I would be all for having a Boost.Hash library that contained all of those plus a boost::container_hash which would be the equivalent of boost::hash. Given boost::hash exists already, however, the namespace for Boost.Hash would need to be something else? Alternatively, if folks still don't want to provide hashing functions, here are some suggestions: container_hash chash pound (Boost.Pound) == Boost.# = hashtag = hash? I think Boost.Hash would be confusing from a consumer perspective. It implies more than just one hash function. - Jim

Daniel James

20 Dec 20 Dec

2:52 p.m.

On 19 December 2017 at 18:27, James E. King, III via Boost <boost@lists.boost.org> wrote:

...

container_hash chash pound (Boost.Pound) == Boost.# = hashtag = hash?

I like 'container_hash' as I think it's fairly clear, or perhaps 'unordered_hash'. I'm not keen on an abbreviation like 'chash', this won't be typed a lot so clarity is more important than concision. As this is a small support library, something like conan will often install it for non-obvious reasons, so the name should be clear for users who've never heard of it, so I don't like 'pound'. And from other mails, I'm not keen on 'hash_core' as it suggests this is the core for other libraries. So for now, I'm thinking 'container_hash'. I'll wait a few more days to allow for any other feedback.

Peter Dimov

3:14 p.m.

Daniel James wrote:

...

So for now, I'm thinking 'container_hash'. I'll wait a few more days to allow for any other feedback.

Given that the module defines the name boost::hash, I think that it should be named 'hash'.

Daniel James

5:37 p.m.

On 20 December 2017 at 15:14, Peter Dimov via Boost <boost@lists.boost.org> wrote:

...

Daniel James wrote:

...
So for now, I'm thinking 'container_hash'. I'll wait a few more days to allow for any other feedback.

Given that the module defines the name boost::hash, I think that it should be named 'hash'.

'boost::shared_ptr' is in 'smart_ptr' and it doesn't seem that confusing? Btw. my initial attempt at extracting the library is at: https://github.com/danieljames/new-hash/

Peter Dimov

5:43 p.m.

Daniel James wrote:

...

...
Given that the module defines the name boost::hash, I think that it should be named 'hash'.

'boost::shared_ptr' is in 'smart_ptr' and it doesn't seem that confusing?

Yes but there isn't another library named "shared_ptr". If there were, then it would be confusing. What I meant was that the existence of boost::hash makes it awkward to use 'hash' as a module name for anything else, so we might as well use it for the obvious thing.

Daniel James

6 p.m.

On 20 December 2017 at 17:43, Peter Dimov via Boost <boost@lists.boost.org> wrote:

...

Daniel James wrote:

...
...
Given that the module defines the name boost::hash, I think that it > should be named 'hash'.

'boost::shared_ptr' is in 'smart_ptr' and it doesn't seem that confusing?

Yes but there isn't another library named "shared_ptr". If there were, then it would be confusing.

What I meant was that the existence of boost::hash makes it awkward to use 'hash' as a module name for anything else, so we might as well use it for the obvious thing.

If there is to be a better hash module, having an existing module called 'hash' could make it less findable.

Gavin Lambert

9:42 p.m.

On 21/12/2017 07:00, Daniel James wrote:

...

On 20 December 2017 at 17:43, Peter Dimov wrote:

...
What I meant was that the existence of boost::hash makes it awkward to use 'hash' as a module name for anything else, so we might as well use it for the obvious thing.

If there is to be a better hash module, having an existing module called 'hash' could make it less findable.

Perhaps that potential future library could be called cryptographic_hash instead. (Since it can't define namespace hash anyway.)

Ion Gaztañaga

19 Dec 19 Dec

6:22 p.m.

On 19/12/2017 14:31, Daniel James via Boost wrote:

...

Hi,

I'm planning to extract hash from functional into its own module, as that will apparently break the circular dependencies that functional is part of. Any objections? Also, any opinions on what the module should be called? 'hash' might suggest a more general purpose hash function, so maybe it should be called something more specific.

I think it's a good idea. A boost::hash is basic building block that every class can use (to be inserted in a hash map), so I think it's better to isolate it with minimum dependencies in its own module. Hash Core module sounds good but does this mean that we will add new headers on boost/hash_core/hash.hpp/hash_fwd.hpp/extensions.hpp and old functional headers will be redirected to those? I think this is a good idea to minimize dependencies on functional as some of my libraries only use boost::hash. Best, Ion

2750

Age (days ago)

2768

Last active (days ago)

List overview

Download

29 comments

10 participants

participants (10)

Bjorn Reese
Daniel James
Gavin Lambert
Howard Hinnant
Ion Gaztañaga
James E. King, III
Peter Dimov
Richard Hodges
Steven Watanabe
Vinnie Falco