[hash] Extract module from functional + std::hash_combine
Hi, I'm planning to extract hash from functional into its own module, as that will apparently break the circular dependencies that functional is part of. Any objections? Also, any opinions on what the module should be called? 'hash' might suggest a more general purpose hash function, so maybe it should be called something more specific. Daniel
IMHO
boost::hash is perfect, in every respect. Follow that (including all the
missing functionality in std::hash - particularly ADL-friendly hash_value())
and you won't go wrong.
#include <hash> is fine by me. If we ever want any other kind of hash,
surely is could go into #include <md5> or #include <sha256> etc.
until std::hash >= boost::hash, it's not useful.
In addition, I would propose (I am sure I'm not the first!):
namespace std
{
template<class T = void> struct hash { ... };
template<>
struct hash<void>
{
template<class T>
auto operator()(T&& arg) const {
return hash_value(arg); // ADL to the rescue
}
};
}
and then:
template
Hi,
I'm planning to extract hash from functional into its own module, as that will apparently break the circular dependencies that functional is part of. Any objections? Also, any opinions on what the module should be called? 'hash' might suggest a more general purpose hash function, so maybe it should be called something more specific.
Daniel
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/ mailman/listinfo.cgi/boost
On 19 December 2017 at 14:12, Richard Hodges via Boost
Whoever decided to downgrade boost hash before solidifying its uselessness in the c++11 standard?
The C++11 std::hash was specified before boost::hash was created, so it wasn't downgraded. There have been several proposals for improving std::hash, some of which do have advantages over `boost::hash`, but the standards committee hasn't settled on any of them. But it's possible that the next standard will include 'hash_combine', as well as functions using variadic arguments: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3876.pdf This is a bit of a compromise that makes it easier to implement custom hash functions but doesn't prevent any of the other proposal from going forward. I was thinking about implementing the variadic functions in that paper for 1.67.0, but I think I'll wait to see if it's added to the standard.
The C++11 std::hash was specified before boost::hash was created, so...
source :
https://github.com/boostorg/functional/blob/develop/include/boost/functional...
// Copyright 2005-2014 Daniel James.
// Distributed under the Boost Software License, Version 1.0. (See
accompanying
// file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
You're saying that the c++11 standard codified std::hash in 2005? This
seems to me to be an extraordinary claim.
More likely is that std::hash is the retarded cousin of boost::hash, and
the c++11 committee's gravest error.
On 19 December 2017 at 15:51, Daniel James via Boost
On 19 December 2017 at 14:12, Richard Hodges via Boost
wrote: Whoever decided to downgrade boost hash before solidifying its
uselessness
in the c++11 standard?
The C++11 std::hash was specified before boost::hash was created, so it wasn't downgraded. There have been several proposals for improving std::hash, some of which do have advantages over `boost::hash`, but the standards committee hasn't settled on any of them. But it's possible that the next standard will include 'hash_combine', as well as functions using variadic arguments:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3876.pdf
This is a bit of a compromise that makes it easier to implement custom hash functions but doesn't prevent any of the other proposal from going forward. I was thinking about implementing the variadic functions in that paper for 1.67.0, but I think I'll wait to see if it's added to the standard.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/ mailman/listinfo.cgi/boost
AMDG On 12/19/2017 08:23 AM, Richard Hodges via Boost wrote:
The C++11 std::hash was specified before boost::hash was created, so...
source : https://github.com/boostorg/functional/blob/develop/include/boost/functional...
// Copyright 2005-2014 Daniel James. // Distributed under the Boost Software License, Version 1.0. (See accompanying // file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
You're saying that the c++11 standard codified std::hash in 2005? This seems to me to be an extraordinary claim.
hash was part of TR1
More likely is that std::hash is the retarded cousin of boost::hash, and the c++11 committee's gravest error.
In Christ, Steven Watanabe
hash was part of TR1
TR1 was circulated in 2005. std::hash had 6 years to up its game before 2011. Boost managed it. I have read google's proposal on improving std::hash. It's not as if the committee is unaware of the general uselessness of std::hash. Unless std::hash is raised to the level of competency enjoyed by boost hash, there is no point do any further work on it. It will remain un-useable. It is simply laughably inconvenient to have to inject hash specialisations into the std namespace. Please don't waste your valuable time. Just adopt boost::hash into std. On 19 December 2017 at 16:36, Steven Watanabe via Boost < boost@lists.boost.org> wrote:
AMDG
On 12/19/2017 08:23 AM, Richard Hodges via Boost wrote:
The C++11 std::hash was specified before boost::hash was created, so...
source : https://github.com/boostorg/functional/blob/develop/ include/boost/functional/hash/hash.hpp
// Copyright 2005-2014 Daniel James. // Distributed under the Boost Software License, Version 1.0. (See accompanying // file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
You're saying that the c++11 standard codified std::hash in 2005? This seems to me to be an extraordinary claim.
hash was part of TR1
More likely is that std::hash is the retarded cousin of boost::hash, and the c++11 committee's gravest error.
In Christ, Steven Watanabe
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/ mailman/listinfo.cgi/boost
Daniel James wrote:
But it's possible that the next standard will include 'hash_combine', as well as functions using variadic arguments:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3876.pdf
Nico Josuttis has been pushing for std::hash_combine - his latest is http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0814r0.pdf and I vaguely remember it being accepted by LEWG at the last meeting, but at the moment I'm not convinced that this is the right way forward. The alternatives proposed by Howard Hinnant and by Google are better. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0029r0.html http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3980.html The problem with our hash_combine is twofold. First, it doesn't allow intermediate state bigger than size_t. Second, it necessitates that the value of the object is collapsed into a single size_t before being mixed into the intermediate state (which is also the final state in our implementation, but it doesn't need to be.) So if we have struct X { Y y; Z z; }; struct Y { U u; V v; }; and we evaluate hash<X>()(x), we get size_t state = 0; size_t v1 = hash_value(x.y); size_t st2 = 0; size_t w1 = hash_value(x.y.u); hash_combine(st2,w1); size_t w2 = hash_value(x.y.v); hash_combine(st2,w2); return st2; hash_combine(state, v1); size_t v2 = hash_value(x.z); hash_combine(state, v2); return state; // or f(state) whereas we'd rather like the following: State state{}; // hash algorithm dependent hash_combine(state, x.y); hash_combine(state, x.y.u); hash_combine(state, x.y.v); hash_combine(state, x.z); return f(state); The interesting thing is that our hash_combine signature is actually compatible with the second approach; we just need to make the state (currently fixed to size_t) a template parameter, and instruct people to write their custom overloads as: template<class State> void hash_combine(State& state, X const& x) { hash_combine(state, x.y); hash_combine(state, x.z); } instead of providing hash_value. Hash algorithms, in turn, will (may(*)) provide overloads of the form void hash_combine(md5& state, int const& x) { // mix x into state } and the hash<> object will create an instance of the algorithm, call hash_combine, and return the final state. This is basically what N3980 does, except with hash_append. (*) May, because we can provide a generic overload for 'int' that hashes it at the byte level, as is the standard nowadays.
On 19 December 2017 at 16:14, Peter Dimov via Boost
Daniel James wrote:
But it's possible that the next standard will include 'hash_combine', as well as functions using variadic arguments:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3876.pdf
Nico Josuttis has been pushing for std::hash_combine - his latest is
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0814r0.pdf
Ah, I got the wrong one. The problem with this paper is that hash_combine is defined differently to ours, so if it's added to the standard, it's going to make it awkward for me to follow it.
and I vaguely remember it being accepted by LEWG at the last meeting, but at the moment I'm not convinced that this is the right way forward.
I heard that on Slack after the last meeting. That's why I was picking up on this paper, and not the others.
The alternatives proposed by Howard Hinnant and by Google are better.
IIUC the Josuttis paper is intended as a stop-gap that won't get in the way of a better proposal. But it's a bit half-baked as there are a lot of use cases it doesn't have good support for, such as hashing ranges or conditionally hashing a member.
Daniel James wrote:
The alternatives proposed by Howard Hinnant and by Google are better.
IIUC the Josuttis paper is intended as a stop-gap that won't get in the way of a better proposal.
But it does get in the way, because the Google proposal has hash_combine with different semantics.
On Dec 19, 2017, at 12:01 PM, Peter Dimov via Boost
Daniel James wrote:
The alternatives proposed by Howard Hinnant and by Google are better.
IIUC the Josuttis paper is intended as a stop-gap that won't get in the way of a better proposal.
But it does get in the way, because the Google proposal has hash_combine with different semantics.
The name “hash_combine” was discussed in the LEWG. When I asked, I was told the functionality was the same or essentially the same as boost::hash_combine. Therefore I recommended that the standard keep the name hash_combine for this proposal as it is already well-known, and would serve as a clearer warning about what this functionality will (and will not) do. If new functionality is to be provided which allows any algorithm to be incrementally updated with data, and then finalized to compute a hash, in such a way that the integrity of the hash algorithm is not compromised simply because it isn’t fed all the data at one time, I believe we need a new name (other than hash_combine). Otherwise there will be confusion. I chose hash_append for that new name. There may be other better names. But hash_combine is already taken and should not be reused for this purpose. Howard
Howard Hinnant wrote:
When I asked, I was told the functionality was the same or essentially the same as boost::hash_combine.
It's not. Well the functionality is the same, but the interface of boost::hash_combine is incompatible with the proposed std::hash_combine. boost::hash_combine is what's _hash_combine in P0814R0. std::hash_combine(t1, ..., tN) in P0814R0 is a pure function that returns the combined hash value of its arguments. boost::hash_combine(v, t) mixes the hash value of t into the state v. boost::hash_combine is incremental and is naturally extensible into using a different state than size_t for the first argument. The proposed std::hash_combine is neither.
Daniel James wrote:
Hi,
I'm planning to extract hash from functional into its own module, as that will apparently break the circular dependencies that functional is part of. Any objections? Also, any opinions on what the module should be called? 'hash' might suggest a more general purpose hash function, so maybe it should be called something more specific.
If we name the module 'hash' it's not hard to see that in the future it will be seen as the obvious home for implementations of MD5, SHA-*, and other hash algorithms (FNV-1a, cityhash...) which we do miss in Boost. So there will be pressure to add them, pull requests implementing them, and so on. The question is do we see this as a bad thing? The alternative would be to go with a name like 'hash_core', but that's not necessarily an improvement.
On 19 December 2017 at 15:48, Peter Dimov via Boost
If we name the module 'hash' it's not hard to see that in the future it will be seen as the obvious home for implementations of MD5, SHA-*, and other hash algorithms (FNV-1a, cityhash...) which we do miss in Boost. So there will be pressure to add them, pull requests implementing them, and so on.
The question is do we see this as a bad thing?
Well I don't want this to create extra pressure on me, so I do. I'd rather they were separate, unless we were expanding the boost::hash implementation to support multiple hash algorithms.
Daniel James wrote:
On 19 December 2017 at 15:48, Peter Dimov via Boost
wrote: If we name the module 'hash' it's not hard to see that in the future it will be seen as the obvious home for implementations of MD5, SHA-*, and other hash algorithms (FNV-1a, cityhash...) which we do miss in Boost. So there will be pressure to add them, pull requests implementing them, and so on.
The question is do we see this as a bad thing?
Well I don't want this to create extra pressure on me, so I do. I'd rather they were separate, unless we were expanding the boost::hash implementation to support multiple hash algorithms.
After looking into this a bit, I think that it would be hard to expand our current boost::hash in a compatible manner, due to the way it's specified - boost::hash<X>()(x) is required to call hash_value(x), which is probably implemented by the user in terms of hash_combine, which in turn is required to call back to boost::hash<T>, and boost::hash can be specialized, so everything is detectable and has to work in precisely this manner. It would probably be best to proceed as planned with the extraction of functional/hash as-is into its own 'hash' module and leave the future extensible framework separate (we could imaginatively name its module 'hash2'.)
... and leave the future extensible framework separate (we could imaginatively name its module 'hash2'.)
On 01/06/18 17:26, Peter Dimov via Boost wrote:
Splendid. Would it be possible to add configurable hash functions (or make some of the existing ones configurable?) Some algorithms, such as Bloom filters or Count-Min sketches, require a a family of related hash functions.
On Sat, Jan 6, 2018 at 9:42 AM, Bjorn Reese via Boost
Would it be possible to add configurable hash functions (or make some of the existing ones configurable?)
Some algorithms, such as Bloom filters or Count-Min sketches, require a a family of related hash functions.
What do you mean configurable? Is that a term of art?
On 01/06/18 18:45, Vinnie Falco via Boost wrote:
What do you mean configurable? Is that a term of art?
I mean being able to sets its parameters, like offset and modulo. While this is not the usual way of handling the situation, it is pragmatic alternative to than having to deal with tabulation and universal hashing. That said, I would be happy with any solution of obtaining a family of related hash functions.
On Sat, Jan 6, 2018 at 9:54 AM, Bjorn Reese via Boost
I mean being able to sets its parameters, like offset and modulo. While this is not the usual way of handling the situation, it is pragmatic alternative to than having to deal with tabulation and universal hashing. That said, I would be happy with any solution of obtaining a family of related hash functions.
I am not familiar with the terms offset and modulo as applied to popular non-cryptographic hash algorithms such as xxHash or SpookyHash. Perhaps you are thinking of a linear congruential generator? Usually the constants chosen for the hash functions are picked for their numeric properties and cannot be changed. Some of these algorithms allow the user to provide an integral "seed" which is used to permute the result. This can be used to protect a container from algorithmic complexity attacks when used with possibly adversarial inputs. In the absence of an algorithm which allows for a seed, a less efficient but equally effective method is to prepend a unique value, specific to the instance of the hash function, to the input data. This can be turned into a generic wrapper (I believe Peter will eventually add such a thing if he has not done so already). The standard approach to supplying such parameters to the hash function is to provide the values upon construction in the argument list. Thanks
Vinnie Falco wrote:
In the absence of an algorithm which allows for a seed, a less efficient but equally effective method is to prepend a unique value, specific to the instance of the hash function, to theinput data. This can be turned into a generic wrapper (I believe Peter will eventually add such a thing if he has not done so already).
I decided to require algorithms to support seeding as part of the concept requirements. If the best an algorithm can do with that seed is to prepend it to the value (as-if by default constructing and calling update(seed, n)), it can do so, but usually, the algorithm can do better than that.
On 01/06/18 18:59, Vinnie Falco via Boost wrote:
I am not familiar with the terms offset and modulo as applied to popular non-cryptographic hash algorithms such as xxHash or
For FNV: s/offset/basis/ s/modulo/prime/ In other words, the constants used withing the hash function.
Some of these algorithms allow the user to provide an integral "seed" which is used to permute the result. This can be used to protect a container from algorithmic complexity attacks when used with possibly adversarial inputs. In the absence of an algorithm which allows for a seed, a less efficient but equally effective method is to prepend a unique value, specific to the instance of the hash function, to the input data. This can be turned into a generic wrapper (I believe Peter will eventually add such a thing if he has not done so already).
Is different seeding sufficient to ensure that they are pairwise independent?
Bjorn Reese wrote:
On 01/06/18 17:26, Peter Dimov via Boost wrote:
Splendid.
Would it be possible to add configurable hash functions (or make some of the existing ones configurable?)
If by configurable you mean hash functions that can take a seed, yes, the provided hash functions do have constructors taking seeds, matching their "canonical" implementations. spooky2_128's constructor, for example, is explicit spooky2_128( boost::uint64_t seed1 = 0, boost::uint64_t seed2 = 0 ); and murmur3_32 has explicit murmur3_32( boost::uint32_t seed = 0 ); In addition, all algorithms support H( byte_type const * seed, ptrdiff_t n ); as a generic way to supply a seed.
On Tue, Dec 19, 2017 at 10:48 AM, Peter Dimov via Boost < boost@lists.boost.org> wrote:
Daniel James wrote:
Hi,
I'm planning to extract hash from functional into its own module, as that will apparently break the circular dependencies that functional is part of. Any objections? Also, any opinions on what the module should be called? 'hash' might suggest a more general purpose hash function, so maybe it should be called something more specific.
If we name the module 'hash' it's not hard to see that in the future it will be seen as the obvious home for implementations of MD5, SHA-*, and other hash algorithms (FNV-1a, cityhash...) which we do miss in Boost. So there will be pressure to add them, pull requests implementing them, and so on.
The question is do we see this as a bad thing?
I asked the same question in this forum a couple months ago and there seemed to be overwhelming response of "heck no we don't want to maintain hash algorithms" for various reasons. Boost.Uuid already has an MD5 and a SHA1 hash algorithm in it, and other libraries in boost ALSO have a SHA1 implementation in them (Boost.Beast, Boost.Compute). Seems silly to maintain 3 copies of the same algorithm. So I would be all for having a Boost.Hash library that contained all of those plus a boost::container_hash which would be the equivalent of boost::hash. Given boost::hash exists already, however, the namespace for Boost.Hash would need to be something else? Alternatively, if folks still don't want to provide hashing functions, here are some suggestions: container_hash chash pound (Boost.Pound) == Boost.# = hashtag = hash? I think Boost.Hash would be confusing from a consumer perspective. It implies more than just one hash function. - Jim
On 19 December 2017 at 18:27, James E. King, III via Boost
container_hash chash pound (Boost.Pound) == Boost.# = hashtag = hash?
I like 'container_hash' as I think it's fairly clear, or perhaps 'unordered_hash'. I'm not keen on an abbreviation like 'chash', this won't be typed a lot so clarity is more important than concision. As this is a small support library, something like conan will often install it for non-obvious reasons, so the name should be clear for users who've never heard of it, so I don't like 'pound'. And from other mails, I'm not keen on 'hash_core' as it suggests this is the core for other libraries. So for now, I'm thinking 'container_hash'. I'll wait a few more days to allow for any other feedback.
On 20 December 2017 at 15:14, Peter Dimov via Boost
Daniel James wrote:
So for now, I'm thinking 'container_hash'. I'll wait a few more days to allow for any other feedback.
Given that the module defines the name boost::hash, I think that it should be named 'hash'.
'boost::shared_ptr' is in 'smart_ptr' and it doesn't seem that confusing? Btw. my initial attempt at extracting the library is at: https://github.com/danieljames/new-hash/
Daniel James wrote:
Given that the module defines the name boost::hash, I think that it should be named 'hash'.
'boost::shared_ptr' is in 'smart_ptr' and it doesn't seem that confusing?
Yes but there isn't another library named "shared_ptr". If there were, then it would be confusing. What I meant was that the existence of boost::hash makes it awkward to use 'hash' as a module name for anything else, so we might as well use it for the obvious thing.
On 20 December 2017 at 17:43, Peter Dimov via Boost
Daniel James wrote:
Given that the module defines the name boost::hash, I think that it > should be named 'hash'.
'boost::shared_ptr' is in 'smart_ptr' and it doesn't seem that confusing?
Yes but there isn't another library named "shared_ptr". If there were, then it would be confusing.
What I meant was that the existence of boost::hash makes it awkward to use 'hash' as a module name for anything else, so we might as well use it for the obvious thing.
If there is to be a better hash module, having an existing module called 'hash' could make it less findable.
On 21/12/2017 07:00, Daniel James wrote:
On 20 December 2017 at 17:43, Peter Dimov wrote:
What I meant was that the existence of boost::hash makes it awkward to use 'hash' as a module name for anything else, so we might as well use it for the obvious thing.
If there is to be a better hash module, having an existing module called 'hash' could make it less findable.
Perhaps that potential future library could be called cryptographic_hash instead. (Since it can't define namespace hash anyway.)
On 19/12/2017 14:31, Daniel James via Boost wrote:
Hi,
I'm planning to extract hash from functional into its own module, as that will apparently break the circular dependencies that functional is part of. Any objections? Also, any opinions on what the module should be called? 'hash' might suggest a more general purpose hash function, so maybe it should be called something more specific.
I think it's a good idea. A boost::hash is basic building block that every class can use (to be inserted in a hash map), so I think it's better to isolate it with minimum dependencies in its own module. Hash Core module sounds good but does this mean that we will add new headers on boost/hash_core/hash.hpp/hash_fwd.hpp/extensions.hpp and old functional headers will be redirected to those? I think this is a good idea to minimize dependencies on functional as some of my libraries only use boost::hash. Best, Ion
participants (10)
-
Bjorn Reese
-
Daniel James
-
Gavin Lambert
-
Howard Hinnant
-
Ion Gaztañaga
-
James E. King, III
-
Peter Dimov
-
Richard Hodges
-
Steven Watanabe
-
Vinnie Falco