Any interest in Compile Time Hash Containers library?
Hi everybody, my first message, I hope I do not mess this up.
I was contemplating writing a Compile Time Hash Containers library and I
wrote a test implementation for sets and benchmarked it against mostly
ska:: sets since they are the best.
Document is here:
https://docs.google.com/document/d/1IsUiWl3K0h2pgttYgMt67CxgJ3zeXBKZEA7hvAQ5...
It is quite long, but it aims to provide a introduction to the problem for
people unfamiliar with details of hashing, and also it goes into a lot of
discussions, if you care just about the numbers you can skip the "boring
parts".
My interpretation of benchmarks is that it performs very well in some
cases, decently in others, but I am biased. :)
Code is not available online since I do not want to dump a test project
online without extensive documentation, but if somebody has a problem that
may be benefited by this test implementation feel free to let me know, I
would be happy to get some real world feedback on prototype.
Limitations is that test implementation only works for sets, so no maps and
if your objects are not integers you will probably not get much speedup.
Also if your set has more than 100ish items GCC will run out of memory and
VC++ does not compile code at all, so clang is the only way for most use
cases today.
One more disclaimer is that I do not want this library to be some beast
like ranges that takes years to implement, so if after reading the document
you consider implementation quite basic I won't be offended :) In other
words I feel that library like this although relatively simple compared to
regular Boost libraries can provide quite a lot of benefit for some
usecases where people really really care about performance of hash
containers where values are known during compile time.
I know you can not predict the future of an unimplemented library, but I
would be happy with some general feedback, and if you think this library
would be nice boost proposal I would be interested in some pointers wrt how
to proceed.
If somebody wonders how user code looks at the moment:
#include "cxhash_set.h"
// also could be a result of constexpr function, no need for initializer
list
static constexpr std::array
While I don't personally have a use for it, I can see that someone will, and besides, it's kind of cool that you can do that ;) Question: would the interface be better specified with a constexpr initializer-list constructor rather than templating on an array of data? That way you would have complete control of memory layout as well? Or am I misinterpreting things? Best, John. On 21/01/2019 13:39, Ivan Matek via Boost wrote:
Hi everybody, my first message, I hope I do not mess this up.
I was contemplating writing a Compile Time Hash Containers library and I wrote a test implementation for sets and benchmarked it against mostly ska:: sets since they are the best.
Document is here:
https://docs.google.com/document/d/1IsUiWl3K0h2pgttYgMt67CxgJ3zeXBKZEA7hvAQ5...
It is quite long, but it aims to provide a introduction to the problem for people unfamiliar with details of hashing, and also it goes into a lot of discussions, if you care just about the numbers you can skip the "boring parts".
My interpretation of benchmarks is that it performs very well in some cases, decently in others, but I am biased. :)
Code is not available online since I do not want to dump a test project online without extensive documentation, but if somebody has a problem that may be benefited by this test implementation feel free to let me know, I would be happy to get some real world feedback on prototype. Limitations is that test implementation only works for sets, so no maps and if your objects are not integers you will probably not get much speedup. Also if your set has more than 100ish items GCC will run out of memory and VC++ does not compile code at all, so clang is the only way for most use cases today.
One more disclaimer is that I do not want this library to be some beast like ranges that takes years to implement, so if after reading the document you consider implementation quite basic I won't be offended :) In other words I feel that library like this although relatively simple compared to regular Boost libraries can provide quite a lot of benefit for some usecases where people really really care about performance of hash containers where values are known during compile time.
I know you can not predict the future of an unimplemented library, but I would be happy with some general feedback, and if you think this library would be nice boost proposal I would be interested in some pointers wrt how to proceed.
If somebody wonders how user code looks at the moment:
#include "cxhash_set.h" // also could be a result of constexpr function, no need for initializer list static constexpr std::array
ints{0,1,2,3,5,8,21,34,64,128,256,512,1024,1701,1729,2048,4096,12345,123456,1234567}; int main() { cxhash_set<ints> ints_set; // contains could(should be?) static, but for now it is "normal" member fn std::cout << std::boolalpha << ints_set.contains(21) << std::endl; std::cout << std::boolalpha << ints_set.contains(22) << std::endl; } P.S. feel free to comment in the google doc, I actually find mailing lists quite limited when it comes to prolonged and or detailed discussions. Maybe we can ping Alexandrescu for an instance of dlang forums, so we get a better discussion tools, and he gets a way to convert us to D. :)
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
--- This email has been checked for viruses by Avast antivirus software. https://www.avast.com/antivirus
On Thu, Jan 24, 2019 at 9:49 AM John Maddock via Boost < boost@lists.boost.org> wrote:
Question: would the interface be better specified with a constexpr initializer-list constructor rather than templating on an array of data? That way you would have complete control of memory layout as well? Or am I misinterpreting things?
I think that if you want best performance you want to template on values since if for example you only template on type, then you can not know some very useful stuff at compile time. We want to pick a constant for universal hashing that works great for our values. Consider example where we are lucky and we get no collisions. If we know that at compile time compiler can remove the loop(probe until some condition) since it knows iteration count is 1. If that value is just some member variable that can be 1 or 2 or 5 depending on data compiler can not remove that loop. In other words I believe that for best performance codegen for different values needs to be different. I am happy to be proven wrong. :)
On Fri, 25 Jan 2019 at 01:00, Ivan Matek via Boost
I think that if you want best performance you want to template on values since if for example you only template on type, then you can not know some very useful stuff at compile time. We want to pick a constant for universal hashing that works great for our values. Consider example where we are lucky and we get no collisions.
How is this [whatever it is] different from frozen ( https://github.com/serge-sans-paille/frozen )? From what I read, it [frozen] always achieves perfect hashing. The good thing about frozen is that it's not pie-in-the-sky and it actually works. If we know that at compile time compiler
can remove the loop(probe until some condition) since it knows iteration count is 1. If that value is just some member variable that can be 1 or 2 or 5 depending on data compiler can not remove that loop. In other words I believe that for best performance codegen for different values needs to be different. I am happy to be proven wrong. :)
Did you, or did you not write the code [you say it's not written and then post timing results, it's confusing]? How can we prove you wrong (or right), if all we see is lots of words and no code? Stop argumenting and start showing some code. degski -- *“If something cannot go on forever, it will stop" - Herbert Stein*
How is this [whatever it is] different from frozen ( https://github.com/serge-sans-paille/frozen )?
If you look into the document and search for frozen you will find out.
Did you, or did you not write the code [you say it's not written and then post timing results, it's confusing]? How can we prove you wrong (or right), if all we see is lots of words and no code? Stop argumenting and start showing some code.
Like I said in he original message and the document I implemented set that
works good for integers(you can provide custom hasher for other types, but perf is bad). There is no map implemented, and for complex types like strings performance sucks(again this is discussed in the doc). If you want the code feel free to ping me directly on my mail, only things I require is that: you do not claim ownership of code you do not blame me for bugs in case you do use code in real production and it causes damage, in other words no liability on my part as permissible by law you do not expect me to support you in x years(if I can help you today I will) or implement features on top of this code(I think there is a good chance this code will be changed significantly including: names and hashers avaliable) you do compile with clang(GCC/VC++ are partly/fully broken for this code)
On Fri, 25 Jan 2019 at 08:37, Ivan Matek
How is this [whatever it is] different from frozen (
From your doc:
Frozen Another example of library doing very similar thing to what we want to do is Frozen https://blog.quarkslab.com/frozen-an-header-only-constexpr-alternative-to-gp.... For details you should consult the documentation, but the general idea is similar: try to get good hashing based on the fact the data is known at compilation time. Difference to us is that frozen does perfect hashing. "For details you should consult the documentation", right, we're getting somewhere, oh, maybe not ... "Difference to us is that frozen does perfect hashing": ... and your idea is ... to do imperfect hashing? The document is far too long and there are far too many diversions/excursions into la-la-land. Take as example the readme.md from frozen and put something similar together, please.
If you want the code feel free to ping me directly on my mail ...
Just stick it on github, that's what the rest of the world does ... if you want to have it included in Boost, that's what you'll need to do in the end anyway ...
, only things I require is that: you do not claim ownership of code you do not blame me for bugs in case you do use code in real production and it causes damage, in other words no liability on my part as permissible by law you do not expect me to support you in x years(if I can help you today I will) or implement features on top of this code(I think there is a good chance this code will be changed significantly including: names and hashers avaliable) you do compile with clang(GCC/VC++ are partly/fully broken for this code)
... and put an appropriate license (BSL ( https://www.boost.org/users/license.html ) comes to mind, as that's what you are proposing anyway) to deal with the above... degski -- *“If something cannot go on forever, it will stop" - Herbert Stein*
On Thu, 24 Jan 2019 at 01:34, Ivan Matek via Boost
Hi everybody, my first message, I hope I do not mess this up.
I was contemplating writing a Compile Time Hash Containers library and I wrote a test implementation for sets and benchmarked it against mostly ska:: sets since they are the best.
In my experience that would be very useful. This kind of technique is already used by some people to create hashes as keys to acquired localized texts. (This combined with something like the proposal for std::embedd() would be particularly powerful.) A. Joël Lamotte
On Wed, Jan 23, 2019 at 4:34 PM Ivan Matek via Boost
I was contemplating writing a Compile Time Hash...library
The very first thing that popped into my head when I saw this is "I would like a library that emits a perfect hash function at compile time from a set of inputs." Motivating use case: https://github.com/boostorg/beast/blob/develop/include/boost/beast/http/impl... Regards
participants (5)
-
degski
-
Ivan Matek
-
John Maddock
-
Klaim - Joël Lamotte
-
Vinnie Falco