On 18 Apr 2014, at 06:54, Steven Watanabe
AMDG
On 04/17/2014 04:19 PM, Thijs van den Berg wrote:
I've completed a nice new random engine that I would like to propose for addition to boost/random.
The engine is called "threefry" and it's based on the paper "Parallel random numbers: as easy as 1, 2, 3" by Salmon, John K. and Moraes, Mark A. and Dror, Ron O. and Shaw, David E.
I've been using it myself a lot for the last couple of years, and now I've rewritten it so that it will fit in the boost random framework and adhere to the concepts. It has some really nice properties, in particular for large scale parallel computing: * High quality randomness (according to the BigCrush tests) with a cycle length of 2^258 * Fast (comparable to Mersenne Twister) & little memory usage (13x 64 bit integers) * O(1) discard and guarantees for non-overlapping sequences when the (up to 256 bit) seeds are unequal
I've uploaded the code, description and tests to https://github.com/sitmo/threefry The actual engine is at https://github.com/sitmo/threefry/blob/master/random/include/boost/random/th...
I would really appreciate any feedback about the idea for inclusion and of course comments about the quality of the code,tests,docs
I /really/ dislike the constraint that rounds shall be 13 or 20. It looks like encrypt_counter follows a regular pattern. I'd strongly prefer it to be implemented as a loop and allow rounds to be any non-negative integer. Not all possible values make sense, but that's normal. You have to know what you're doing when you define a specialization of any random number engine. Very good. I’ll have to change things a a bit. The algorithm has indeed reppetitive patterns of multiples of length 4 and 8. I’ll make it an unconstrained integer and a loop.
Do not use reinterpret_cast in operator(). The behavior is undefined, and will certainly differ depending on endianness.
I did this for speed and I didn’t worry about endianness because it never crossed my mind. Changing the order of bits/bytes should not affect the quality of the randomness, but it’s indeed not nice to have different behaviour on different machines. I’ll fix this.
I'm not sure that I like the fact that the textual representation interleaves the three arrays. Also output is a function of key and counter and does not need to be stored. (The same observation applies to the comparison operators)
Yes, I’ve already changed that.
Is it really a good idea to expose set_key/set_counter?
Indeed, it’s not needed, I’ve moved them to private.
It would be better to have an explicit width parameter, like mersenne_twister instead of relying on the size of UIntType. Ok, this takes a bit of time but it’s not dramatic. It’s something that combines nicely with the operator() point.
I'd also prefer to see uint_least64_t in place of uint64_t. OK.
(Actually I'd like it even more if the arrays stored UIntType, but this doesn't look particularly feasible, given the way the algorithm works.) This algorithm is what they call the 4x64 variant. The paper also talks about a 2x64 and 4x32 variants (but e.g. not 8x32, 2x32). However I’m not sure if we need to implement all the variant the’ve studied (maybe yes, maybe no). They recommend 4x64 with 20 rounds as the best choice for CPU’s and that’s the main one I want to provide. AFAIK the state and the return_type need not be the same. Any 32 bit engine can be turned into a 64 engine by patching two returns together.
The 32 bit state type versions have different constants in the repetitive operators. The authors also studies variants with different cryptographic functions. E.g. one uses EAS (but that has a different name than threefry) which can be faster on CPU's supports EAS instructions. It however calls for assembler. There is also a 3rd variant that targets CUDA machines. In a wider setting all these variant use a counter and a cryptographic function (which probably all use a key?). I haven’t studies the other variants. Perhaps we can later refactor -without changing the interface- the code when we decide to add more variants? E.g. can we stick to the 4x64 threefry algorithm function, have two versions that return either 32 or 64 bit integers specified by a width argument. The two other variant 2x64 and 4x32 could perhaps be added later, those are however not recommended by the paper. I would call the 4x64 part of the algorithm, not NxM with template arguments that can be tweaked by smart users.
Don't put usings in namespace boost. The other engines have using declarations for backwards compatibility, which is not a concern for a new engine.
Ok.
Run inspect (http://www.boost.org/tools/inspect). There's at least one use of max that needs to be protected.
Excellent, I didn’t know it’s existence, sound very handy. Thanks for the useful feedback, it’s much appreciated.
In Christ, Steven Watanabe
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost