On Tue, May 6, 2014 at 3:51 PM, Thijs van den Berg wrote:
>> I think you'll like the new example code better. From the new README:
>>
>>     typedef threefry<4, uint32_t> Prf;
>>     counter_based_engine<Prf, ...> cbeng(seed);
>>     for(size_t i=0; i<...; ++i){
>>         ...
>>     }
>>
>> You can seed the counter_based_engine directly with an arithmetic seed,
>> or, if you prefer, you can use the much larger key_type, or you can
>> pre-initialize a Prf and use that.
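For concreteness, here is a minimal, self-contained sketch of that pattern:
one engine, restarted once per atom so that each atom gets its own
independent stream, then three normal draws per restart. The Atom struct,
the header paths, the seed, the 32-bit counter width and the exact values
packed into the domain_type are illustrative guesses, not the README's
actual text.

    // Sketch only: header paths for the proposed engine are assumed;
    // adjust to wherever counter_based_engine and threefry actually live.
    #include <boost/random/counter_based_engine.hpp>
    #include <boost/random/threefry.hpp>
    #include <boost/random/normal_distribution.hpp>
    #include <cmath>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct Atom { uint32_t id; float mass, vx, vy, vz; };

    void thermalize(std::vector<Atom>& atoms, uint32_t timestep,
                    uint32_t seed, float kT)
    {
        typedef threefry<4, uint32_t> Prf;
        counter_based_engine<Prf, 32> cbeng(seed); // arithmetic seed; a
                                                   // Prf::key_type or a
                                                   // pre-initialized Prf
                                                   // would also work
        for (std::size_t i = 0; i < atoms.size(); ++i) {
            boost::random::normal_distribution<float>
                mbd(0.f, std::sqrt(kT / atoms[i].mass));
            // A starting point that is unique per (atom, timestep); the
            // exact domain_type layout is as documented in the proposal.
            Prf::domain_type start = {{ atoms[i].id, timestep }};
            cbeng.restart(start);
            atoms[i].vx = mbd(cbeng);
            atoms[i].vy = mbd(cbeng);
            atoms[i].vz = mbd(cbeng);
        }
    }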
> I tried to examine what happened when you draw more samples inside the
> loop, like this
>
>     for (size_t j=0; j<100; ++j) {
>         atoms[i].vx += mbd(cbeng);
>         atoms[i].vy += mbd(cbeng);
>         atoms[i].vz += mbd(cbeng);
>     }
>
> and that caused an abort trap. I expect that the cbeng ran out of samples
> and that is not the behaviour you want.
Yes. It ran out because in the counter_based_example.cpp file it's
declared with a 5-bit counter:

    counter_based_engine<Prf, 5> cbeng(seed);
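To see why it aborted: with threefry<4, uint32_t>, each counter value
yields four 32-bit results, so a 5-bit counter gives roughly 2^5 * 4 = 128
values per restart(), while the modified loop above asks for at least
100 * 3 = 300 (assuming, optimistically, one 32-bit value per normal
draw). A quick check of that arithmetic:

    #include <cstdio>

    int main() {
        // Capacity of one restart() period for a 5-bit counter:
        // 2^5 counter values, four 32-bit outputs per counter value.
        const unsigned ctr_bits = 5;
        const unsigned long long capacity = (1ull << ctr_bits) * 4; // 128

        // The modified loop: 100 iterations x 3 draws, assuming
        // (optimistically) one 32-bit value consumed per normal draw.
        const unsigned long long needed = 100ull * 3;               // 300

        std::printf("capacity=%llu, needed>=%llu -> %s\n",
                    capacity, needed,
                    needed > capacity ? "engine exhausted" : "ok");
        return 0;
    }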
> It's hard for the user to know in advance the required amount of
> samples. mbd() can be a probability distribution that consumes many
> random samples from the engine. E.g., I've just finished a distribution
> sampler that consumes 4 per draw.
Yes. Some distributions require even more. IIRC, the Poisson distribution
can sometimes require O(10). If you're using the generator the way I
advocate (restarted every time through a loop), then 32 counter bits will
likely be more than enough. If you're using it in a more conventional way
(one generator per thread), then a 64-bit counter is probably a better
idea, as it will take centuries to count down.

It might make sense to have a default value for CtrBits. I'd go with the
minimum of 64 and half the width of the Prf's range, i.e., 32 for
threefry2x32 and philox2x32 and 64 for everything else.
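If such a default were adopted, it could be computed at compile time from
the Prf's block parameters. A sketch of the rule (the trait below is
illustrative, not part of the proposed API):

    #include <cstdint>
    #include <limits>

    // Suggested default: CtrBits = min(64, range width of the Prf / 2),
    // where the range width of threefry<N, Uint> is N * bits(Uint).
    template <unsigned N, typename Uint>
    struct default_ctr_bits
    {
        static const unsigned range_width =
            N * std::numeric_limits<Uint>::digits;
        static const unsigned value =
            (range_width / 2 < 64) ? range_width / 2 : 64;
    };

    // threefry2x32 / philox2x32 get 32 counter bits; wider PRFs get 64.
    static_assert(default_ctr_bits<2, std::uint32_t>::value == 32, "");
    static_assert(default_ctr_bits<4, std::uint32_t>::value == 64, "");
    static_assert(default_ctr_bits<2, std::uint64_t>::value == 64, "");
    static_assert(default_ctr_bits<4, std::uint64_t>::value == 64, "");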
> The same issue arises if you have many more variables to set than
> vx,vy,vz, which makes the usage error-prone. One solution for this
> particular example is to move the domain_type variable inside the 2nd
> loop and make it a function of both i and j from the loops. However,
> that still isn't a nice design because it can be very inefficient: every
> time you call restart() it will generate 4x64 bits of random data on its
> first call, but you only consume 3x32 bits. Putting the burden of
> optimising this on the user is not a good idea.
Actually, it's only 4x32 bits on each call. But your point stands. To get
the best possible performance one needs to be aware of some details. How
many random values, and of what size, you get with each "turn of the
crank" are important details. If you ignore them, you'll get good,
reliable, correctly distributed values, but you might waste some time.
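To put a number on it for this example (again assuming one 32-bit value
consumed per normal draw): each restart() produces a 4x32-bit block, the
three velocity draws use three of the four words, and the fourth is thrown
away at the next restart, so roughly a quarter of the generated bits are
wasted.

    #include <cstdio>

    int main() {
        // Per restart() in the thermalization loop, assuming one
        // 32-bit value per normal draw (a simplification).
        const unsigned generated_bits = 4 * 32; // one threefry4x32 block
        const unsigned consumed_bits  = 3 * 32; // vx, vy, vz
        std::printf("wasted per atom: %.0f%%\n",
                    100.0 * (generated_bits - consumed_bits) / generated_bits);
        return 0;
    }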
> The version I've finished here, https://github.com/sitmo/threefry, is a
> standard random engine concept and doesn't need a domain_type, key_type
> or restart(), and doesn't have a low limit on the number of samples that
> users need to be aware of. It's intuitive to use like any other random
> engine, as can be seen in the next example. I *think* it also uses less
> memory and is also faster, because I've optimised the engine for its
> purpose.
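In other words, an engine that models the standard concept drops straight
into the usual seed/draw/discard idiom with any standard or Boost
distribution, roughly like this (std::mt19937 stands in below because the
actual class name from the sitmo repository isn't quoted here):

    #include <iostream>
    #include <random>

    // Any standard-conforming engine can be swapped in for Engine.
    using Engine = std::mt19937;  // placeholder for the sitmo threefry engine

    int main() {
        Engine eng(20140506u);                 // seed like any other engine
        std::normal_distribution<float> mbd(0.f, 1.f);

        // No domain_type, key_type or restart(): just keep drawing.
        for (int i = 0; i < 5; ++i)
            std::cout << mbd(eng) << '\n';

        eng.discard(1000);                     // standard discard() also works
        return 0;
    }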
I'm happy to look at optimization. But it should probably wait till we've
agreed on an API. My version is also aggressively optimized, e.g., I used
mpl::for_each to unroll some loops because gcc wouldn't unroll them
without help. If you've found ways to make it even smaller or faster, I'm
eager to learn.

John Salmon