Add an adapter to cover fast bit functions
Hi all, some days ago I opened a thread about DynamicBitset [1] (where I noted that a small piece of code may double the speed of a function), which received some interesting answers (you guys rock!). I looked deeper and realized that Boost doesn't have functionality to do common bit operations *really* fast.

Let me explain. There are several places in Boost where we needed to solve typical bit-manipulation tasks on integers, like "get log2 of an integer rounded down", "get the number of trailing zeros", or "count the number of 1-bits", and Boost contributors have been reinventing the wheel everywhere, with varying degrees of optimality: [2] - [6]. Moreover, if a programmer writes a cross-platform app and has to deal with this stuff, it's likely that he will write "naive" implementations of the bit operations, e.g. computing log2(x) in O(log N) when it could be done in O(log log N), albeit with more complicated code.

Besides "smart" algorithms, there are other methods to use: precalculated tables, which take a small amount of additional memory when needed [6], and intrinsics (which are CPU-specific). Combining these methods (intrinsics when convenient, tables/algorithms otherwise) gives a notable shift in performance - here's my benchmark [7] from my tiny library [8]. The benchmarked functions run in *9.2%* to *36.9%* of the time taken by the corresponding naive functions.

I can give you a use case: as a hobby I'm writing a chess variants engine (where bitsets are used to represent the chess board, and its size is *not* fixed to 64), and the speed of bit operations there is a matter of life and death.

So, what do you think of an adapter header in Boost, covering all these operations for all integer types (and maybe some other one-liners like `is_bit_set(x, pos)`), which keeps programmers away from platform specifics? It may look something like this [8] (I wrote only 3 functions, but it's enough to show the potential).
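To make the O(log N) vs O(log log N) point concrete, here is a sketch (not taken from the library [8], just an illustration of the technique): a naive integer log2 that shifts one bit per step, next to a version that binary-searches on the bit width.

```cpp
#include <cstdint>

// Naive integer log2: one iteration per bit -> O(log N) steps
// for an N-bit value. Precondition: x != 0.
unsigned log2_naive(std::uint64_t x) {
    unsigned r = 0;
    while (x >>= 1) ++r;
    return r;
}

// Binary search on the bit width: halve the remaining width each
// step -> O(log log N) steps. Precondition: x != 0.
unsigned log2_fast(std::uint64_t x) {
    unsigned r = 0;
    if (x >> 32) { x >>= 32; r += 32; }
    if (x >> 16) { x >>= 16; r += 16; }
    if (x >> 8)  { x >>= 8;  r += 8; }
    if (x >> 4)  { x >>= 4;  r += 4; }
    if (x >> 2)  { x >>= 2;  r += 2; }
    if (x >> 1)  { r += 1; }
    return r;
}
```

For 64-bit values this is 6 constant-time steps instead of up to 63 loop iterations; an intrinsic (where available) can do better still.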
I guess it can be placed somewhere in *Boost.Algorithm*, and supported with CI build tests on different platforms.

P.S. The (incomplete) list of bit functions that will give *notably* better performance compared with naive algorithms, and even some "smart" ones (thanks to precalculation/intrinsics):
- count of bits
- parity of bit count
- first 1-bit
- last 1-bit (aka integer_log2(x))
- leading zeros
- trailing zeros
- swap(x) - returns x with the order of the bytes reversed; for example, 0xaabb becomes 0xbbaa (bytes here are exactly 8 bits)
- rotate_left(x, shift)
- rotate_right(x, shift)

Other functions may look like `is_set(x, pos) -> return x & (1 << pos)` and aren't so interesting.

[1]: https://lists.boost.org/Archives/boost/2018/08/242825.php
[2]: https://github.com/boostorg/integer/blob/develop/include/boost/integer/integ...
[3]: https://github.com/boostorg/move/blob/develop/include/boost/move/algo/detail...
[4]: https://github.com/boostorg/dynamic_bitset/blob/develop/include/boost/pendin...
[5]: https://github.com/boostorg/integer/blob/develop/include/boost/integer/commo... (really almost the whole file)
[6]: https://github.com/boostorg/dynamic_bitset/blob/develop/include/boost/detail...
[7]: https://gist.githubusercontent.com/Izaron/8c94b76818393ffd93613d255033bb48/r...
[8]: https://github.com/Izaron/bit_algo/blob/master/include/bits.hpp
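An adapter function of the proposed kind might look like the sketch below. The intrinsic names (`__builtin_popcount` on GCC/Clang, `__popcnt` on MSVC) are real, but the dispatch structure is just an assumption about how such a header could be organized, not the library's actual code.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>   // __popcnt
#endif

// Hypothetical adapter: use a compiler intrinsic when one is available,
// otherwise fall back to a portable SWAR (parallel) bit count.
inline int popcount(std::uint32_t x) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_popcount(x);
#elif defined(_MSC_VER)
    return static_cast<int>(__popcnt(x));
#else
    // Portable fallback: sum bits in parallel within the word.
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return static_cast<int>((x * 0x01010101u) >> 24);
#endif
}
```

Every branch returns the same result for every input, so callers never see the platform specifics.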
AMDG

On 08/26/2018 05:32 PM, Evgeny Shulgin via Boost wrote:
Hi all, some days ago I opened a thread about DynamicBitset[1] (where I noted that a small piece of code may double the speed of a function), which received some interesting answers (you guys rock!)
I looked deeper and realized that Boost doesn't have functionality to do common bit operations *really* fast.<snip>
So, what do you think of an adapter header in Boost, covering all these operation of all integer types (and maybe some other one-liners like `is_bit_set(x, pos)` ), which keeps programmers out of platform specifics? It may look as smth like this [8] (I wrote only 3 functions, but it's enough to show the potential).
- Please don't #include <iostream> in headers, if you can avoid it.
- You're using C++14 constexpr, so BOOST_CXX14_CONSTEXPR would be the correct macro.
- You might need to think about whether/how to support signed integers. In particular, I'm thinking about swap and rotate.
- For unit tests, it's better to use a prng instead of random_device. Non-deterministic tests make it more difficult to reproduce test failures.
- It's better to separate benchmarks from unit tests. In particular, storing the elements in a vector is necessary for benchmarking, since you don't want to benchmark the generation of the values, but for tests, it artificially limits the maximum number of values you can check.
- For char and short, testing every possible input is easy. You can do int, too, if you're willing to wait a bit for it to finish.
- s/runned/ran/
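The prng-instead-of-random_device point can be sketched like this (`make_test_values` is a hypothetical helper, not part of any library): seed `std::mt19937` with a fixed constant, so every run tests exactly the same values and a failure is always reproducible.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Deterministic test-value generator: the same seed always yields the
// same sequence, unlike std::random_device, whose output cannot be replayed.
std::vector<std::uint32_t> make_test_values(std::uint32_t seed, int count) {
    std::mt19937 gen(seed);  // fixed seed -> reproducible sequence
    std::uniform_int_distribution<std::uint32_t> dist;
    std::vector<std::uint32_t> values;
    values.reserve(count);
    for (int i = 0; i < count; ++i)
        values.push_back(dist(gen));
    return values;
}
```

When you want fresh inputs, bump the seed deliberately and commit the change, so the test stays deterministic.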
I guess it can be placed somewhere to *Boost.Algorithm*, and supported with CI build tests of different platforms.
Or Boost.Integer.
P.S. The (incomplete) list of bit functions, that will give a *notable* better performance compared with naive algos and even some "smart" (because of precalc/intrinsics):
- count of bits
- parity of bit count
- first 1-bit
- last 1-bit (aka integer_log2(x))
- leading zeros
- trailing zeros
- swap(x) - returns x with the order of the bytes reversed; for example, 0xaabb becomes 0xbbaa. (bytes here are 8 bits exactly)
- rotate_left(x, shift)
- rotate_right(x, shift)

Other functions may look like `is_set(x, pos) -> return x & (1 << pos)` and aren't so interesting.
In Christ, Steven Watanabe
Thank you for your reply! It really helped me a lot.

- Please don't #include <iostream> in headers, if you can avoid it.

My bad, I forgot to remove this huge unused header. I should've used a tool like include-what-you-use.
- You're using C++14 constexpr, so BOOST_CXX14_CONSTEXPR would be the correct macro.

Thanks, I missed the fact that in C++11 a constexpr function is valid only when its body is exactly one return statement.
- You might need to think about whether/how to support signed integers. In particular, I'm thinking about swap and rotate.

I guess "rotate" should not affect the sign bit. If I remember correctly, there are assembly instructions for rotating signed numbers. "Swap" works with whole bytes and will implicitly treat the number as unsigned. But it's a topic to discuss, as are the other functions.
It would be good to check signed types in the benchmark as well.
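One possible convention for the signed case (a sketch of just one of the options discussed, not a settled design): do the rotate on the corresponding unsigned type, so the shifts are well-defined and the sign bit is treated as an ordinary bit.

```cpp
#include <cstdint>
#include <type_traits>

// Rotate left by casting to the unsigned counterpart first: avoids
// implementation-defined/undefined behavior of shifting signed values,
// and rotates the sign bit like any other bit.
template <class T>
T rotate_left(T x, unsigned shift) {
    using U = typename std::make_unsigned<T>::type;
    const unsigned bits = sizeof(T) * 8;
    U u = static_cast<U>(x);
    shift %= bits;               // tolerate shift >= width
    if (shift == 0) return x;
    return static_cast<T>(static_cast<U>((u << shift) | (u >> (bits - shift))));
}
```

Under this convention `rotate_left<std::int8_t>(-128, 1)` gives 1: the sign bit wraps around to bit 0.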
- For unit tests, it's better to use a prng instead of random_device. Non-deterministic tests make it more difficult to reproduce test failures.

I'll change it, sounds good.
- It's better to separate benchmarks from unit tests. In particular, storing the elements in a vector is necessary for benchmarking, since you don't want to benchmark the generation of the values, but for tests, it artificially limits the maximum number of values you can check.

Value generation isn't benchmarked; only the running time of the functions is measured.
I was thinking of separating the whole thing in this way:
- a single unit test verifies the correctness of the algorithm by comparing it with naive functions on generated values
- the benchmark file isn't really a test, but could be used in CI to produce information about performance, if possible (though I guess it should be possible)

Also, IMO, there should be a tiny single header for _every_ function, since a large file with a ton of monotonous specializations is pretty hard to maintain.
- For char and short, testing every possible input is easy. You can do int, too, if you're willing to wait a bit for it to finish.

The idea is fine, but I think it's too much to check every int32 value, because it takes a really long time for almost no benefit. We can use many generated values plus corner cases like the maximal/minimal value of each type.
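The exhaustive-testing idea for the small types can be sketched like this, with a naive popcount as the obviously-correct reference and a SWAR bit count standing in for the library function under test (both are illustrative, not the library's code).

```cpp
#include <cstdint>

// Reference implementation: naive popcount, one bit per iteration.
int popcount_naive(std::uint16_t x) {
    int c = 0;
    for (; x; x >>= 1) c += x & 1;
    return c;
}

// Stand-in for the "fast" library function: 16-bit SWAR bit count.
int popcount_fast(std::uint16_t x) {
    x = x - ((x >> 1) & 0x5555);
    x = (x & 0x3333) + ((x >> 2) & 0x3333);
    x = (x + (x >> 4)) & 0x0F0F;
    return (x + (x >> 8)) & 0x1F;
}

// Exhaustive check: all 65536 possible 16-bit inputs run in a blink,
// so no randomness is needed at all for char/short.
bool verify_all_uint16() {
    for (std::uint32_t v = 0; v <= 0xFFFF; ++v)
        if (popcount_naive(static_cast<std::uint16_t>(v)) !=
            popcount_fast(static_cast<std::uint16_t>(v)))
            return false;
    return true;
}
```

The same loop over all 2^32 int values is feasible but takes minutes rather than microseconds, which is the trade-off being discussed.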
Or Boost.Integer.

I explored it; it looks like a more appropriate place to add fresh code. I hope someone will take a look at the PR once I make the code look really much better and send it.
-- Sent from: http://boost.2283326.n4.nabble.com/Boost-Dev-f2600599.html
On Tue, 28 Aug 2018 at 02:23, Evgeny Shulgin [Izaron] via Boost <boost@lists.boost.org> wrote:
- For unit tests, it's better to use a prng instead of random_device. non-deterministic tests make it more difficult to reproduce test failures. I'll change it, sounds good.
Yes, it sounds good, but is it? I've been burned more than once by developing and testing a feature, thinking all was good, and then remarkably the stuff did not work in the real world. Ah, another seed would have caught this bug. I think some fuzz testing is good, and you said it yourself (and so does the advice): the test has to be reproducible. Marrying the two is not impossible, but it needs some thought, design and some more code.

degski

--
*“If something cannot go on forever, it will stop” - Herbert Stein*
*“No, it isn’t truth. Truth isn’t truth” - Rudolph W. L. Giuliani*
Hi,

On 28.08.2018 06:12, degski via Boost wrote:
On Tue, 28 Aug 2018 at 02:23, Evgeny Shulgin [Izaron] via Boost <boost@lists.boost.org> wrote:
- For unit tests, it's better to use a prng instead of random_device. non-deterministic tests make it more difficult to reproduce test failures. I'll change it, sounds good.
Yes, it sound good, but is it? I've been burned more than once by developing and testing a feature for something, thinking all was good and then remarkably stuff did not work in the real world. Ah, another seed would have caught this bug. I think some fuzzy testing is good, and you said it yourself (and the advice), the test has to be re-producible. Marrying the two is not impossible, but needs some thought, design and some more code.
Unit tests are not the best place for fuzzing. You should do both. Unit tests should be deterministic and always reproduce the same result, because you'll use them for debugging. Fuzzing, on the other hand, is used to find unknown bugs. When you find one, create a unit test that reproduces it, fix the bug so that the unit test doesn't fail any more, and continue fuzzing.

Christof
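One minimal way to marry the two, sketched here with placeholder `naive()`/`fast_()` stand-ins (hypothetical names, not any library's API): each fuzzing run draws a fresh seed but logs it first, so a failing run can be replayed deterministically and turned into a permanent unit test.

```cpp
#include <cstdint>
#include <iostream>
#include <random>

// Placeholder implementations standing in for the reference and the
// library function; in real use these would differ in implementation.
std::uint64_t naive(std::uint64_t x) { return x ^ (x >> 1); }
std::uint64_t fast_(std::uint64_t x) { return x ^ (x >> 1); }

// Fuzz with a known seed: the seed is printed up front, so the exact
// failing sequence can always be regenerated from the CI job log.
bool fuzz_once(std::uint32_t seed, int iterations) {
    std::cout << "fuzzing with seed " << seed << "\n";
    std::mt19937_64 gen(seed);
    for (int i = 0; i < iterations; ++i) {
        std::uint64_t x = gen();
        if (naive(x) != fast_(x)) {
            std::cout << "mismatch on " << x << " (seed " << seed << ")\n";
            return false;  // promote this input to a fixed unit test
        }
    }
    return true;
}
```

In CI the seed could come from `std::random_device` per run; locally you rerun `fuzz_once` with the logged seed to debug.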
On Tue, 28 Aug 2018 at 16:02, Christof Donat via Boost <boost@lists.boost.org> wrote:
Unit Tests are not the best place for fuzzing. You should do both. Unit test should be deterministic and always reproduce the same result, because you'll use them for debugging. Fuzzing on the other hand is used to find unknown bugs. When you found one, create a unit test to reproduce it, fix the bug, so that the unit test doesn't fail any more and continue fuzzing.
Thanks for not discarding it completely. Exactly how to do this I left open, I admit; I just said it needs more thought, more design and more code.

degski

--
*“If something cannot go on forever, it will stop” - Herbert Stein*
*“No, it isn’t truth. Truth isn’t truth” - Rudolph W. L. Giuliani*
Then, if we don't want to fix the values once and for all, we can still generate 3-5 million truly random values for each large integral type (still with corner cases), but when comparing the naive functions (that are 100% working) with the fast ones (from the library), we can do something like this. Instead of:

    BOOST_ASSERT(naive_outcome == fast_outcome);

use this:

    if (naive_outcome != fast_outcome) {
        // Or std::cerr or printf
        std::cout << "LOOK AT ME: func_name failed on " << number
                  << " real outcome " << naive_outcome
                  << " library outcome " << fast_outcome << std::endl;
        BOOST_ASSERT(naive_outcome == fast_outcome);
    }

CI will run this on different machines using randomness, and a bug is more likely to be caught by looking at the job log.
participants (5)
- Christof Donat
- degski
- Evgeny Shulgin
- Evgeny Shulgin [Izaron]
- Steven Watanabe