Re: [boost] [convert] Performance

10 Jun 2014


Joel, thank you for the pointers. Much appreciated. Black art, you say?
Any blacker-than-black adjectives available that I might use?

On 06/11/2014 09:03 AM, Joel de Guzman wrote:
...
On 6/11/14, 4:55 AM, Andrey Semashev wrote:
...
On Wednesday 11 June 2014 06:46:53 Vladimir Batov wrote:
...
And indeed BOOST_ASSERT seems to be heavier than BOOST_TEST due to the
expression-validity check done with
__builtin_expect(expr, 1)
It's not a validity check, it's a hint to the compiler to help branch
prediction. Assertion failures are assumed to be improbable.
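To illustrate the point about the hint, here is a minimal sketch, assuming GCC or Clang. SKETCH_LIKELY and count_non_negative are hypothetical names made up for this example and are not part of Boost. The hint only tells the compiler which way the branch is expected to go; the value of the expression and the result of the function are unchanged.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro name. On compilers without __builtin_expect it falls
// back to the plain boolean expression, so the code still compiles.
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_LIKELY(x) __builtin_expect(!!(x), 1)
#else
#  define SKETCH_LIKELY(x) (!!(x))
#endif

// Counts the non-negative values in data[0..n).
// The hint says "this condition is usually true"; it affects only how the
// compiler may lay out the branch, never which branch is taken.
inline std::size_t count_non_negative(const int* data, std::size_t n)
{
    assert(data != nullptr || n == 0);  // precondition, removed under NDEBUG
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (SKETCH_LIKELY(data[i] >= 0))
            ++count;
    return count;
}
```

Whether the hint makes any measurable difference depends on the compiler and the target, so it is best treated as something to measure rather than assume.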
In any case, when testing performance you should be building in release mode,
where all asserts are removed.
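A small sketch of what "removed" means, using the standard assert macro (which BOOST_ASSERT forwards to in its default configuration). The helper names below are hypothetical. NDEBUG is toggled by hand here only to show both behaviours in one file; a release build would normally define it on the command line instead.

```cpp
// Counts how many times the asserted expression is actually evaluated.
static int evaluations = 0;
static bool checked_condition() { ++evaluations; return true; }

// Release-like: with NDEBUG defined, assert(expr) expands to a no-op and
// expr is never evaluated.
#define NDEBUG
#include <cassert>
inline void release_like_path() { assert(checked_condition()); }

// Debug-like: re-including <cassert> without NDEBUG restores the check.
#undef NDEBUG
#include <cassert>
inline void debug_like_path() { assert(checked_condition()); }
```

Calling release_like_path() leaves evaluations untouched, while debug_like_path() increments it, so a benchmark built without NDEBUG also pays for every asserted expression.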
Benchmarks are a black art. See how we do our performance tests in Spirit:
https://github.com/boostorg/spirit/blob/master/workbench/qi/int_parser.cpp
You can use our benchmark facility where all the black art is contained:
https://github.com/boostorg/spirit/blob/master/workbench/measure.hpp
using this strategy:
// Strategy: because the sum in an accumulator after each call
// depends on the previous value of the sum, the CPU's pipeline
// might be stalled while waiting for the previous addition to
// complete.  Therefore, we allocate an array of accumulators,
// and update them in sequence, so that there's no dependency
// between adjacent addition operations.
//
// Additionally, if there were only one accumulator, the
// compiler or CPU might decide to update the value in a
// register rather than writing it back to memory.  We want each
// operation to at least update the L1 cache.  *** Note: This
// concern is specific to the particular application at which
// we're targeting the test. ***

// This has to be at least as large as the number of
// simultaneous accumulations that can be executing in the
// CPU pipeline.  A safe number here is larger than the
// machine's maximum pipeline depth. If you want to test the L2
// or L3 cache, or main memory, you can increase the size of
// this array.  1024 is an upper limit on the pipeline depth of
// current vector machines.
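The strategy in that comment could be sketched roughly as follows. This is not the code from measure.hpp: increment_test, timing and hammer_sketch are hypothetical names, the workload is a trivial stand-in, and the timer is std::chrono::steady_clock. It only shows the idea of spreading calls over an array of independent accumulators.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical workload standing in for the operation being measured.
struct increment_test
{
    long value = 0;
    void accumulate() { ++value; }
};

// Array size named in the quoted comment.
const std::size_t max_accumulators = 1024;

struct timing
{
    double seconds;  // wall-clock time spent in the loop
    long total;      // sum over all accumulators, keeps the work observable
};

// Runs `repeats` accumulations, rotating through independent accumulators so
// that consecutive updates do not depend on each other's result.
// Test must provide accumulate() and a public `value` member.
template <class Test>
timing hammer_sketch(long repeats)
{
    assert(repeats >= 0);
    std::vector<Test> accumulators(max_accumulators);

    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < repeats; ++i)
        accumulators[static_cast<std::size_t>(i) % max_accumulators].accumulate();
    auto stop = std::chrono::steady_clock::now();

    long total = 0;
    for (const Test& t : accumulators)
        total += t.value;

    return { std::chrono::duration<double>(stop - start).count(), total };
}
```

A call such as hammer_sketch<increment_test>(1000000) returns the elapsed time together with the total, which should equal the repeat count. An optimizer may still simplify a workload this trivial, which is part of why the real facility takes further precautions.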
A naive test implementation will give you *funny* results, depending
on the machine you are running on.
HTH.
Regards,