Re: [boost] [histogram] Variance

17 Sep 2018

      AMDG

On 09/17/2018 02:08 PM, Bjorn Reese via Boost wrote:
...
> The variance of individual bins can be obtained when using the
> adaptive_storage (via h.at(i).variance().)
> I am trying to understand the overhead of this feature.
> If I interpret the code correctly, there is a space overhead because
> each counter has to keep track of both the count and the sum of squares.
> The computational overhead is that the sum of squares has to be
> calculated for each insertion. Is this correct?

It's only tracked if you use weights.

...
> If so, is there any way to use the adaptive storage policy without
> variance?
> Furthermore, why does variance() return the sum of squares? Should this
> not be divided by the sample size?

You're thinking of the formula
variance = \sum (x_i - mean)^2 / count = \sum x_i^2/count - mean^2
That formula doesn't apply in this case, since the variance
is the variance of the bin count, not the variance of the
weights.  The estimate for the variance is described here:
http://hdembinski.github.io/histogram/doc/html/histogram/rationale.html#hist...
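
To make the distinction concrete, here is a minimal sketch of the bookkeeping, not the library's actual storage code. The types weighted_bin and plain_bin are hypothetical names made up for this illustration. The assumption is the usual estimate: a bin filled with weights w_i has content \sum w_i and variance estimate \sum w_i^2, which for unit weights reduces to the plain count.

```cpp
#include <cstddef>

// Hypothetical bin types for illustration only; these are NOT the
// types used inside Boost.Histogram's adaptive_storage.

// Weighted bin: keeps the sum of weights and the sum of squared weights.
struct weighted_bin {
    double sum_w = 0.0;   // bin content: \sum w_i
    double sum_w2 = 0.0;  // variance estimate of the content: \sum w_i^2

    void fill(double w) {
        sum_w += w;
        sum_w2 += w * w;  // the extra work per weighted insertion
    }
    double value() const { return sum_w; }
    // No division by the sample size: this estimates the variance of
    // the bin content, not the variance of the weights.
    double variance() const { return sum_w2; }
};

// Unweighted bin: a single counter is enough, because with w_i = 1
// the sum of squared weights equals the count.
struct plain_bin {
    std::size_t n = 0;

    void fill() { ++n; }
    double value() const { return static_cast<double>(n); }
    double variance() const { return static_cast<double>(n); }
};
```

With unit weights the second counter would carry no extra information, which is why nothing beyond the count needs to be stored until weights are used.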

In Christ,
Steven Watanabe