New subject: [proposed][histogram]

12 Apr 2017

      On 04/12/2017 11:37 AM, Hans Dembinski via Boost wrote:
...
The library implements a histogram class (a highly configurable policy-based template) for C++ and Python in C++11 code. Histograms are a standard tool to explore Big Data. They allow one to visualise and analyse distributions of random variables. A histogram provides a lossy compression of input data. GBytes of input can be put in a compact form which requires only a small fraction of the original memory. This makes histograms convenient for interactive data analysis and further processing.
Given that the compression is lossy, I am wondering how it compares with
a distribution estimator like:

   https://arxiv.org/abs/1507.05073v2

A common use-case when collecting numerical data is to determine the
quantiles. Boost.Accumulators contains an estimator (extended_p_square)
for that.

The advantage of such estimators are that they execute in constant time
and with constant memory usage, where the constant depends only on the
required precision.

PS: I am aware that this is a non-trivial question, so I do not expect
     an answer.

Re: [boost] [proposed][histogram]

Bjorn Reese

Oswin Krause

Hans Dembinski

Bjorn Reese

Hans Dembinski

tags

participants (3)