[Math / Distributions] How do mean, variance, etc end up in scope?
This is probably more of a C++ question than a Boost/Math question. I
think the answer comes down to a (surprising?) Boost namespace
convention.
In a program like this:
```
#include <boost/math/distributions/beta.hpp>

using boost::math::beta_distribution;

int main() {
    float a = 1.5f;
    float b = 3.0f;
    beta_distribution<float> dist(a, b);
    auto m = mean(dist);
}
```
how does the name 'mean' end up in scope in 'main()'?
Hi Warren,

On Mon, 2 May 2022, 03:29 Warren Weckesser via Boost-users, <boost-users@lists.boost.org> wrote:

> how does the name 'mean' end up in scope in 'main()'?

this is indeed a C++ question: what you are observing is called "argument-dependent lookup" (ADL). Since `dist` is of a type defined in the `boost::math` namespace, the compiler also looks up `mean` in that namespace, and finds a definition there. If you call `mean` with an argument whose type is not from `boost::math`, the full namespace qualification is needed:
```
std::vector<double> v{1, 2, 3, 4, 5};
double mu = boost::math::statistics::mean(v);
```
For more information you can check https://en.cppreference.com/w/cpp/language/adl .

Hope this helps,
.Andrea
On 5/2/22, Andrea Bocci wrote:
> this is indeed a C++ question: what you are observing is called "argument-dependent lookup".
> For more information you can check https://en.cppreference.com/w/cpp/language/adl .
> Hope this helps,
Indeed it does! Thanks, Andrea.

Warren
I'm working on a probability distribution library myself (ProDist). I've got an initial set of 1D distributions implemented, and I'm working on copulas now (a copula is the interdependency structure between the dimensions of a multivariate continuous distribution). Discrete and mixed discrete/continuous dimensions will follow.
I made distributions into polymorphic objects, and the classes can auto-best-fit instances of themselves to data via the minimum chi-squared method (maximum likelihood tends to overfit, maximum entropy tends to underfit; minimum chi-squared is somewhere in the middle, although I'll also need maximum likelihood for when there aren't enough points to do minimum chi-squared). The point is that functions like mean, variance, standard deviation, skewness, (excess) kurtosis, entropy, pdf, cdf, inverse cdf, draw (including vector- and matrix-valued versions of the previous four), central and non-central moments, etc. are member functions on the object, so the corresponding call would be
```
double a(1.5), b(3.0);
// cont means continuous; dist1d distinguishes from the copula and distnd directories/namespaces.
prodist::cont::dist1d::Beta beta(a, b);
double mean(beta.mean());
```
which I think is more natural than the Boost syntax.
Having distributions be objects also allows for non-parametric distributions like point sets, histograms, and kernel density estimates (the last not yet implemented, although I have an automagically binned histogram class whose CDF is enhanced by a PCHIP interpolant; it is comparable to ISJ KDEs in terms of accuracy while being a lot faster and enabling piecewise analytical evaluation of all member functions), and it is consistent with how we need to handle multidimensional distributions.
I'm using the Armadillo linear algebra library as part of the interface.
Since all continuous 1D distributions provide moments of arbitrary order, that enables "analytical" Gram-Schmidt orthonormalization to get orthonormal polynomials of arbitrary order. You can then use Armadillo's roots() (an eigensolve on the companion matrix) to get all the polynomial roots, which permits polynomial chaos / stochastic collocation, and you get that for free with every new continuous 1D distribution you implement.
Admittedly I have implemented a rather limited set of continuous 1D distributions to date, but the non-parametric distributions take up the slack until I get around to doing more. FFT-based convolution of PDFs (wrapping FFTW3) is another TODO.
You can wrap your favorite RNG in a subclass of prodist::cont::dist1d::U01Base. "All" other distributions (including discrete and multidimensional ones; the exceptions are the Dirac delta, which is completely deterministic, and null, which is basically an error code) take a shared_ptr to a U01Base subclass (possibly the default object). For efficient parallel execution you can wrap, say, Random123 (a library of counter-based RNGs) in U01Base subclasses and give a different instance to individual distributions, to avoid having every thread wait on the same locking function for its turn to get a draw. RandomStream is another good pseudo-random number generator library you can wrap in a U01Base subclass if you want to guarantee repeatability in parallel applications.
But as I hinted at above, you can call draw() on all distributions to use them like RNGs that produce points (1D or multi-D) obeying any concrete distribution (including conditional draws, i.e. specifying a subset of dimensions, in the multidimensional case).
The point of all that is that I think (and am betting) treating distributions as first-class polymorphic objects with associated member functions provides numerous advantages over the Boost way of doing things. I know that ProDist will never be part of Boost, because it uses Armadillo (and is too high level), but it may be advantageous to rethink the Boost way of doing things.
participants (3)

- Andrea Bocci
- Dalbey, Keith
- Warren Weckesser