Math/Statistical Distributions
Hi,

It has been a while since I did any statistics... I was reading the documentation, at least part of it, of the Boost library Math/Statistical Distributions. But I think I need a bit of help...

I have many values Yi that are closely distributed around a mean value Y-mean. The collection of Yi is obtained by an algorithm that sometimes produces trash values that need to be discarded. Such false values are typically not "close" to Y-mean.

I was hoping somebody could tell me which distribution I should use and how I should use it. If I recall correctly from my school days, I need the Normal (Gaussian) distribution. What would be the algorithm to find those trash values?

Thank you,
Andrej
-----Original Message-----
From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Andrej van der Zee
Sent: Wednesday, March 23, 2011 8:24 AM
To: Boost-users@lists.boost.org
Subject: [Boost-users] Math/Statistical Distributions
Hi,
It has been a while since I did any statistics... I was reading the documentation, at least part of it, of the Boost library Math/Statistical Distributions.
But I think I need a bit of help...
You should always read ALL the documentation ;-)

I have many values Yi that are closely distributed around a mean value Y-mean. The collection of Yi is obtained by an algorithm that sometimes produces trash values that need to be discarded. Such false values are typically not "close" to the Y-mean.
I was hoping somebody could tell what distribution I should use and how I should use this. If I recall correctly from my school-days, I need the Normal (Gaussian) Distribution. What would be the algorithm to find those trash values?
I think what you possibly want *first* is outlier detection (and discarding?).

http://en.wikipedia.org/wiki/Outlier is a typical intro.

There are distributions (in Boost.Math) that have 'longer tails', that is, they include values that a 'normal'-based outlier test would throw out. But if you are trying to predict some 'standard deviation'-like measure of scatter, this probably isn't what you want.

If you just wanted to *display* the scatter, a boxplot is useful: outliers are displayed too.

(A Google Summer of Code project in 2007 produced a working display system in C++ that you might find useful. An example is attached. https://svn.boost.org/svn/boost/sandbox/SOC/2007/visualization/libs/svg_plot/doc/html/index.html will give you the manual with some examples. The code is nearby.)

HTH

Paul

---
Paul A. Bristow, Prizet Farmhouse, Kendal LA8 8AB UK +44 1539 561830 07714330204 pbristow@hetp.u-net.com
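For readers looking for a concrete starting point, the boxplot idea mentioned above can be turned into a simple discard rule. The following is only a sketch and not something proposed in the thread itself: it uses Tukey's fences (keep values within 1.5 interquartile ranges of the quartiles, the usual boxplot convention) and plain standard C++ rather than Boost.Math. The names quantile_sorted and discard_outliers are hypothetical, the factor 1.5 is an assumption that may need tuning for a given data set, and the linearly interpolated quantile is just one of several common definitions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Linearly interpolated quantile of an already sorted, non-empty sample.
// p must lie in [0, 1]. Other quantile conventions exist and give
// slightly different quartiles for small samples.
double quantile_sorted(const std::vector<double>& sorted, double p) {
    assert(!sorted.empty());
    const double pos = p * static_cast<double>(sorted.size() - 1);
    const std::size_t lo = static_cast<std::size_t>(pos);
    const std::size_t hi = std::min(lo + 1, sorted.size() - 1);
    const double frac = pos - static_cast<double>(lo);
    return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
}

// Hypothetical helper: returns the values of y that lie inside Tukey's
// fences [Q1 - k*IQR, Q3 + k*IQR], preserving their original order.
// k = 1.5 is the customary boxplot choice (an assumption, not a rule).
std::vector<double> discard_outliers(const std::vector<double>& y,
                                     double k = 1.5) {
    if (y.size() < 4) {
        return y;  // too few points to estimate quartiles sensibly
    }
    std::vector<double> sorted(y);
    std::sort(sorted.begin(), sorted.end());

    const double q1 = quantile_sorted(sorted, 0.25);
    const double q3 = quantile_sorted(sorted, 0.75);
    const double iqr = q3 - q1;
    const double lower = q1 - k * iqr;
    const double upper = q3 + k * iqr;

    std::vector<double> kept;
    for (double v : y) {
        if (v >= lower && v <= upper) {
            kept.push_back(v);
        }
    }
    return kept;
}
```

Calling discard_outliers on the collected Yi returns the values to keep; the mean and standard deviation can then be computed from that cleaned sample. Quartiles are used here instead of mean and standard deviation because the trash values themselves distort a mean-based threshold.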
Hi,
http://en.wikipedia.org/wiki/Outlier is a typical intro.
Thanks, that was very helpful! Cheers, Andrej
participants (2): Andrej van der Zee, Paul A. Bristow