Hello Boost Some time ago, I needed to compute the exact median for a stream of elements and after using Boost.Accumulators, realized that it only provides an approximation of the median and not the exact number. I rolled up my custom implementation of the algorithm and used it for my project. I wanted to check if people on this list might be interested in such an implementation for Boost.Accumulators in which case I can try to adapt my implementation and send a PR. Performance characteristics: For a stream of n numbers, the memory required is O(n). Median can be queried at any point of time in O(1) and adding all the elements costs O(nlog(n)) Lakshay
On Sat, Sep 22, 2018 at 9:22 PM Lakshay Garg via Boost < boost@lists.boost.org> wrote:
Hello Boost
Some time ago, I needed to compute the exact median for a stream of elements and after using Boost.Accumulators, realized that it only provides an approximation of the median and not the exact number.
I rolled up my custom implementation of the algorithm and used it for my project. I wanted to check if people on this list might be interested in such an implementation for Boost.Accumulators in which case I can try to adapt my implementation and send a PR.
Performance characteristics: For a stream of n numbers, the memory required is O(n). Median can be queried at any point of time in O(1) and adding all the elements costs O(nlog(n))
Could you compare and contrast this with the standard library's way of computing a median, std::nth_element()? Yout just give it N/2 for 'n', and it computes the median. Zach
On Sun, 23 Sep 2018 at 08:39, Zach Laine via Boost
On Sat, Sep 22, 2018 at 9:22 PM Lakshay Garg via Boost < boost@lists.boost.org> wrote:
I rolled up my custom implementation of the algorithm and used it for my project. I wanted to check if people on this list might be interested in such an implementation for Boost.Accumulators in which case I can try to adapt my implementation and send a PR.
Performance characteristics: For a stream of n numbers, the memory required is O(n). Median can be queried at any point of time in O(1) and adding all the elements costs O(nlog(n))
Could you compare and contrast this with the standard library's way of computing a median, std::nth_element()? Yout just give it N/2 for 'n', and it computes the median.
The reason for not using std::nth_element was because I wanted to be able to query the median at any point of the sequence very efficiently. Had I used std::nth_element, I would have needed O(n) time for obtaining the median. For a case where I need to compute median after every single element, this would have required O(n^2) time for a sequence of length n. Here is my implementation in case you would like to have a look: https://gist.github.com/lakshayg/d27f29b4634cbfea3fc48c21b7b608ef
On Sat, 22 Sep 2018 at 19:21, Lakshay Garg
I rolled up my custom implementation of the algorithm and used it for my project. I wanted to check if people on this list might be interested in such an implementation for Boost.Accumulators in which case I can try to adapt my implementation and send a PR.
Performance characteristics: For a stream of n numbers, the memory required is O(n). Median can be queried at any point of time in O(1) and adding all the elements costs O(nlog(n))
I looked up the problem of computing median of a stream on SO and found a few more approaches which are efficient in a rolling window scenario. https://stackoverflow.com/questions/10657503/find-running-median-from-a-stre... I can try and implement one of those if we decide to have this in the accumulators library.
On 09/23/18 22:03, Lakshay Garg via Boost wrote:
I can try and implement one of those if we decide to have this in the accumulators library.
It would be useful if this class can handle other quantiles as well. The quantiles could be specified as template parameters using std::ratio in a similar way as the following P^2 estimator does: https://github.com/breese/trial.online/blob/develop/include/trial/online/qua...
On Sun, 30 Sep 2018 at 08:37, Bjorn Reese via Boost
On 09/23/18 22:03, Lakshay Garg via Boost wrote:
I can try and implement one of those if we decide to have this in the accumulators library.
It would be useful if this class can handle other quantiles as well.
The quantiles could be specified as template parameters using std::ratio in a similar way as the following P^2 estimator does:
Interesting suggestion. Will give it a shot this week, shouldn't be too hard.
participants (3)
-
Bjorn Reese
-
Lakshay Garg
-
Zach Laine