Re: [boost] Interest in some Algorithms

25 Jan 2016

      So sorry for the delay on the questions here. Some answers inline below.

On Wed, Jan 20, 2016 at 4:31 AM Phil Endecott <
spam_from_boost_dev@chezphil.org> wrote:
...
Steven Watanabe wrote:
...
On 01/18/2016 05:13 AM, Dean Michael Berris wrote:
...
I wanted to ask whether anybody's interested in a couple of
algorithms, to be made part of the Boost.Algorithm library.
One is tentatively named ?set_inplace_difference' which has
the following synopsis:
template <class I1, class I2>
I1 set_inplace_difference(I1 f1, I1 l1, I2 f2, I2 l2);
template <class I1, class I2, class C>
I1 set_inplace_difference(I1 f1, I1 l1, I2 f2, I2 l2, C comp);
(I've fixed an obvious typo above...)
...
...
It's essentially a set_difference which re-uses the storage of
the range [f1, l1). It takes elements from [f1, l1) that are in
[f2, l2) and moves them to [p, l1) where p is the partition point
(or the new 'end'). This is best used in conjunction with 'erase'
on vectors and/or std::deque. These algorithms return p.
Another is tentatively named 'set_inplace_intersection' which
instead of the above which does the difference, does an intersection
instead.
Requirements on I1 and I2 are that they are RandomAccess iterators,
and that they are sorted. I haven't figured out whether
partially-sorted is a sufficient condition. I also haven't figured
out whether it would work for weaker iterator classes, but so far
my implementation relies on std::lower_bound and std::upper_bound.
Why do you need lower/upper_bound?  set_difference
can be implemented easily with a straight linear
scan through the two sequences.
That is true, and I suppose that's acceptable for some values of N. In
practice, when you have two sorted ranges, you want to be able to eliminate
as many of the elements as you can without having to walk the whole range.
...
If you do that, the complexity is at least O(N) even for the best
case where the ranges don't overlap.
Here is my attempt at this algorithm, which might be O(N log N) worst
case if I'm thinking straight.  It is O(1) when the ranges don't overlap.
Dean, is this anything like what you've implemented?
It's close, I've implemented this without the recursion.
...
template <typename I1, typename I2>
I1 set_inplace_difference(I1 f1, I1 l1, I2 f2, I2 l2)
{
  // If either range is empty, there is nothing to do:
  if (f1 == l1 || f2 == l2) return l1;
// If the ranges are both singletons, compare the values:
  if (l1-f1 == 1 && l2-f2 == 1) {
    if (*f1 == *f2) return f1; else return l1;
  }
// If ranges don't overlap, there is nothing to do:
  if (*(l2-1) < *f1 || *f2 > *(l1-1)) return l1;
If you can use std::lower_bound and std::upper_bound here, there's a
simpler test. :)
...
// Recurse, splitting the larger range:
  if (l2-f2 > l1-f1) {
    // Second range is larger.  Split it at its midpoint and
    // subtract each half in turn.
    I2 m2 = f2 + (l2-f2)/2;
    I1 p = set_inplace_difference(f1,l1, m2,l2);
    return set_inplace_difference(f1,p,  f2,m2);
} else {
    // First range is larger.  Split it at its midpoint and
    // subtract from each half in turn.
    I1 m1 = f1 + (l1-f1)/2;
    I1 p = set_inplace_difference(f1,m1, f2,l2);
    I1 q = set_inplace_difference(m1,l1, f2,l2);
    // Now merge the two results.
    return std::rotate(p,m1,q);
  }
}
This looks very interesting Phil. I suppose if we measure this on really
big overlapping ranges, there might be an issue with the recursion. You can
leverage the fact that the ranges are already sorted, using
std::lower_bound and std::upper_bound to find contiguous ranges that can be
rotated out of the way incrementally.

Seems there is some interest for this, Marshall do I just send you a pull
request for the implementation? What is the process here?

Cheers