So sorry for the delay on the questions here. Some answers inline below. On Wed, Jan 20, 2016 at 4:31 AM Phil Endecott < spam_from_boost_dev@chezphil.org> wrote:
Steven Watanabe wrote:
On 01/18/2016 05:13 AM, Dean Michael Berris wrote:
I wanted to ask whether anybody's interested in a couple of algorithms, to be made part of the Boost.Algorithm library.
One is tentatively named ?set_inplace_difference' which has the following synopsis:
template
I1 set_inplace_difference(I1 f1, I1 l1, I2 f2, I2 l2); template
I1 set_inplace_difference(I1 f1, I1 l1, I2 f2, I2 l2, C comp); (I've fixed an obvious typo above...)
It's essentially a set_difference which re-uses the storage of the range [f1, l1). It takes elements from [f1, l1) that are in [f2, l2) and moves them to [p, l1) where p is the partition point (or the new 'end'). This is best used in conjunction with 'erase' on vectors and/or std::deque. These algorithms return p.
Another is tentatively named 'set_inplace_intersection' which instead of the above which does the difference, does an intersection instead.
Requirements on I1 and I2 are that they are RandomAccess iterators, and that they are sorted. I haven't figured out whether partially-sorted is a sufficient condition. I also haven't figured out whether it would work for weaker iterator classes, but so far my implementation relies on std::lower_bound and std::upper_bound.
Why do you need lower/upper_bound? set_difference can be implemented easily with a straight linear scan through the two sequences.
That is true, and I suppose that's acceptable for some values of N. In practice, when you have two sorted ranges, you want to be able to eliminate as many of the elements as you can without having to walk the whole range.
If you do that, the complexity is at least O(N) even for the best case where the ranges don't overlap.
Here is my attempt at this algorithm, which might be O(N log N) worst case if I'm thinking straight. It is O(1) when the ranges don't overlap. Dean, is this anything like what you've implemented?
It's close, I've implemented this without the recursion.
template
I1 set_inplace_difference(I1 f1, I1 l1, I2 f2, I2 l2) { // If either range is empty, there is nothing to do: if (f1 == l1 || f2 == l2) return l1; // If the ranges are both singletons, compare the values: if (l1-f1 == 1 && l2-f2 == 1) { if (*f1 == *f2) return f1; else return l1; }
// If ranges don't overlap, there is nothing to do: if (*(l2-1) < *f1 || *f2 > *(l1-1)) return l1;
If you can use std::lower_bound and std::upper_bound here, there's a simpler test. :)
// Recurse, splitting the larger range: if (l2-f2 > l1-f1) { // Second range is larger. Split it at its midpoint and // subtract each half in turn. I2 m2 = f2 + (l2-f2)/2; I1 p = set_inplace_difference(f1,l1, m2,l2); return set_inplace_difference(f1,p, f2,m2);
} else { // First range is larger. Split it at its midpoint and // subtract from each half in turn. I1 m1 = f1 + (l1-f1)/2; I1 p = set_inplace_difference(f1,m1, f2,l2); I1 q = set_inplace_difference(m1,l1, f2,l2); // Now merge the two results. return std::rotate(p,m1,q); } }
This looks very interesting Phil. I suppose if we measure this on really big overlapping ranges, there might be an issue with the recursion. You can leverage the fact that the ranges are already sorted, using std::lower_bound and std::upper_bound to find contiguous ranges that can be rotated out of the way incrementally. Seems there is some interest for this, Marshall do I just send you a pull request for the implementation? What is the process here? Cheers