Efficient removal of std::list items
This seems like the type of thing boost may have an answer to, yet I still havn't found it: I have two std::lists, about 50,000 items in each. The items are all number ranges so two uints start and end. I need to remove one list from the other - exact matches or subranges only. Currently I'm using remove_if() to remove/shrink/split them but it is taking forever, around 30sec. Anyone know of a better solution? -- Cory Nelson http://www.int64.org
At Sunday 2004-12-19 17:54, you wrote:
This seems like the type of thing boost may have an answer to, yet I still havn't found it:
I have two std::lists, about 50,000 items in each. The items are all number ranges so two uints start and end. I need to remove one list from the other - exact matches or subranges only. Currently I'm using remove_if() to remove/shrink/split them but it is taking forever, around 30sec.
Anyone know of a better solution?
are the lists sorted?
-- Cory Nelson http://www.int64.org _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
Victor A. Wagner Jr. http://rudbek.com The five most dangerous words in the English language: "There oughta be a law"
Yep. If you're going where I think you're going (binary search), I
was about halfway through the modifications when I opened the email.
Lack of sleep will get to you... :(
On Sun, 19 Dec 2004 20:22:01 -0700, Victor A. Wagner Jr.
At Sunday 2004-12-19 17:54, you wrote:
This seems like the type of thing boost may have an answer to, yet I still havn't found it:
I have two std::lists, about 50,000 items in each. The items are all number ranges so two uints start and end. I need to remove one list from the other - exact matches or subranges only. Currently I'm using remove_if() to remove/shrink/split them but it is taking forever, around 30sec.
Anyone know of a better solution?
are the lists sorted?
-- Cory Nelson http://www.int64.org _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
Victor A. Wagner Jr. http://rudbek.com The five most dangerous words in the English language: "There oughta be a law"
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
-- Cory Nelson http://www.int64.org
Cory Nelson wrote:
Yep. If you're going where I think you're going (binary search), I was about halfway through the modifications when I opened the email. Lack of sleep will get to you... :(
On Sun, 19 Dec 2004 20:22:01 -0700, Victor A. Wagner Jr.
wrote: At Sunday 2004-12-19 17:54, you wrote:
This seems like the type of thing boost may have an answer to, yet I still havn't found it:
I have two std::lists, about 50,000 items in each. The items are all number ranges so two uints start and end. I need to remove one list from the other - exact matches or subranges only. Currently I'm using remove_if() to remove/shrink/split them but it is taking forever, around 30sec.
Anyone know of a better solution?
are the lists sorted?
That'll only work if the ranges are non-overlapping. Otherwise you'll need to use an interval tree to avoid an O(N) search for each interval you want to remove. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com
At Sunday 2004-12-19 21:16, you wrote:
Cory Nelson wrote:
Yep. If you're going where I think you're going (binary search), I was about halfway through the modifications when I opened the email. Lack of sleep will get to you... :(
On Sun, 19 Dec 2004 20:22:01 -0700, Victor A. Wagner Jr.
wrote: At Sunday 2004-12-19 17:54, you wrote:
This seems like the type of thing boost may have an answer to, yet I still havn't found it:
I have two std::lists, about 50,000 items in each. The items are all number ranges so two uints start and end. I need to remove one list from the other - exact matches or subranges only. Currently I'm using remove_if() to remove/shrink/split them but it is taking forever, around 30sec.
Anyone know of a better solution?
are the lists sorted?
That'll only work if the ranges are non-overlapping. Otherwise you'll need to use an interval tree to avoid an O(N) search for each interval you want to remove.
you're likely correct, but we were looking at handling some stuff with imprecise (error probabilities) results and we were on the track of doing some stuff along these lines. The project was cancelled (shame, I think it would fit well with the physical quantities systems... you just can't measure things as closely as you'd like) so we never got to see how/if all of the algorithms would work or whether we'd have to specialize them for "fuzzy" numbers. I think it's worth pursuing for ranges tho.
-- Dave Abrahams Boost Consulting http://www.boost-consulting.com
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
Victor A. Wagner Jr. http://rudbek.com The five most dangerous words in the English language: "There oughta be a law"
Victor A. Wagner Jr. wrote:
are the lists sorted?
Cory Nelson wrote:
Yep. If you're going where I think you're going (binary search), I was about halfway through the modifications when I opened the email. Lack of sleep will get to you... :(
Dave Abrahams wrote:
That'll only work if the ranges are non-overlapping. Otherwise you'll need to use an interval tree to avoid an O(N) search for each interval you want to remove.
Victor A. Wagner Jr. Replied:
you're likely correct, but we were looking at handling some stuff with imprecise (error probabilities) results and we were on the track of doing some stuff along these lines. The project was cancelled (shame, I think it would fit well with the physical quantities systems... you just can't measure things as closely as you'd like) so we never got to see how/if all of the algorithms would work or whether we'd have to specialize them for "fuzzy" numbers.
Have you seen the Boost Interval library?
I think it's worth pursuing for ranges tho.
Why should the OP pursue anything that's likely (your words) to be O(N^2) when there's a known and proven O(N log N) approach? -- Dave Abrahams Boost Consulting http://www.boost-consulting.com
At Monday 2004-12-20 05:42, you wrote:
Victor A. Wagner Jr. wrote:
are the lists sorted?
Cory Nelson wrote:
Yep. If you're going where I think you're going (binary search), I was about halfway through the modifications when I opened the email. Lack of sleep will get to you... :(
Dave Abrahams wrote:
That'll only work if the ranges are non-overlapping. Otherwise you'll need to use an interval tree to avoid an O(N) search for each interval you want to remove.
you're likely correct, but we were looking at handling some stuff with imprecise (error probabilities) results and we were on the track of doing some stuff along these lines. The project was cancelled (shame, I
Victor A. Wagner Jr. Replied: think it
would fit well with the physical quantities systems... you just can't measure things as closely as you'd like) so we never got to see how/if all of the algorithms would work or whether we'd have to specialize them for "fuzzy" numbers.
Have you seen the Boost Interval library?
I think it's worth pursuing for ranges tho.
Why should the OP pursue anything that's likely (your words) to be O(N^2) when there's a known and proven O(N log N) approach?
you know dave, sometimes I think you just like to dis me. you sure as hell don't pay attention to anything I actually write. I never suggested anyone pursue ANY O(N^2) approach. I was SUGGESTING that maybe there is an O(N+M) approach although allowing that perhaps YOU are correct and there isn't. btw, "these lines" referred to my previous message about modifying set_difference() not your silly O(n) search for every interval you want to remove.
-- Dave Abrahams Boost Consulting http://www.boost-consulting.com
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
Victor A. Wagner Jr. http://rudbek.com The five most dangerous words in the English language: "There oughta be a law"
Victor A. Wagner Jr. wrote:
At Monday 2004-12-20 05:42, you wrote:
Victor A. Wagner Jr. wrote:
are the lists sorted?
Cory Nelson wrote:
Yep. If you're going where I think you're going (binary search), I was about halfway through the modifications when I opened the email. Lack of sleep will get to you... :(
Dave Abrahams wrote:
That'll only work if the ranges are non-overlapping. Otherwise you'll need to use an interval tree to avoid an O(N) search for each interval you want to remove.
you're likely correct, but we were looking at handling some stuff with imprecise (error probabilities) results and we were on the track of doing some stuff along these lines. The project was cancelled (shame, I
Victor A. Wagner Jr. Replied: think it
would fit well with the physical quantities systems... you just can't measure things as closely as you'd like) so we never got to see how/if all of the algorithms would work or whether we'd have to specialize them for "fuzzy" numbers.
Have you seen the Boost Interval library?
By the way, have you?
I think it's worth pursuing for ranges tho.
Why should the OP pursue anything that's likely (your words) to be O(N^2) when there's a known and proven O(N log N) approach?
you know dave, sometimes I think you just like to dis me.
I don't. I am trying to protect someone who's asking for help from going down an unproductive path. I've been through this problem before. I know that sorted linear ranges don't work; there's just not enough information and structure there to make the job efficient. It was an honest question, you know. If you're going to suggest that someone pursue that path I think you ought to be able to explain why.
You sure as hell don't pay attention to anything I actually write.
I do. It appears from what you write below that if anything I paid too much attention to what you actually wrote and failed to perceive what you actually meant. Please; you are at least just as susceptible to communication error as I am. There's no need to make me out to be a villain here.
I never suggested anyone pursue ANY O(N^2) approach. I was SUGGESTING that maybe there is an O(N+M) approach
I don't know how anyone (including the OP) was supposed to understand that, considering there has been no mention of O(N+M) so far in this thread.
although allowing that perhaps YOU are correct and there isn't.
Here's the other part of the communication error. First of all, you said that I was "likely" to be correct, which is much stronger than "perhaps." And what I was claiming wasn't that there's no O(N+M) approach, but that binary search will not work (i.e. be efficient) unless the ranges are non-overlapping. Let me just point out that I *did* actually write that. Were you paying attention?
btw, "these lines" referred to my previous message about modifying set_difference() not your silly O(n) search for every interval you want to remove.
In the general case (unless the ranges are non-overlapping), using set_difference will also be a silly O(n) search for every interval that needs to be removed, because although you can avoid O(n) for each starting value, you still potentially need to look through all the rest of the elements until you find one that starts later than the current range's end value, which could be anything. -- Dave Abrahams Boost Consulting http://www.boost-consulting.com
At Sunday 2004-12-19 20:44, you wrote:
Yep. If you're going where I think you're going (binary search), I was about halfway through the modifications when I opened the email.
actually I hadn't considered binary search at all, I was thinking that perhaps a modification to set_difference() might do the trick. I don't mean a _small_ modification either, but if the lists are already sorted maybe there's a chance.
Lack of sleep will get to you... :(
On Sun, 19 Dec 2004 20:22:01 -0700, Victor A. Wagner Jr.
wrote: At Sunday 2004-12-19 17:54, you wrote:
This seems like the type of thing boost may have an answer to, yet I still havn't found it:
I have two std::lists, about 50,000 items in each. The items are all number ranges so two uints start and end. I need to remove one list from the other - exact matches or subranges only. Currently I'm using remove_if() to remove/shrink/split them but it is taking forever, around 30sec.
Anyone know of a better solution?
are the lists sorted?
-- Cory Nelson http://www.int64.org _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
Victor A. Wagner Jr. http://rudbek.com The five most dangerous words in the English language: "There oughta be a law"
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
-- Cory Nelson http://www.int64.org _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
Victor A. Wagner Jr. http://rudbek.com The five most dangerous words in the English language: "There oughta be a law"
Cory Nelson wrote:
This seems like the type of thing boost may have an answer to, yet I still havn't found it:
I have two std::lists, about 50,000 items in each. The items are all number ranges so two uints start and end. I need to remove one list from the other - exact matches or subranges only. Currently I'm using remove_if() to remove/shrink/split them but it is taking forever, around 30sec.
Anyone know of a better solution?
I suggest you use something known as an interval tree, which can identify overlapping ranges in O(log N) time. It essentially stores intervals sorted based on their start values and keeps a record in each node of the greatest end value in any of its children or grandchildren or great-grandchildren, etc. If you're industrious you can probably hack up your STL's rb tree implementation (the one it uses for the associative containers) to support the end value; that's what I did years ago when I needed such a thing. You just need to figure out how to maintain the "greatest end value in subtree" fields when the tree is rebalanced. At the time I remember that literature on interval trees was a little hard to find, but if that's still true, once you understand the data structure you should be able to figure out the O(log N) search algorithm. Good Luck, -- Dave Abrahams Boost Consulting http://www.boost-consulting.com
I need to remove one list from the other This sounds like a set operation. Can you use a set instead of a list ? If you can I think that set_difference will give you a better performance for that.
http://www.sgi.com/tech/stl/set.html Mauricio Gomes Pensar Digital phone: 55-11-4121-6287 mobile: 55-11-8319-9610 http://pensardigital.com On Dec 19, 2004, at 10:54 PM, Cory Nelson wrote:
This seems like the type of thing boost may have an answer to, yet I still havn't found it:
I have two std::lists, about 50,000 items in each. The items are all number ranges so two uints start and end. I need to remove one list from the other - exact matches or subranges only. Currently I'm using remove_if() to remove/shrink/split them but it is taking forever, around 30sec.
Anyone know of a better solution?
-- Cory Nelson http://www.int64.org _______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
participants (4)
-
Cory Nelson
-
David Abrahams
-
Mauricio Gomes
-
Victor A. Wagner Jr.