Re: [Boost-users] [Polygon] Test among 3 libraries
Angus wrote:
Simonson, Lucanus J wrote:
"I've never seen a comparison with clipper before, I've never heard of clipper and would like to learn more. Can you provide a link or reference? I found a sourceforge site with minimal description of clipper, but it left me with a lot of unanswered questions."
Hi Luke. I'm the author of Clipper and am happy to answer your questions as best I can, but perhaps I can preempt some of them with a little background about Clipper and myself.

Firstly, I've only recently created the Clipper library (i.e. within the last 3-4 months), which may explain why you haven't heard of it. I originally wrote it in Delphi Pascal, but I've translated it into C++ too, and it has been released under the Boost license. I wrote Clipper because I wanted a polygon clipping library (in Delphi) that wasn't encumbered with restrictive licensing. I'm still refining the library and have several updates in the pipeline (including polyline-polygon and polybezier-polygon clipping).

Clipper is based on the Vatti clipping algorithm, which places no restrictions on the types of input (subject and clipping) polygons. However, Vatti's original paper completely omits any discussion of:
1. how to handle what I call 'complex' intersections - intersections involving more than two edges; and
2. how to handle overlapping horizontal edges.
My Clipper library extends the Vatti algorithm by addressing both these issues. Clipper also extends the Vatti algorithm by implementing the NonZero polygon filling rule for input polygons (as well as Vatti's EvenOdd filling).

I'm aware of one limitation of my Clipper library: because it uses floating point math it isn't 'robust' in the strict sense. However, errors (which are extremely rare) are properly trapped and reported as a failed clipping operation.
Vatti is the right algorithm to implement. GPC implements a sort of crippled version of Vatti that uses a suboptimal data structure for the sweepline. I expect that you are using a tree-based sweepline data structure (as Vatti suggests) to implement the optimal sweepline algorithm. My own implementation is similar to Vatti in some ways, and I cite Vatti in my journal article. In addition to handling the high order intersection points and high order overlapping collinear line segments, I also handle self intersecting and self overlapping polygons, as well as multiple overlapping polygons in an input. I implement a Positive polygon filling rule, which is equivalent to the NonZero filling rule for typical cases, but for cases where there are holes that project outside of polygons, or polygons with "bow-tie" self intersections, it is different and more mathematically consistent than the NonZero or Odd filling rule.

I take extreme care in my implementation to provide a 100% robust solution. This has forced me to take a divide and conquer approach to line segment intersection instead of sweepline. I then feed the fully intersected line segments into a sweepline to compute the Positive filling rule, and a second pass to form output polygons. The sweepline that forms polygons is substantially similar to Vatti's algorithm. I do extensive calculations to check that the results of arithmetic are numerically trustworthy. The three passes instead of one, the sub-optimality of divide and conquer, and the high constant factor cost of providing robust arithmetic all have performance drawbacks when compared to a straightforward implementation of Vatti. Also, I limit my coordinate data type to integer (similar to polyboolean, but actually robust as opposed to purportedly robust).

I would expect that your implementation could well be faster than mine at all scales of inputs, except for the cases where yours fails due to numerical robustness problems. While you state that such errors are rare, the reality is that as the scale of the data being processed increases, the probability of such an error occurring approaches 100%. In my testing, non robust algorithms (such as polyboolean) have a limit in scale of data beyond which they are completely unreliable and effectively useless. This undermines the value of algorithmic optimality. Polyboolean is not optimal, by the way, and is comparable to GPC in this respect.

Right now we already have yet another implementation of polygon clipping accepted to boost as part of the new Boost.Geometry (formerly GGL) library that is not numerically robust, and not algorithmically optimal. However, it has much better chances than an optimal Vatti sweepline implementation of successfully producing an output, because sweepline is a somewhat temperamental algorithm and numerical errors are often fatal conditions, whereas the Boost.Geometry algorithm tolerates them better. My expectation is that for large cases your implementation should be faster than the Boost.Geometry algorithm, but more likely to fail due to numerical errors.

Your algorithm could find a home in boost itself as a "strategy" for polygon clipping that can be passed into the Boost.Geometry generic interface. If that interests you, I suggest you join the Boost.Geometry project and mailing list and pursue contributing your code. I expect that your participation would be welcomed by the other members of the project; certainly I welcome it.
It is possible to make a floating point implementation of Vatti, such as yours, numerically robust, and that is something we could pursue as a community in the Boost.Geometry project.
In my testing (see http://www.angusj.com/delphi/clipper.php#features ), Clipper has a number of advantages over GPC, a well known clipping library that's also based on Vatti's algorithm. Clipper:
1. is significantly faster;
2. properly handles complex intersections;
3. accepts both EvenOdd and NonZero filling input polygons;
4. doesn't require payment for commercial use.
I like the looks of this page. It looks like you are doing what I would call the "right" kind of testing, with randomly generated circles and polygons. I approve of beating GPC at its own game of intersecting Great Britain with a spiral. Bravo. GPC is substantially robust, however: not 100%, but darn close. While you are making your comparison you might want to mention that. The reason GPC is so reliable stems from its suboptimal sweepline data structure. The tree data structure recommended by Vatti and Bentley-Ottmann has its invariant violated by numerical robustness errors, while the list GPC maintains tolerates errors better.
The SourceForge site for Clipper is: http://sourceforge.net/projects/polyclipping/

About myself: I've been writing computer code for 30 years as a hobby. (I was a medical practitioner before I retired 15 years ago due to illness.) For the last 15 years I've been using Delphi (Object Pascal) and until now have never used C++. The C++ translation of my Delphi Clipper library is the first code I've written in C++, which should explain any unorthodoxy in my C++ coding style. (I'm aware that the C++ code is currently about 15% slower than the Delphi code, so I suspect I'm not making the most of this language.) I'm very happy to receive constructive feedback and hope I can answer any questions too.
I can review your code if you like. I expect that with 15 years of experience programming in any language your C++ will be reasonably well written in terms of procedural or object oriented style. Lack of generic programming style is not a fatal design flaw in my book. I can quickly identify performance problems due to C++ usage to help you get the most out of the language. We can add templates for the coordinate data type and so forth to get it up to the standards required by Boost.Geometry, should you choose to pursue contributing your code. Please read the archives of the boost dev mailing list to learn more background on the geometry work being done in boost over the past several years, and to find the link to get added to the Boost.Geometry (GGL) mailing list if you are interested. Thanks, Luke
On 9/09/2010 4:52 AM, Simonson, Lucanus J wrote:
Your algorithm could find a home in boost itself as a "strategy" for polygon clipping that can be passed into the Boost.Geometry generic interface. If that interests you, I suggest you join the Boost.Geometry project and mailing list and pursue contributing your code. I expect that your participation would be welcomed by the other members of the project; certainly I welcome it. It is possible to make a floating point implementation of Vatti, such as yours, numerically robust, and that is something we could pursue as a community in the Boost.Geometry project.
Thanks Luke for your kind welcome. I wasn't hoping for Clipper to be integrated into the Boost project in any way; I just chose the Boost License for its simplicity. I think that multiple solutions for tasks in a code library are generally confusing to users, though I can see that sometimes two is justified. However, I'm very open to some form of collaboration where the better ideas from each algorithm might be integrated into one solution, assuming that's possible.
I like the looks of this page. It looks like you are doing what I would call the "right" kind of testing, with randomly generated circles and polygons. I approve of beating GPC at its own game of intersecting Great Britain with a spiral. Bravo. GPC is substantially robust, however: not 100%, but darn close. While you are making your comparison you might want to mention that. The reason GPC is so reliable stems from its suboptimal sweepline data structure. The tree data structure recommended by Vatti and Bentley-Ottmann has its invariant violated by numerical robustness errors, while the list GPC maintains tolerates errors better.
I believe Clipper is currently substantially more reliable than GPC, because I don't think GPC properly deals with complex intersections. I'm not aware of the tree data structure recommended by Vatti et al. (I couldn't find any mention of it in Vatti's original paper.) My implementation uses linked and doubly linked list data structures, which are described in Vatti's paper. I would be very interested if you could provide a link to Vatti's and Bentley-Ottmann's discussion of a tree structured approach.

Concerning robustness - I believe numerical robustness problems only arise when complex intersections are encountered. Specifically: how does an algorithm cope with multiple intersections occurring in very close proximity? Simply relying on a 'tolerance' when using floating point values will guarantee failure at some point. On failure, intersection operations are performed in an incorrect order, resulting in incorrect edge sorts. This fortunately is easily detected after each sweep. While not a final solution, I have an update that's close to release that tests the edge sorts before actually processing the intersections; if the test sort order is wrong, the 'tolerance' value is adjusted (both up and down) and retested. Fortunately, this testing has only a minor impact on performance (about 10%). In my stress tests*, and using this 'tuned' tolerance approach, the incidence of test failure drops to about one in a million tests. (*The test involves generating random subject and clip polygons with 100 vertices each, rotated at varying degrees, where the vertices are specifically tuned to generate many complex intersections.) Given that complex intersections in clipping operations are infrequent events in normal use (which is why GPC rarely blinks in normal use), I would argue that the likelihood of users encountering a failure with Clipper would be very much lower than one in a million.

You also mentioned scaling as an issue with floating point implementations. I agree that it is an issue and that input polygons with very small or very large coordinate values should be scaled. Anyhow, a fully numerically robust solution is still my final goal.
I can review your code if you like. I expect that with 15 years of experience programming in any language your C++ will be reasonably well written in terms of procedural or object oriented style. Lack of generic programming style is not a fatal design flaw in my book. I can quickly identify performance problems due to C++ usage to help you get the most out of the language. We can add templates for the coordinate data type and so forth to get it up to the standards required by Boost.Geometry, should you choose to pursue contributing your code.
I would certainly welcome your review of my code. However, I'm not seeking to have it accepted into Boost (for the reason I mentioned above) unless others feel strongly that this is warranted.
Please read the archives of the boost dev mailing list to learn more background on the geometry work being done in boost over the past several years, and to find the link to get added to the Boost.Geometry (GGL) mailing list if you are interested.
I'll do that. Thanks again for your welcome and encouraging feedback. Angus
Angus Johnson wrote:
I believe Clipper is currently substantially more reliable than GPC, because I don't think GPC properly deals with complex intersections. I'm not aware of the tree data structure recommended by Vatti et al. (I couldn't find any mention of it in Vatti's original paper.) My implementation uses linked and doubly linked list data structures, which are described in Vatti's paper. I would be very interested if you could provide a link to Vatti's and Bentley-Ottmann's discussion of a tree structured approach.
Concerning robustness - I believe numerical robustness problems only arise when complex intersections are encountered. Specifically: how does an algorithm cope with multiple intersections occurring in very close proximity? Simply relying on a 'tolerance' when using floating point values will guarantee failure at some point. On failure, intersection operations are performed in an incorrect order, resulting in incorrect edge sorts. This fortunately is easily detected after each sweep. While not a final solution, I have an update that's close to release that tests the edge sorts before actually processing the intersections; if the test sort order is wrong, the 'tolerance' value is adjusted (both up and down) and retested. Fortunately, this testing has only a minor impact on performance (about 10%). In my stress tests*, and using this 'tuned' tolerance approach, the incidence of test failure drops to about one in a million tests. (*The test involves generating random subject and clip polygons with 100 vertices each, rotated at varying degrees, where the vertices are specifically tuned to generate many complex intersections.) Given that complex intersections in clipping operations are infrequent events in normal use (which is why GPC rarely blinks in normal use), I would argue that the likelihood of users encountering a failure with Clipper would be very much lower than one in a million. You also mentioned scaling as an issue with floating point implementations. I agree that it is an issue and that input polygons with very small or very large coordinate values should be scaled. Anyhow, a fully numerically robust solution is still my final goal.
That's a worthwhile goal, to be sure. When I discussed scalability I meant algorithmic complexity and the ability to process large data volumes of input geometry in a single test case. An algorithm that scales near linearly will be able to process inputs on the order of one hundred million vertices in reasonable run time, for example. If your sweepline data structure is a linked list and not a tree or bucketed list, then your algorithmic complexity must be worse than the O(n log n) wrt. input vertices plus intersection vertices described in Vatti's paper. In that case I wouldn't expect it to scale past tens of millions of vertices in reasonable runtime. In my testing the empirical scaling factor for GPC was about O(n^1.75), so you could be substantially faster than that but still worse than O(n log n).

While you mention that the odds of failure may be one in a million tests for tests of a given size, if you increase the size of the input by a factor of one million your odds of failure get close to one in one. That was the point I was trying to make about algorithmic scalability being undermined by lack of robustness. It doesn't matter if your implementation could process one hundred million vertex test cases in reasonable runtime if the odds of it succeeding are remote. I just wanted to clarify this point because it isn't always intuitively obvious.

Also, I believe GPC does handle high order intersections, because I'm quite certain I was generating large numbers of them in the tests I was running, and GPC seemed to handle them just fine. I could be wrong of course, but I suggest you double check that GPC doesn't properly deal with complex intersections. I know it doesn't deal with overlapping or self intersecting polygons at all, but that is a separate issue from polygons that have vertices in common or polygons that abut along edges but do not overlap. GPC is used quite extensively and paid for by many people; we should be responsible about making claims about its shortcomings. Regards, Luke
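[A rough illustration of the scaling-versus-robustness point above, under the simplifying assumption that each of k potentially troublesome intersection events fails independently with some small probability p:]

    P(\text{at least one failure}) = 1 - (1 - p)^k, \qquad 1 - (1 - 10^{-6})^{10^{6}} \approx 1 - e^{-1} \approx 0.63

[so a failure rate that looks negligible on small tests becomes more likely than not once the input is roughly a million times larger.]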
That's a worthwhile goal to be sure. When I discussed scalability I meant algorithmic complexity and ability to process large data volumes in input geometry in a single test case.

Ahh, thanks for the clarification. I haven't yet done any significant testing of processing very large data volumes, nor am I competent to determine an O value, so I'll defer to your estimation there.
Also, I believe GPC does handle high order intersections, because I'm quite certain I was generating large numbers of them in the tests I was running, and GPC seemed to handle them just fine. I could be wrong of course, but I suggest you double check that GPC doesn't properly deal with complex intersections. I know it doesn't deal with overlapping or self intersecting polygons at all, but that is a separate issue from polygons that have vertices in common or polygons that abut along edges but do not overlap. GPC is used quite extensively and paid for by many people; we should be responsible about making claims about its shortcomings.
I was careful to qualify my statements about GPC with "I don't think" and "I believe". However, I accept that this still isn't fair treatment and I should be sure before making these claims. Let me simply state that under my stress-testing, GPC fails about 4% of the time (and with extreme stress testing - rounding random vertices to nearest 100 instead of nearest 10 - it fails far more frequently). The full source code for my stress-tests can be downloaded here: http://www.angusj.com/delphi/clipper.php
Angus Johnson wrote:
That's a worthwhile goal to be sure. When I discussed scalability I meant algorithmic complexity and ability to process large data volumes in input geometry in a single test case.

Ahh, thanks for the clarification. I haven't yet done any significant testing of processing very large data volumes, nor am I competent to determine an O value, so I'll defer to your estimation there.
What you have already done is significantly harder than determining O values. Don't sell yourself short. If you plot the number of vertices in input plus output on the x axis and runtime on the y axis in an Excel spreadsheet, you can have Excel create a power trend line for the data and display the equation on the graph. The equation will look like 2.79235 * x^1.34984. Your empirical scaling factor (empirical O value) is then 1.35. In this hypothetical example 1.35 is faster than O(n * sqrt n) but slower than O(n log n). You should be able to estimate your empirical scaling factor very quickly by generating progressively larger random test cases with this method. Read the experiment section of my boostcon paper http://www.boost.org/doc/libs/1_44_0/libs/polygon/doc/GTL_boostcon2009.pdf to get an idea how to structure the experiment and what the graph should look like (page 9). I can share the code for the test harness with you if you like. The tricky part is making sure that the geometry doesn't qualitatively change as you make successively larger test cases, because this can introduce a systematic bias in your empirical scaling factor. I'd be interested in seeing this kind of performance analysis on your clipper implementation.

If you are using a linked list as your sweepline data structure, your worst case runtime complexity is O(n^2) wrt. input plus intersection vertices and your expected runtime complexity is O(n^1.5) in the common case. In the worst case there are O(n) line segments in your sweepline at each sweepline "stop" (unique x coordinate), while in the common case there are O(n^0.5) line segments in your sweepline at each sweepline stop and O(n) stops.
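[The trend-line fit described above can also be done directly in code. A minimal C++ sketch (the sample measurements below are invented, not taken from either library): estimate the exponent b of the best-fit power law t = a * n^b by a least-squares line fit in log-log space.]

    #include <cmath>
    #include <cstdio>
    #include <utility>
    #include <vector>

    // Returns the exponent b of the best-fit power law t = a * n^b,
    // i.e. the slope of the least-squares line fitted to (log n, log t).
    double empiricalScalingFactor(const std::vector< std::pair<double, double> >& samples)
    {
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (size_t i = 0; i < samples.size(); ++i)
        {
            double x = std::log(samples[i].first);   // log(vertex count)
            double y = std::log(samples[i].second);  // log(runtime in seconds)
            sx += x; sy += y; sxx += x * x; sxy += x * y;
        }
        double n = static_cast<double>(samples.size());
        return (n * sxy - sx * sy) / (n * sxx - sx * sx);
    }

    int main()
    {
        std::vector< std::pair<double, double> > samples;  // (vertices, seconds), invented values
        samples.push_back(std::make_pair(1e4, 0.02));
        samples.push_back(std::make_pair(1e5, 0.7));
        samples.push_back(std::make_pair(1e6, 22.0));
        std::printf("empirical scaling factor: %.2f\n", empiricalScalingFactor(samples));
        return 0;
    }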
Also, I believe GPC does handle high order intersections, because I'm quite certain I was generating large numbers of them in the tests I was running, and GPC seemed to handle them just fine. I could be wrong of course, but I suggest you double check that GPC doesn't properly deal with complex intersections. I know it doesn't deal with overlapping or self intersecting polygons at all, but that is a separate issue from polygons that have vertices in common or polygons that abut along edges but do not overlap. GPC is used quite extensively and paid for by many people; we should be responsible about making claims about its shortcomings.
I was careful to qualify my statements about GPC with "I don't think" and "I believe". However, I accept that this still isn't fair treatment and I should be sure before making these claims. Let me simply state that under my stress-testing, GPC fails about 4% of the time (and with extreme stress testing - rounding random vertices to nearest 100 instead of nearest 10 - it fails far more frequently). The full source code for my stress-tests can be downloaded here: http://www.angusj.com/delphi/clipper.php
What is GPC's failure mode that you observe when it fails your stress test? Does it crash, or does it just produce garbage output polygons? In my testing I found that given self intersecting or overlapping polygons in an input, GPC produced garbage output. I would prefer a crash, since it is as much work to check that the result of a boolean is correct as to perform the boolean. An algorithm that produces the wrong result and reports success is worse than an algorithm that crashes.

If you are substantially more robust than GPC then you are substantially robust. Epsilon methods won't get you to 100%, but there are other methods that you can use that get you all the way. You are asymptotically approaching them with your iterative epsilon idea. The really tough problem wrt. robustness is that numerical errors move line segments laterally and can introduce spurious intersections in the output. These intersections become fatal errors for downstream processing of the output of your algorithm. There is a paper on floating point snap rounding by Milenkovic that could help you with 100% numerical robustness that avoids spurious intersections. Search for "Polygon Set Operation" and/or Milenkovic and use CiteSeer to walk the reference links. I can send you my archive of papers if you don't have access to an academic database.

I greatly approve of your stress testing approach. Try turning off rounding of the random vertices and then we'll really see where epsilon methods fall short. It isn't really fair to round the inputs with your knowledge of which epsilon you use and compare to someone else's algorithm. The kind of stress testing you have been doing is exactly the right way to stress test numerical robustness. I wish others would do the same. Perhaps you can include the Boost.Geometry clipping implementation in your scaling and stress testing. If your implementation is both faster and more reliable than what they have, I would think we would use yours to replace theirs in their library. I don't think users of the library would find that confusing. Also, boost geometry is something of a test bed for development of better algorithms in addition to a vehicle for releasing them. From that standpoint, multiple algorithms that do the same thing are part and parcel of developing better algorithms. Regards, Luke
If you plot the number of vertices in input plus output on the x axis and runtime on the y axis in an Excel spreadsheet, you can have Excel create a power trend line for the data and display the equation on the graph.
It's 3.50am here in Sydney so I'll be very brief. Yesterday afternoon I did just that, and I get a trendline with an equation of y = 3E-08 * x^2.8413, which suggests scaling will be a problem.
What is GPC's failure mode that you observe when it fails your stress test? Does it crash, or does it just produce garbage output polygons?
I didn't test for garbage output, just crashes. By the way, GPC does handle self-intersecting polygons as expected since it is based on Vatti's algorithm. Now, back to bed :).
Yesterday afternoon I did just that, and I get a trendline with an equation of y = 3E-08 * x^2.8413, which suggests scaling will be a problem.
Can't sleep :(. Just realised I was using the wrong values in my chart - the number of edges was actually the square of the values I put into the chart. I'd designed my test polygon by creating a square filled with horizontal squiggly lines designed to maximally stress the Vatti algorithm (since it results in a near maximal number of LocalMinima with left and right bounds all of length 1). Anyhow, reworking the data with the correct number of edges, this (near worst case) test polygon gives a trendline equation of y = 1E-08 * x^1.4821. If I generate the squiggly lines vertically in my test polygon (and de-stress the Vatti algorithm), which is likely to be closer to normal use, I suspect this equation will drop further. Will leave that till the morning.
Anyhow, reworking the data with the correct number of edges, this (near worst case) test polygon gives a trendline equation of y = 1E-08 * x^1.4821. If I generate the squiggly lines vertically in my test polygon (and de-stress the Vatti algorithm), which is likely to be closer to normal use, I suspect this equation will drop further.
OK, when I rotated the test polygon 90 deg the trendline equation becomes y = 2E-07 * x^1.0841. I would suggest that in normal use, my clipping library would perform somewhere between O(n^1.0841) and O(n^1.4821), though I suspect nearer to the lower value.

Here's the function that produces my test polygons (generating what I believe would be near worst case vertices for the Vatti algorithm):

    // struct TDoublePoint { double X; double Y; };
    // typedef std::vector< TDoublePoint > TPolygon;

    TPolygon MakeTestPoly(int startX, int startY, int boxSize)
    {
      int k, x, y;
      TPolygon result;
      result.resize(boxSize * boxSize + 2);
      x = startX + 4;
      y = startY;
      k = 0;
      for (int i = 1; i <= boxSize; ++i)
      {
        for (int j = 1; j <= boxSize; ++j)
        {
          x = x + (i % 2)*4 - 2;
          y = y + (j % 2)*2 - 1;
          result[k].X = x;
          result[k].Y = y;
          k++;
        }
        y = y + 4;
        x = x + (i % 2)*4 - 2;
      }
      result[k].X = startX;
      result[k].Y = y - 4;
      result[k+1].X = startX;
      result[k+1].Y = startY;
      return result;
    }
Angus Johnson wrote:
Anyhow, reworking the data with the correct number of edges, this (near worst case) test polygon gives a trendline equation of y = 1E-08 * x^1.4821. If I generate the squiggly lines vertically in my test polygon (and de-stress the Vatti algorithm), which is likely to be closer to normal use, I suspect this equation will drop further.
OK, when I rotated the test polygon 90 deg the trendline equation becomes y = 2E-07 * x^1.0841. I would suggest that in normal use, my clipping library would perform somewhere between O(n^1.0841) and O(n^1.4821), though I suspect nearer to the lower value.
Here's the function that produces my test polygons (generating what I believe would be near worst case vertices for the Vatti algorithm):
    // struct TDoublePoint { double X; double Y; };
    // typedef std::vector< TDoublePoint > TPolygon;
    TPolygon MakeTestPoly(int startX, int startY, int boxSize)
    {
      int k, x, y;
      TPolygon result;
      result.resize(boxSize * boxSize + 2);
      x = startX + 4;
      y = startY;
      k = 0;
      for (int i = 1; i <= boxSize; ++i)
      {
        for (int j = 1; j <= boxSize; ++j)
        {
          x = x + (i % 2)*4 - 2;
          y = y + (j % 2)*2 - 1;
          result[k].X = x;
          result[k].Y = y;
          k++;
        }
        y = y + 4;
        x = x + (i % 2)*4 - 2;
      }
      result[k].X = startX;
      result[k].Y = y - 4;
      result[k+1].X = startX;
      result[k+1].Y = startY;
      return result;
    }
Just a C++ style pointer: we have what we call the resource acquisition is initialization (RAII) policy, where we initialize all variables at the time they are declared rather than deferring that to later. The rationale for this is that it avoids bugs introduced later when a variable is used between where it was declared and where it is initialized, resulting in undefined behavior and different behavior on different platforms/compiler optimization flags. RAII becomes much more important when memory allocation, locks, file handles, sockets and other tricky to handle resources are thrown into the mix, because it allows you to encapsulate the acquisition and relinquishment of those resources in the declaration and destructor call for your objects, which is much less error prone than the alternative.

    TPolygon MakeTestPoly(int startX, int startY, int boxSize)
    {
      int k = 0;
      int x = startX + 4;
      int y = startY;
      TPolygon result;                      // it would be better if the TPolygon constructor accepted the size,
      result.resize(boxSize * boxSize + 2); // making this line unneeded, and to help prevent people from
                                            // using the TPolygon but forgetting to resize it first
      for (int i = 1; i <= boxSize; ++i)
      {
        for (int j = 1; j <= boxSize; ++j)
        {
          x = x + (i % 2)*4 - 2;
          y = y + (j % 2)*2 - 1;            // squiggles right twice as fast as it squiggles up
          result[k].X = x;
          result[k].Y = y;
          k++;
        }
        y = y + 4;
        x = x + (i % 2)*4 - 2;
      }
      result[k].X = startX;
      result[k].Y = y - 4;
      result[k+1].X = startX;
      result[k+1].Y = startY;
      return result;
    }

This case is actually a common case and not a worst case for your algorithm, because the average number of line segments intersecting your sweepline is more like O(n^0.5) (giving roughly O(n^1.5) expected runtime), because the line segments are very short on average. If you instead make n horizontal rectangles with random start and end coordinates and a small height, within separate ranges of X for the start and end, you will get O(n) unique x coordinates and on average O(n) (roughly n/2) line segments intersecting your sweepline data structure at each unique x. This will be the worst case input and have expected runtime complexity of O(n^2) using a linked list as the sweepline data structure, while an optimal Vatti using a tree would have O(n log n) complexity and an empirical scale factor of about O(n^1.05). Regards, Luke
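[A small sketch of the worst-case input just described - not Luke's test harness; the ranges, counts and names are arbitrary. It builds n long, thin horizontal rectangles whose left ends fall in one x range and right ends in another, so O(n) horizontal edges cross the sweepline at every unique x in between.]

    #include <cstdlib>
    #include <vector>

    struct Rect { double left, right, bottom, top; };

    // n thin horizontal rectangles spanning nearly the whole x range,
    // stacked vertically so they do not touch one another.
    std::vector<Rect> makeWorstCaseRects(int n)
    {
        std::vector<Rect> rects(n);
        for (int i = 0; i < n; ++i)
        {
            rects[i].left   = std::rand() % 1000;          // left ends in [0, 1000)
            rects[i].right  = 10000 + std::rand() % 1000;  // right ends in [10000, 11000)
            rects[i].bottom = 3.0 * i;                     // small height, disjoint in y
            rects[i].top    = rects[i].bottom + 1.0;
        }
        return rects;
    }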
Just a C++ style pointer, we have what we call resource acquisition is initialization (RAII) policy where we initialize all variables at the time they are declared rather than deferring that to later. The rationale for this is that it avoids bugs introduced later when a variable is attempted to be used between where it was declared and where it is initialized resulting in undefined behavior and different behavior on different platforms/compiler optimization flags.
Thanks. I'll try and remember that. Delphi enforces local variable declarations before any logical operations can be performed (IIRC to allow a one pass compile), so I'm going to have to break a long standing habit.
TPolygon result; //it would be better if the TPolygon constructor accepted the size
As you can see I'm still coming to grips with the fundamentals of C++. I forgot that the std vector's constructor accepted a size parameter.
y = y + (j % 2)*2 - 1; // squiggles right twice as fast as it squiggles up
That was so I could see it was really doing what I expected, otherwise it just looked like a fat blurry line :).
This will be the worst case input and have expected runtime complexity of O(n^2) using a linked list as the sweepline data structure while an optimal Vatti using a tree would have O(n log n) complexity and empirical scale factor of about O(n^1.05).
I've given my data structure a little more thought and believe it isn't a simple linked list structure as I stated earlier. The main linked list, the active edge list (AEL), contains a list of active edges which also link to their adjacent edges in an edge bound (LML). I imagine that this is the tree structure you've been mentioning.
Angus Johnson wrote:
Just a C++ style pointer, we have what we call resource acquisition is initialization (RAII) policy where we initialize all variables at the time they are declared rather than deferring that to later. The rationale for this is that it avoids bugs introduced later when a variable is attempted to be used between where it was declared and where it is initialized resulting in undefined behavior and different behavior on different platforms/compiler optimization flags.
Thanks. I'll try and remember that. Delphi enforces local variable declarations before any logical operations can be performed (IIRC to allow a one pass compile), so I'm going to have to break a long standing habit.
TPolygon result; //it would be better if the TPolygon constructor accepted the size
As you can see I'm still coming to grips with the fundamentals of C++. I forgot that the std vector's constructor accepted a size parameter.
y = y + (j % 2)*2 - 1; // squiggles right twice as fast as it squiggles up
That was so I could see it was really doing what I expected, otherwise it just looked like a fat blurry line :).
This will be the worst case input and have expected runtime complexity of O(n^2) using a linked list as the sweepline data structure while an optimal Vatti using a tree would have O(n log n) complexity and empirical scale factor of about O(n^1.05).
I've given my data structure a little more thought and believe it isn't a simple linked list structure as I stated earlier. The main linked list, the active edge list (AEL), contains a list of active edges which also link to their adjacent edges in an edge bound (LML). I imagine that this is the tree structure you've been mentioning.
It doesn't sound like a tree. I don't quite grasp what you mean by adjacent edges in an edge bound. The empirical scaling factor of an optimal sweepline that uses a tree should be about n^1.05 for all test cases. That's what we see with CGAL, for example.

I use std::map (internally a red-black tree) as the sweepline data structure to maintain a sorted sequence of line segments that intersect the sweepline. I can look up, insert and remove line segments from the map in O(log n) (amortized) runtime. Each line segment end point scanned during the progress of the sweepline gives rise to a constant number of lookups, insertions and removals from the sweepline. There are no more than O(n) elements in the map at any given time and there are O(n) line segment end points to process. Intersection events add to the number of events to be processed but not to the max number of elements in the map. Therefore the runtime complexity is O((n+s) log n) for n line segments and s intersection points (Vatti's reported runtime complexity). I usually simplify to O(n log n) wrt. line segments plus intersections, because the number of intersections is at worst n^2 and log(n^2) = 2 log(n), so it differs by only a constant factor.

If the sweepline data structure requires linear lookup then you have O((n + s)n) runtime complexity. If the number of unique x values is smaller than O(n), then you could get a significant optimization with the list data structure by keeping your place in the sweepline data structure from the previous y value, for a runtime complexity of O((n + s)x). There is another, more general, data dependent optimization where instead of line segments it is x-monotone chains of line segments that are processed by the sweepline, giving O(n + (x+s) log x) runtime complexity where x is the number of x-monotone chains. In the worst case x is n, but if you have nice smooth arcs in your data formed of large numbers of small line segments the speedup is dramatic. Implementing the x-monotone chain sweepline is similar in principle to implementing the sweepline over curves.

There is a pretty significant difference between making the sub-optimal sweepline algorithm numerically robust and making the optimal tree based sweepline algorithm numerically robust, so you might want to step back and decide whether you are looking to improve the robustness of your current algorithm or to improve the algorithm and then come back to robustness. I have been mentoring a successful Google Summer of Code project this year that implemented an optimal and robust sweepline algorithm for the Voronoi diagram of points and soon line segments. The robust and optimal sweepline for line segment intersection and polygon clipping is also achievable, but very tricky. Your work shows that you know how to test your code well, which counts for a lot. I'd like to encourage you to persevere in pursuing robustness as well as algorithmic optimality. Regards, Luke
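[A minimal sketch of the kind of tree-based sweepline status described above - this is not Luke's library code; the types, names and tie-breaking rule are invented for the example. The active segments live in a std::set ordered by their x coordinate at the current sweep y, so lookup, insertion and removal are each O(log n). The usual caveat applies: the sweep position may only advance past an intersection point if the affected segments are removed and re-inserted around that intersection event, otherwise the set's ordering invariant is broken.]

    #include <cassert>
    #include <functional>
    #include <set>

    struct Segment
    {
        double x0, y0, x1, y1;     // assumes y0 < y1
        double xAt(double y) const // x coordinate where the segment crosses sweep line y
        {
            if (y1 == y0) return x0;
            return x0 + (x1 - x0) * (y - y0) / (y1 - y0);
        }
    };

    struct SweepCompare
    {
        const double* sweepY;      // y of the current sweepline position
        bool operator()(const Segment* a, const Segment* b) const
        {
            double xa = a->xAt(*sweepY), xb = b->xAt(*sweepY);
            if (xa != xb) return xa < xb;
            return std::less<const Segment*>()(a, b);  // arbitrary but consistent tie-break
        }
    };

    int main()
    {
        double sweepY = 0.0;
        SweepCompare cmp = { &sweepY };
        std::set<Segment*, SweepCompare> status(cmp);  // the sweepline "status" structure

        Segment s1 = { 0, 0, 10, 10 };                 // diagonal segment
        Segment s2 = { 5, 0,  5, 10 };                 // vertical segment
        status.insert(&s1);                            // O(log n)
        status.insert(&s2);

        sweepY = 3.0;  // safe here: these two segments have not yet swapped order
        assert(status.find(&s1) != status.end());
        return 0;
    }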
On 10/09/2010 3:32 PM, Simonson, Lucanus J wrote:
It doesn't sound like a tree. I don't quite grasp what you mean by adjacent edges in an edge bound.
The empirical scaling factor of an optimal sweepline that uses a tree should be about n^1.05 for all test cases.
I agree the data structure that I've used is not a typical tree, but it's not a simple linked list either. It is, however, pretty much exactly the data structure that's described by Vatti in his original paper. As per his algorithm, polygons are divided into series of left and right bounds (connected edges that form either a left or right border, or 'bound', of a polygon). There may be any number of left and right bounds for any polygon (unless it's convex, in which case there will only be one of each). Anyhow, what I meant by 'adjacent edges' was the next (or previous) edge in the linked list that makes up a given bound.

Concerning issues of scaling, I believe the sweep line approach that I've followed (as outlined by Vatti) is near maximally efficient. A left and a right bound will meet at a 'local minima', so there's also a local minima structure which contains pointers to its associated left and right bounds and the Y coordinate of the local minima. These local minima are sorted (on their Y values) into another linked list (LML). With this structure there's almost no 'lookup' required as the sweep progresses, except to check at the start of each sweep whether the Y coordinate of the first local minima in LML needs to be popped from LML while adding its two left and right bound edges to the 'active edge list' (AEL). Of course it's not really possible to condense Vatti's data structure into a few sentences, let alone the implementation logic, but I understand you already have at least a basic understanding of this.
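[For readers unfamiliar with Vatti's terminology, a rough sketch of the structures described above; the field and type names are invented here and are not Clipper's actual declarations.]

    // One edge of a polygon bound, as it appears while the sweep is active.
    struct TEdge
    {
        double xbot, ybot;     // bottom vertex of the edge
        double xcurr;          // x at the current scanline
        double dx;             // change in x per unit y (inverse slope)
        TEdge* nextInLML;      // next edge up the same (left or right) bound
        TEdge* nextInAEL;      // neighbours in the active edge list (AEL),
        TEdge* prevInAEL;      //   kept sorted by x at the current scanline
    };

    // A point where a left bound and a right bound meet.
    struct LocalMinima
    {
        double y;              // Y coordinate of the local minima
        TEdge* leftBound;      // first edge of the left bound
        TEdge* rightBound;     // first edge of the right bound
        LocalMinima* next;     // next local minima, kept sorted by ascending Y (the LML)
    };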
There is a pretty significant difference between making the sub-optimal sweepline algorithm numerically robust and the optimal tree based sweepline algorithm numerically robust, so you might want to step back and decide whether you are looking to improve the robustness of your current algorithm or to improve the algorithm then come back to robustness.
I believe my current algorithm (which very closely adheres to Vatti's data structure) is close to maximally efficient. The data structure at least doesn't need touching, but no doubt the implementation can always be improved.
I have been mentoring a successful Google Summer of Code project this year that implemented an optimal and robust sweepline algorithm for the Voronoi diagram of points and soon line segments. The robust and optimal sweepline for line segment intersection and polygon clipping is also achievable, but very tricky. Your work shows that you know how to test your code well, which counts for a lot. I'd like to encourage you to persevere in pursuing robustness as well as algorithmic optimality.
I'd certainly welcome your continuing feedback/suggestions/help along these lines. Angus
Angus Johnson wrote:
There is a pretty significant difference between making the sub-optimal sweepline algorithm numerically robust and the optimal tree based sweepline algorithm numerically robust, so you might want to step back and decide whether you are looking to improve the robustness of your current algorithm or to improve the algorithm then come back to robustness.
I believe my current algorithm (which very closely adheres to Vatti's data structure) is close to maximally efficient. The data structure at least doesn't need touching, but no doubt the implementation can always be improved.
Your empirical measurement of y = 1E-08 * x^1.4821 scaling is almost exactly the scaling factor of 1.5 I would expect a sub-optimal sweepline algorithm to exhibit for the test case you described. Like I said, O(n log n) corresponds to a scaling factor of less than 1.1; 1.48 is equivalent to an algorithmic complexity of O(n * sqrt(n)). A factor of 1.5 is a lot slower than 1.05 once you get over one million (like 500X slower), so this is a big issue. No matter how efficient your implementation, it can't compete with a more optimal algorithm on larger test cases. The data structure used for delivering the sorted sequence of edges to the sweepline is less important than the data structure used for the active edge list (which should be a tree and not a list). Please read the references of my boostcon paper http://www.boost.org/doc/libs/1_44_0/libs/polygon/doc/GTL_boostcon2009.pdf; in particular, Ruud, B. "Building a Mutable Set", 2003, retrieved March 3, 2009, from Dr. Dobb's Portal: http://www.ddj.com/cpp/184401664 gives a clear description of how to use std::set or std::map to implement the sweepline data structure. Regards, Luke
1.5 is a lot slower than 1.05 once you get over one million (like 500X slower) so this is a big issue.
I've created a small application* that compares the performance (and scaling) of Clipper and GPC. I haven't gotten around to looking at GTL or other libraries yet. At the bottom of this email there's a snapshot of what I get with my Sony laptop (2.53 GHz Intel Core2 Duo P8700 processor with 4 GB RAM) running Windows 7. I think this shows that Clipper still performs extremely well up to a million edges. Nevertheless, I'm not denying that clipping libraries with better scaling factors will eventually outperform it. Also, from my testing, triangles (and complex self-intersecting polygons) perform the worst, followed by rectangles. Ellipses perform the best, as I would expect with libraries based on the Vatti algorithm, due to the low number of 'local minima' (i.e. one) per polygon.

*application (with C++Builder source) here: http://www.angusj.com/temp/scaling_test.zip
Please read the references of my boostcon paper http://www.boost.org/doc/libs/1_44_0/libs/polygon/doc/GTL_boostcon2009.pdf, in particular Ruud, B. Building a Mutable Set, 2003. Retrieved March 3, 2009, from Dr. Dobb's Portal: http://www.ddj.com/cpp/184401664 gives a clear description of how to use the std set or map to implement the sweepline data structure.
Thanks for the links. Both interesting reads. A quick read has already given me a couple of ideas for improvements (though I think they will only be marginal).

===================================
Clipper Library:
===================================
Clipping triangles:
  Total edge count (incl output edges)   Time (seconds)
  76663                                  0.22676
  187318                                 0.706356
  306353                                 1.485881
  583294                                 4.023969
  797229                                 7.672673
  1024124                                11.507399
  trendline equation: y = 7E-09 * x^1.5267

Clipping ellipses:
  Total edge count (incl output edges)   Time (seconds)
  173609                                 1.428628
  361264                                 3.58896
  597069                                 6.470583
  1286326                                16.237705
  trendline equation: y = 7E-07 * x^1.2111

===================================
GPC Library:
===================================
Clipping triangles:
  Total edge count (incl output edges)   Time (seconds)
  52011                                  1.009507
  67157                                  1.657865
  102433                                 7.958484
  119676                                 13.501837
  184910                                 42.55886
  trendline equation: y = 3E-15 * x^3.0774

Clipping ellipses:
  Total edge count (incl output edges)   Time (seconds)
  59861                                  0.849011
  81796                                  1.572399
  88384                                  1.8164
  115281                                 3.940039
  151506                                 12.479672
  trendline equation: y = 1E-14 * x^2.8838
Thanks for the links. Both interesting reads. A quick read has already given me a couple of ideas for improvements (though I think they will only be marginal).
I've now had a much better look at the Bentley-Ottmann algorithm and can now see why it could make a significant performance improvement. I'm also wondering if the event queue might be manipulated to better manage complex intersections too. Anyhow, all this would require a very major rewrite, so maybe it'll be something to look at in a few months time.
Hi,
I'm trying to serialize a std::map
Jonathan Klein wrote:
Hi, I'm trying to serialize a std::map
but get strange compiler errors.
This comes up all the time. You can't serialize a pointer to a primitive. Wrap the primitive in a class and serialize that. Also see BOOST_SERIALIZATION_STRONG_TYPEDEF.
Also I tried to serialize a **class, which also failed.
This is not supported. There is an enhancement request to do so though. Robert Ramey
I greatly approve of your stress testing approach. Try turning off rounding of the random vertices and then we'll really see where epsilon methods fall short.
I did that, and both GPC and Clipper perform without problems because higher order (complex) intersections almost never occur. However, when I round the random vertices to 10 (in a bounding area of 500x700) and intersect two 100 edge polygons, the frequency of higher order intersections rises to an average of 5. When I round the random vertices to 100, the frequency of higher order intersections is close to 1000 (and GPC fails 95% of the time). However, what really maximally stresses both libraries is rounding random points as I've done to 10 (or 100!) and then minimally rotating them. This generates very near, but not exactly, parallel overlapping edges at higher order (complex) intersections. Both Clipper and GPC can struggle under these conditions, though Clipper does a little better. This is what I'm trying to improve at the moment.
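[A small sketch of the kind of stress input described above - not Angus's actual test code; the area, grid and angle are just example values. Random vertices are snapped to a coarse grid and the whole polygon is then rotated by a tiny angle, so many edges end up very nearly, but not exactly, parallel where they intersect.]

    #include <cmath>
    #include <cstdlib>
    #include <vector>

    struct Pt { double X, Y; };

    std::vector<Pt> makeStressPolygon(int vertexCount, double grid, double tinyAngleRad)
    {
        std::vector<Pt> poly(vertexCount);
        double c = std::cos(tinyAngleRad), s = std::sin(tinyAngleRad);
        for (int i = 0; i < vertexCount; ++i)
        {
            // random point in a 500 x 700 area, snapped to the grid (e.g. 10 or 100)
            double x = grid * std::floor((std::rand() % 500) / grid + 0.5);
            double y = grid * std::floor((std::rand() % 700) / grid + 0.5);
            poly[i].X = x * c - y * s;   // minimal rotation about the origin
            poly[i].Y = x * s + y * c;
        }
        return poly;
    }

    // e.g. makeStressPolygon(100, 10.0, 0.001) for each of the subject and clip polygons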
I can review your code if you like.
Luke, just a very quick follow-up to my last email. The comments in the C++ code are much briefer than those in the Delphi code, so you'll probably find it helpful to look at both together.
participants (4):
- Angus Johnson
- Jonathan Klein
- Robert Ramey
- Simonson, Lucanus J