Tinko Bartels via Boost wrote:
I completely overlooked that one. I see that it is designed to optionally support higher-precision coordinate types. I will look further into that.
The whole library supports user-defined calculation types.
The in-circle predicate, however, is not. It also looks like a more complex operation, since you have to find the center of the circle and then check whether a point is inside it. In Boost.Geometry this would be divided into centroid() and distance(). So my guess is that the in-circle predicate should be divided into more basic building blocks. However, I'm not sure whether the center of the circle is the same as the barycenter in non-cartesian coordinate systems. Furthermore, Boost.Geometry doesn't have a non-cartesian centroid yet. Even if you do not support geographic triangulation right now, we should do the theoretical work and prepare the interface for future extensions, i.e. know what basic coordinate-system-dependent operations will be needed in the future.

I implemented it in that two-stage way in my implementation of Delaunay triangulation for the programming competency test. However, I found that this is usually done in a single step (see https://www.cs.cmu.edu/~quake/robust.html ), I guess for performance and stability reasons. I noticed that with larger problem sizes (1'000'000 points and more) my Delaunay triangulation would not terminate for some runs (I generate the problems randomly), and some experiments (I did not look too deeply into it) suggest that my two-stage, non-robust in-circle predicate is to blame here.
I would trust the literature there and implement a simple approximate predicate that works for non-degenerate problems and an adaptive predicate that will be a little slower but still work for degenerate cases.
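For reference, the single-step form from Shewchuk's notes is just the sign of a determinant. A minimal, non-robust sketch (point_t is a stand-in for any cartesian point type, not a Boost.Geometry model):

// Non-robust, single-step in-circle test in the determinant form from
// Shewchuk's notes linked above. Assumes (a, b, c) are in counterclockwise
// order; returns a positive value iff d lies inside their circumcircle.
struct point_t { double x, y; };

double incircle(point_t const& a, point_t const& b,
                point_t const& c, point_t const& d)
{
    double const adx = a.x - d.x, ady = a.y - d.y;
    double const bdx = b.x - d.x, bdy = b.y - d.y;
    double const cdx = c.x - d.x, cdy = c.y - d.y;

    double const ad = adx * adx + ady * ady;
    double const bd = bdx * bdx + bdy * bdy;
    double const cd = cdx * cdx + cdy * cdy;

    // 3x3 determinant | adx ady ad ; bdx bdy bd ; cdx cdy cd |
    return adx * (bdy * cd - bd * cdy)
         - ady * (bdx * cd - bd * cdx)
         + ad  * (bdx * cdy - bdy * cdx);
}

The adaptive variant would evaluate the same determinant with increasing precision only when the floating-point result is too close to zero to trust its sign.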
Vissarion pointed out in another email that I made a mistake with the centroid being the center of the circle. Sorry about that. The two steps would be envelope(triangle, sphere_on_surface) and covered_by(point, sphere_on_surface). I wrote sphere_on_surface because we also have nsphere (n-dimensional sphere) in the extensions and AFAIU this would be something different. In particular, in geographic coordinate systems the shape of the sphere on the surface of the ellipsoid would be different and we wouldn't be able to represent it as a center point and radius. He also seems to agree with you that this should be done in one step. I have nothing against it if it's proven to be faster. My main concern was obviously the reusability of the code and the complexity of the interface. A one-stage operation would require a new type of strategy but that's ok.
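Spelled out, the proposed two steps would look roughly like this (sphere_on_surface and these overloads are hypothetical; nothing below exists in Boost.Geometry today):

// Hypothetical two-step in-circle test
bg::model::sphere_on_surface<point_t> circum;
bg::envelope(triangle, circum);           // step 1: circumscribed circle
bool inside = bg::covered_by(p, circum);  // step 2: point-in-circle test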
Note also that performing randomization for arbitrary geometries would be supported with this interface, e.g. also N random points within a multipoint.

I agree, but the probability of a uniformly distributed point lying inside a multipoint (except for a very coarse space) is vanishingly small. I think that point generation should at least be specialized by the type of domain (discrete, lines, volumes).
If the interface allowed it and predicates were passed, they would define in which topological parts the randomized points would be. So if one wanted to get points within a multipoint like this: generate(N, within(multipoint), output), the algorithm would return N points from the multipoint (in the interior of the multipoint, see: https://en.wikipedia.org/wiki/DE-9IM). Note that it doesn't mean it would have to be a rejection sampler, because we can imagine a case where the predicate expression is analysed, a new geometry is generated from it and then sampling is done in this geometry, e.g.: generate(N, within(polyA) && ! within(polyB), output) could first calculate difference(polyA, polyB, polyC) and then generate points that are in polyC (with special handling of boundaries, because the boundary of polyB would be allowed). Not all predicates could be allowed; we could limit which kinds of geometries and predicates can be passed together, etc.
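To make that last example concrete, here is a minimal sketch using today's Boost.Geometry algorithms (difference(), envelope(), within()); the function name is my own illustration and sampling over the envelope of the difference is just the simplest way to "generate points that are in polyC", not a proposed implementation:

#include <cstddef>
#include <random>
#include <boost/geometry.hpp>
#include <boost/geometry/geometries/geometries.hpp>
#include <boost/geometry/geometries/point_xy.hpp>

namespace bg = boost::geometry;
using point_t = bg::model::d2::point_xy<double>;
using polygon_t = bg::model::polygon<point_t>;
using mpolygon_t = bg::model::multi_polygon<polygon_t>;
using box_t = bg::model::box<point_t>;

// Illustrative reduction of within(polyA) && ! within(polyB):
// compute the difference once, then sample inside it.
template <typename OutIt>
void generate_in_difference(polygon_t const& a, polygon_t const& b,
                            std::size_t n, OutIt out)
{
    mpolygon_t c;
    bg::difference(a, b, c);  // "polyC" from the example above

    box_t env;
    bg::envelope(c, env);     // bounding box to sample from

    std::mt19937 gen(std::random_device{}());
    std::uniform_real_distribution<double>
        x(bg::get<bg::min_corner, 0>(env), bg::get<bg::max_corner, 0>(env)),
        y(bg::get<bg::min_corner, 1>(env), bg::get<bg::max_corner, 1>(env));

    for (std::size_t i = 0; i < n; )
    {
        point_t p(x(gen), y(gen));
        if (bg::within(p, c))  // interior only, as in DE-9IM "within"
        {
            *out++ = p;
            ++i;
        }
    }
}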
It is also possible to decide to go this way but not implement all elements right now. In such a case we could think about the interface and be prepared for extensions in the future, e.g. for now support only something like this:
namespace bg = boost::geometry;
bg::rng::generate(bg::rng::within(A), output);

I like that interface. I would maybe drop the rng, because it's conceivable to have generators in the future for deterministic point generation as well, for example to create some regular lattice or nodes on edges.
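For comparison, the variant without the rng namespace could be spelled like this (all names below are hypothetical illustrations; bg::lattice in particular only stands for some future deterministic generator):

bg::generate(bg::within(A), output);        // random, uniform in A
bg::generate(bg::lattice(A, 0.5), output);  // hypothetical regular grid, 0.5 spacing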
The problem is the name "within", which already exists in the boost::geometry namespace as a relational operation and also in the boost::geometry::index namespace as an rtree spatial predicate. So we'd have to find a different name or put it in a different namespace.

One more general comment at the end, related to all projects and non-cartesian coordinate systems. In general the approach in the library is to perform calculations directly in the coordinate system of the data or strategy. So we do not transform between them in the process. If the user wants to transform, e.g. geographic to cartesian, perform an operation and then transform back, he can do it. Spatial reference system transformations are implemented but not documented. See:

https://github.com/boostorg/geometry/tree/develop/include/boost/geometry/srs
https://github.com/boostorg/geometry/tree/develop/test/srs

The problem with transformations is that they are contextual, i.e. there is no one transformation with a standard set of parameters which would work for all possible geographic data. It depends on the use case and should be chosen by the user.

AFAIR you mentioned earlier projecting from the sphere to a cylinder. The greater the polygon/triangle, the more distorted the coordinates would be. This would also be true for the distribution of the random points; it'd no longer be uniform. So we'd have to divide the problem into smaller triangles, perform separate projections for each of them, calculate in cartesian and then transform the result back. And it is still not clear what the accuracy of the operation should be, i.e. the size of the triangles and therefore their number (an accuracy-performance tradeoff). My (not informed) guess is that this also depends on the use case, so we end up at the user deciding about it anyway.
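For reference, a minimal sketch of that undocumented transformation interface, based on my reading of the tests linked above (the exact spelling should be checked against test/srs):

#include <boost/geometry.hpp>
#include <boost/geometry/srs/transformation.hpp>
#include <boost/geometry/geometries/point_xy.hpp>

namespace bg = boost::geometry;
namespace srs = bg::srs;

int main()
{
    using geo_point = bg::model::point<double, 2, bg::cs::geographic<bg::degree>>;
    using cart_point = bg::model::d2::point_xy<double>;

    // geographic (WGS84 lon/lat) -> spherical Mercator; the parameters
    // are contextual and chosen by the user, as discussed above
    srs::transformation<> tr{
        srs::proj4("+proj=longlat +datum=WGS84 +no_defs"),
        srs::proj4("+proj=merc +datum=WGS84 +units=m +no_defs")
    };

    geo_point in(18.65, 54.35);  // lon, lat in degrees
    cart_point out;

    tr.forward(in, out);  // project
    tr.inverse(out, in);  // and back
}

Adam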