geometry: knn search of point to many linestrings?

newer
Vector values gradients for vector...

older
Re: [Boost-users] casually blocked...

Jens Thiele

18 May 2021 18 May '21

7:34 a.m.

Hi, I have lots of linestrings and want to find the k nearest linestrings to some point. Looking at the example /libs/geometry/doc/index/src/examples/rtree/polygons_shared_ptr.cpp I first thought this should be close to the solution and I just could replace the polygons with linestrings. But now I think the nearest query in that example only finds the nearest polygon bounding boxes and not the nearest polygons. Am I correct? If yes, how would one extend that example to find the nearest polygons? Greetings Jens

Show replies by date

Adam Wulkiewicz

23 May 23 May

3:33 p.m.

W dniu 18.05.2021 o 09:34, Jens Thiele via Boost-users pisze:

...

I have lots of linestrings and want to find the k nearest linestrings to some point.

Looking at the example /libs/geometry/doc/index/src/examples/rtree/polygons_shared_ptr.cpp I first thought this should be close to the solution and I just could replace the polygons with linestrings. But now I think the nearest query in that example only finds the nearest polygon bounding boxes and not the nearest polygons. Am I correct?

If yes, how would one extend that example to find the nearest polygons?

Hi Jens, Yes, your understanding is correct. Bounding boxes of polygons together with pointers to polygons are stored in the rtree. This is also what is returned by the query. So you need to calculate the distances to actual linestrings by yourself. I propose you to use query iterators instead of query function. Then you can iterate over nearest boxes (passing the number of values stored in the rtree into the nearest predicate). In the loop calculate distances to linestrings and break when you have enough of them. You should probably break when the number of linestrings you have is equal to your K and the distance to the furthest linestring is lesser than the distance to the current box returned by the rtree query (because then you know that you will not get any closer linestring). To track the furthest linestring you can use std::priority_queue. Adam

Adam Wulkiewicz

3:36 p.m.

W dniu 23.05.2021 o 17:33, Adam Wulkiewicz pisze:

...

W dniu 18.05.2021 o 09:34, Jens Thiele via Boost-users pisze:

...
I have lots of linestrings and want to find the k nearest linestrings to some point.

Looking at the example /libs/geometry/doc/index/src/examples/rtree/polygons_shared_ptr.cpp I first thought this should be close to the solution and I just could replace the polygons with linestrings. But now I think the nearest query in that example only finds the nearest polygon bounding boxes and not the nearest polygons. Am I correct?

If yes, how would one extend that example to find the nearest polygons?

Hi Jens,

Yes, your understanding is correct. Bounding boxes of polygons together with pointers to polygons are stored in the rtree. This is also what is returned by the query. So you need to calculate the distances to actual linestrings by yourself.

I propose you to use query iterators instead of query function. Then you can iterate over nearest boxes (passing the number of values stored in the rtree into the nearest predicate). In the loop calculate distances to linestrings and break when you have enough of them. You should probably break when the number of linestrings you have is equal to your K and the distance to the furthest linestring is lesser than the distance to the current box returned by the rtree query (because then you know that you will not get any closer linestring). To track the furthest linestring you can use std::priority_queue.

Adam

Correction: "the current box returned by the rtree query" I of course had in mind: "the current box returned by the query iterator"

Jens Thiele

14 Jun 14 Jun

9:33 a.m.

Adam Wulkiewicz via Boost-users <boost-users@lists.boost.org> writes:

...

W dniu 23.05.2021 o 17:33, Adam Wulkiewicz pisze:

...
W dniu 18.05.2021 o 09:34, Jens Thiele via Boost-users pisze:

...
I have lots of linestrings and want to find the k nearest linestrings to some point.

Looking at the example /libs/geometry/doc/index/src/examples/rtree/polygons_shared_ptr.cpp I first thought this should be close to the solution and I just could replace the polygons with linestrings. But now I think the nearest query in that example only finds the nearest polygon bounding boxes and not the nearest polygons. Am I correct?

If yes, how would one extend that example to find the nearest polygons?

Hi Jens,

Hi,

...

...
Yes, your understanding is correct. Bounding boxes of polygons together with pointers to polygons are stored in the rtree. This is also what is returned by the query. So you need to calculate the distances to actual linestrings by yourself.

I propose you to use query iterators instead of query function. Then you can iterate over nearest boxes (passing the number of values stored in the rtree into the nearest predicate). In the loop calculate distances to linestrings and break when you have enough of them. You should probably break when the number of linestrings you have is equal to your K and the distance to the furthest linestring is lesser than the distance to the current box returned by the rtree query (because then you know that you will not get any closer linestring). To track the furthest linestring you can use std::priority_queue.

Adam

Correction: "the current box returned by the rtree query"

I of course had in mind: "the current box returned by the query iterator"

I followed that route and the results look correct but performance is really bad. Nearly all time (0.5s with a really small test) is consumed by qbegin: rtree_t::const_query_iterator it = t->qbegin(idx::nearest(pt, tree_size)); Any ideas how to improve performance? Jens

Adam Wulkiewicz

9:54 a.m.

W dniu 14.06.2021 o 11:33, Jens Thiele via Boost-users pisze:

...

Adam Wulkiewicz via Boost-users <boost-users@lists.boost.org> writes:

...
W dniu 23.05.2021 o 17:33, Adam Wulkiewicz pisze:

...
W dniu 18.05.2021 o 09:34, Jens Thiele via Boost-users pisze:

...
I have lots of linestrings and want to find the k nearest linestrings to some point.

Looking at the example /libs/geometry/doc/index/src/examples/rtree/polygons_shared_ptr.cpp I first thought this should be close to the solution and I just could replace the polygons with linestrings. But now I think the nearest query in that example only finds the nearest polygon bounding boxes and not the nearest polygons. Am I correct?

If yes, how would one extend that example to find the nearest polygons? Hi Jens, Hi,

...
...
Yes, your understanding is correct. Bounding boxes of polygons together with pointers to polygons are stored in the rtree. This is also what is returned by the query. So you need to calculate the distances to actual linestrings by yourself.

I propose you to use query iterators instead of query function. Then you can iterate over nearest boxes (passing the number of values stored in the rtree into the nearest predicate). In the loop calculate distances to linestrings and break when you have enough of them. You should probably break when the number of linestrings you have is equal to your K and the distance to the furthest linestring is lesser than the distance to the current box returned by the rtree query (because then you know that you will not get any closer linestring). To track the furthest linestring you can use std::priority_queue.

Adam Correction: "the current box returned by the rtree query"

I of course had in mind: "the current box returned by the query iterator" I followed that route and the results look correct but performance is really bad. Nearly all time (0.5s with a really small test) is consumed by qbegin: rtree_t::const_query_iterator it = t->qbegin(idx::nearest(pt, tree_size));

Any ideas how to improve performance?

Are you compiling with optimizations enabled? Without knowing your code I can't esstimate if this is long time or not, I can only guess. In general qbegin() loop gathering some number of nearest elements should take similar time to query() call doing the same. So you can try getting some number of knn boxes from the R-tree and checking if this is the case. The speed of the R-tree is determined by the tree internal structure which is created during balancing or packing (packing is the fastest), minimum and maximum number of elements in nodes and the data itself (sparsity vs overlap). How is your R-tree created and what are the parameters? Some users confuse maximum number of parameters in node with capacity/size of the R-tree. The result is the R-tree with very big root node which is effectively a vector storing all of the elements. So the query has to check all of the elements of this node and by extension the whole R-tree during a query. Adam

Jens Thiele

10:26 a.m.

Adam Wulkiewicz via Boost-users <boost-users@lists.boost.org> writes:

...

W dniu 14.06.2021 o 11:33, Jens Thiele via Boost-users pisze:

...
Adam Wulkiewicz via Boost-users <boost-users@lists.boost.org> writes:

...
W dniu 23.05.2021 o 17:33, Adam Wulkiewicz pisze:

...
W dniu 18.05.2021 o 09:34, Jens Thiele via Boost-users pisze:

...
I have lots of linestrings and want to find the k nearest linestrings to some point.

Looking at the example /libs/geometry/doc/index/src/examples/rtree/polygons_shared_ptr.cpp I first thought this should be close to the solution and I just could replace the polygons with linestrings. But now I think the nearest query in that example only finds the nearest polygon bounding boxes and not the nearest polygons. Am I correct?

If yes, how would one extend that example to find the nearest polygons? Hi Jens, Hi,

...
...
Yes, your understanding is correct. Bounding boxes of polygons together with pointers to polygons are stored in the rtree. This is also what is returned by the query. So you need to calculate the distances to actual linestrings by yourself.

I propose you to use query iterators instead of query function. Then you can iterate over nearest boxes (passing the number of values stored in the rtree into the nearest predicate). In the loop calculate distances to linestrings and break when you have enough of them. You should probably break when the number of linestrings you have is equal to your K and the distance to the furthest linestring is lesser than the distance to the current box returned by the rtree query (because then you know that you will not get any closer linestring). To track the furthest linestring you can use std::priority_queue.

Adam Correction: "the current box returned by the rtree query"

I of course had in mind: "the current box returned by the query iterator" I followed that route and the results look correct but performance is really bad. Nearly all time (0.5s with a really small test) is consumed by qbegin: rtree_t::const_query_iterator it = t->qbegin(idx::nearest(pt, tree_size));

Any ideas how to improve performance?

Are you compiling with optimizations enabled?

yes, with -O2

...

Without knowing your code I can't esstimate if this is long time or not, I can only guess. In general qbegin() loop gathering some number of nearest elements should take similar time to query() call doing the same. So you can try getting some number of knn boxes from the R-tree and checking if this is the case.

it looks like I have some fundamental misunderstanding here: t->qbegin(idx::nearest(pt, 100)) is really fast (<1ms) but t->qbegin(idx::nearest(pt, tree_size)) with tree_size=37133 is really slow (~500ms) shouldn't those calls take approximately the same time? I thought the search is done incrementally?

...

The speed of the R-tree is determined by the tree internal structure which is created during balancing or packing (packing is the fastest), minimum and maximum number of elements in nodes and the data itself (sparsity vs overlap). How is your R-tree created and what are the parameters?

I am modifying existing code there and unfortunately don't have a minimalistic test to share. Let me see: namespace b = boost; namespace bg = b::geometry; namespace bgm = bg::model; namespace idx = bg::index; namespace ipc = b::interprocess; typedef bgm::point <double, 2, bg::cs::cartesian> point_t; typedef bgm::box<point_t> box_t; typedef std::pair<box_t, way_info_t> value_t; typedef idx::linear<51, 25> params_t; typedef idx::indexable<value_t> indexable_t; typedef idx::equal_to<value_t> equal_to_t; typedef ipc::allocator<value_t, ipc::managed_mapped_file::segment_manager> \ allocator_t; typedef idx::rtree<value_t, params_t, indexable_t, equal_to_t, allocator_t> rtree_t;

...

Some users confuse maximum number of parameters in node with capacity/size of the R-tree. The result is the R-tree with very big root node which is effectively a vector storing all of the elements. So the query has to check all of the elements of this node and by extension the whole R-tree during a query.

ok Jens

Adam Wulkiewicz

11:09 a.m.

W dniu 14.06.2021 o 12:26, Jens Thiele pisze:

...

Adam Wulkiewicz via Boost-users <boost-users@lists.boost.org> writes:

...
Without knowing your code I can't esstimate if this is long time or not, I can only guess. In general qbegin() loop gathering some number of nearest elements should take similar time to query() call doing the same. So you can try getting some number of knn boxes from the R-tree and checking if this is the case. it looks like I have some fundamental misunderstanding here: t->qbegin(idx::nearest(pt, 100)) is really fast (<1ms) but t->qbegin(idx::nearest(pt, tree_size)) with tree_size=37133 is really slow (~500ms)

shouldn't those calls take approximately the same time? I thought the search is done incrementally?

Yes, this doesn't look right. I have to look into this. Would you mind creating an issue at github? Adam

Jens Thiele

1:47 p.m.

Adam Wulkiewicz via Boost-users <boost-users@lists.boost.org> writes:

...

W dniu 14.06.2021 o 12:26, Jens Thiele pisze:

...
Adam Wulkiewicz via Boost-users <boost-users@lists.boost.org> writes:

...
Without knowing your code I can't esstimate if this is long time or not, I can only guess. In general qbegin() loop gathering some number of nearest elements should take similar time to query() call doing the same. So you can try getting some number of knn boxes from the R-tree and checking if this is the case. it looks like I have some fundamental misunderstanding here: t->qbegin(idx::nearest(pt, 100)) is really fast (<1ms) but t->qbegin(idx::nearest(pt, tree_size)) with tree_size=37133 is really slow (~500ms)

shouldn't those calls take approximately the same time? I thought the search is done incrementally?

Yes, this doesn't look right. I have to look into this. Would you mind creating an issue at github?

https://github.com/boostorg/geometry/issues/867 Jens

Jens Thiele

15 Jun 15 Jun

3:33 p.m.

Jens Thiele <karme@karme.de> writes:

...

https://github.com/boostorg/geometry/issues/867

added a test there to make it reproducible Jens

Jens Thiele

13 Jul 13 Jul

3:04 p.m.

Jens Thiele <karme@karme.de> writes:

...

Adam Wulkiewicz via Boost-users <boost-users@lists.boost.org> writes:

...
W dniu 23.05.2021 o 17:33, Adam Wulkiewicz pisze:

...
I propose you to use query iterators instead of query function. Then you can iterate over nearest boxes (passing the number of values stored in the rtree into the nearest predicate). In the loop calculate distances to linestrings and break when you have enough of them. You should probably break when the number of linestrings you have is equal to your K and the distance to the furthest linestring is lesser than the distance to the current box returned by the rtree query (because then you know that you will not get any closer linestring). To track the furthest linestring you can use std::priority_queue.

Adam

Correction: "the current box returned by the rtree query"

I of course had in mind: "the current box returned by the query iterator"

I followed that route and the results look correct but performance is really bad.

the performance problem is fixed now [1] :-) - thanks again! Another problem I face now is that each run produces different results which is still correct (multiple linestrings might have the same distance to some point) but I would prefer a reproducible result. I guess the cause for this is that rtree_t::const_query_iterator it=t->qbegin(idx::nearest(pt, tree_size)); returns an iterator where the order isn't stable. Would it be difficult to fix the iteration order? Jens [1] s.a. https://github.com/boostorg/geometry/issues/867

Adam Wulkiewicz

14 Jul 14 Jul

2:02 a.m.

...

Another problem I face now is that each run produces different results which is still correct (multiple linestrings might have the same distance to some point) but I would prefer a reproducible result.

I guess the cause for this is that rtree_t::const_query_iterator it=t->qbegin(idx::nearest(pt, tree_size)); returns an iterator where the order isn't stable. Would it be difficult to fix the iteration order? The rtree can have different internal structure depending on balancing algorithm and the order of inserts. So unless you know that values are always inserted in the same order it's impossible to guarantee that the structure is going to be the same. Then the order is also determined by

W dniu 13.07.2021 o 17:04, Jens Thiele via Boost-users pisze: the internals of the iterator, how exactly the rtree is traversed, how nodes and values are prioritized based on distance. And yes, currently heap and sort is used and I plan to replace sort with minmax heap for performance. Neither of these is stable. But I don't understand what do you mean by "each run" and "stable order"? What the stability refers to? Could you give an example? Adam

Jens Thiele

2:28 p.m.

Adam Wulkiewicz via Boost-users <boost-users@lists.boost.org> writes:

...

...
Another problem I face now is that each run produces different results which is still correct (multiple linestrings might have the same distance to some point) but I would prefer a reproducible result.

I guess the cause for this is that rtree_t::const_query_iterator it=t->qbegin(idx::nearest(pt, tree_size)); returns an iterator where the order isn't stable. Would it be difficult to fix the iteration order? The rtree can have different internal structure depending on balancing algorithm and the order of inserts. So unless you know that values are always inserted in the same order it's impossible to guarantee that

W dniu 13.07.2021 o 17:04, Jens Thiele via Boost-users pisze: the structure is going to be the same. Then the order is also determined by the internals of the iterator, how exactly the rtree is traversed, how nodes and values are prioritized based on distance. And yes, currently heap and sort is used and I plan to replace sort with minmax heap for performance. Neither of these is stable.

But I don't understand what do you mean by "each run" and "stable order"? What the stability refers to? Could you give an example?

Problem solved. The details: The test basically does the following: 1) create r-tree 2) knn search of some points to linestrings 3) repeat 2 and compare results Now a few points have the same distance to multiple linestrings and in those cases I saw some random ordering of the linestrings in the knn search results. But: it wasn't the query iterator that introduced that randomness it was the priority queue in the case where linestrings had the same distance => priority. To break the tie I now additionally use the ID of the linestring => no more random ordering. Jens

1447

Age (days ago)

1504

Last active (days ago)

List overview

Download

4 comments

2 participants

participants (2)

Adam Wulkiewicz
Jens Thiele