Re: [Boost-users] Using Coroutines in the Visitor Pattern (use-case: BGL event visitor)
-----Original Message----- From: Boost-users [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Daniel Hofmann Sent: 17 November 2015 15:55 To: boost-users@lists.boost.org Subject: [Boost-users] Using Coroutines in the Visitor Pattern (use-case: BGL event visitor)
(Disclaimer: this is not specific to BGL only; I can think of similar use-cases where the visitor pattern is used. I will still explain it in the BGL context, as it is probably easier to understand.)
Suppose you have a BGL graph and want to implement s-t shortest path queries. It is possible to do this by using a bi-directional approach, that is, starting a Dijkstra search [1] from the start and starting an additional Dijkstra search from the target. As soon as they "meet in the middle", you can unpack the two sub-paths.
[snip]
It then came to my mind that I could use Coroutines for cooperative multitasking, without the downsides that come with threading.
My idea was to invert the callbacks using Coroutines as it is shown in the SAX parsing example [3], in order to get a lazy range of vertices each visitor examines. I then could simply walk the two ranges, terminating as soon as there is the same element in both.
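The driver I have in mind would look roughly like this (a sketch only; from_s, from_t and meet_in_the_middle are placeholder names, and Range stands for whatever lazy range type the two searches end up exposing):

// Sketch: alternate between two lazy ranges of examined vertices and stop at
// the first vertex seen by both searches.
#include <iterator>
#include <unordered_set>

template <class Range, class Vertex>
bool meet_in_the_middle(Range& from_s, Range& from_t, Vertex& meeting_point)
{
    using std::begin;  // unqualified calls so ADL can pick up non-member
    using std::end;    // begin()/end() overloads (e.g. Boost.Coroutine's)
    std::unordered_set<Vertex> seen_s, seen_t;
    auto it_s = begin(from_s);
    auto end_s = end(from_s);
    auto it_t = begin(from_t);
    auto end_t = end(from_t);

    while (it_s != end_s || it_t != end_t) {
        if (it_s != end_s) {          // advance the forward search one step
            Vertex u = *it_s;
            ++it_s;
            seen_s.insert(u);
            if (seen_t.count(u)) { meeting_point = u; return true; }
        }
        if (it_t != end_t) {          // advance the backward search one step
            Vertex v = *it_t;
            ++it_t;
            seen_t.insert(v);
            if (seen_s.count(v)) { meeting_point = v; return true; }
        }
    }
    return false;                     // the searches never met
}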
Jeremiah Willcock gave a good description of how this could work: http://lists.boost.org/Archives/boost/2012/07/195300.php The way he describes it is really using coroutines instead of the visitor pattern.
But due to the way the dijkstra_shortest_paths function adds an additional level of indirection, I failed to implement this, and I have strong doubts that it is possible with Coroutines at all.
I'm mainly struggling with wrapping my head around stopping and resuming the dijkstra_shortest_paths function, when all I can modify is the visitor it takes. That is, the reason I cannot make it work is that I cannot launch two dijkstra_shortest_paths calls asynchronously with just Coroutines.
I think there are two options:

1. Let your visitor throw an exception to get out of the dijkstra_shortest_paths function; use some version of the function that skips initialization to resume again (the existing dijkstra_shortest_paths_no_init is not good enough, so you would have to make one).
2. Reimplement the dijkstra function as a coroutine.

I have implemented both, but have only really used option 1; option 2 remained a toy experiment. In both cases the tricky part was to properly initialize and keep track of (and not re-initialize on resume) all parameters that are input to the dijkstra function, i.e. color_map, queue, distance_map, etc.

https://github.com/ahhz/resumable_dijkstra

I used the lightweight stackless coroutine of Boost.Asio:
http://www.boost.org/doc/libs/1_59_0/doc/html/boost_asio/reference/coroutine.html
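For reference, option 1 in its plain, non-resumable form looks roughly like this (a sketch only: the tiny graph and the names found_target/stop_at_target are illustrative, and the skip-initialization machinery needed for resuming is left out):

// Sketch of the "throw from the visitor" idiom: the visitor throws once the
// target vertex is examined and the caller catches the exception.
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/dijkstra_shortest_paths.hpp>
#include <vector>

using Graph = boost::adjacency_list<boost::vecS, boost::vecS, boost::undirectedS,
                                    boost::no_property,
                                    boost::property<boost::edge_weight_t, int>>;
using Vertex = boost::graph_traits<Graph>::vertex_descriptor;

struct found_target {};  // thrown to abandon the search

struct stop_at_target : boost::dijkstra_visitor<>
{
    explicit stop_at_target(Vertex t) : target(t) {}

    template <class G>
    void examine_vertex(Vertex u, const G&)
    {
        if (u == target)
            throw found_target();  // unwinds straight out of dijkstra_shortest_paths
    }

    Vertex target;
};

int main()
{
    Graph g(3);
    boost::add_edge(0, 1, 1, g);
    boost::add_edge(1, 2, 1, g);

    std::vector<int> dist(boost::num_vertices(g));
    try {
        boost::dijkstra_shortest_paths(
            g, 0,
            boost::visitor(stop_at_target(2))
                .distance_map(boost::make_iterator_property_map(
                    dist.begin(), boost::get(boost::vertex_index, g))));
    } catch (const found_target&) {
        // dist[2] is already final here: Dijkstra examines vertices in
        // non-decreasing distance order.
    }
}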
The control flow I need would probably look like this: start the first Dijkstra search; on the first vertex it visits, "jump back out" and start the second Dijkstra visitor. From there on, ping-pong between the two visitors exclusively. And I'm not sure this is possible with the Coroutine approach.
It depends how much you are willing to re-implement. It is possible in principle
I appreciate any tips in this regard, even if it's a "just use your 15 lines of threading code instead of trying to be fancy" :)
I think it would be great to have a fancy solution for this in BGL.
On Wed, Nov 18, 2015 at 7:09 AM, alex
Jeremiah Willcock gave a good description of how this could work:
http://lists.boost.org/Archives/boost/2012/07/195300.php
The way he describes it is really using coroutines instead of the visitor pattern.
My example in the previous mail only forwards a single visitor event, as described in the original problem statement. If you wanted to pass additional dijkstra_shortest_paths() visitor events through the inversion of control, you could use an approach similar to the recursive SAX parsing example. That's what I think you're citing from Jeremiah's linked mail.
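Sketched roughly, the idea is to tag each event and push it through the same channel (the enum and the visitor name below are invented for illustration, not taken from the earlier example):

// Sketch: forward several visitor events, not just examined vertices, through
// one coroutine channel by tagging them.
#include <boost/coroutine/asymmetric_coroutine.hpp>
#include <boost/graph/dijkstra_shortest_paths.hpp>
#include <utility>

enum class dijkstra_event { discover, examine, finish };

template <class Vertex>
using event_coro =
    boost::coroutines::asymmetric_coroutine<std::pair<dijkstra_event, Vertex>>;

template <class Vertex>
struct event_forwarding_visitor : boost::dijkstra_visitor<>
{
    explicit event_forwarding_visitor(typename event_coro<Vertex>::push_type& sink)
        : mSink(sink) {}

    template <class G>
    void discover_vertex(Vertex u, const G&)
    { mSink(std::make_pair(dijkstra_event::discover, u)); }

    template <class G>
    void examine_vertex(Vertex u, const G&)
    { mSink(std::make_pair(dijkstra_event::examine, u)); }

    template <class G>
    void finish_vertex(Vertex u, const G&)
    { mSink(std::make_pair(dijkstra_event::finish, u)); }

    typename event_coro<Vertex>::push_type& mSink;
};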
I think there are two options: 1. Let your visitor throw an exception to get out of the dijkstra_shortest_paths function; use some version of the function that skips initialization to resume again (the existing dijkstra_shortest_paths_no_init is not good enough, so you would have to make one).
Possible, but yes, you would have to reimplement the algorithm to pursue this approach.
2. Reimplement the dijkstra function as a coroutine.
Fortunately this is not necessary, as shown in previous mail. You can simply call dijkstra_shortest_paths() in a coroutine.
I have implemented both, but have only really used option 1; option 2 remained a toy experiment. In both cases the tricky part was to properly initialize and keep track of (and not re-initialize on resume) all parameters that are input to the dijkstra function, i.e. color_map, queue, distance_map, etc.
Using a real coroutine makes that problem go away entirely. To the dijkstra_shortest_paths() algorithm running in a coroutine, the operation of passing each examined vertex to the consuming program simply looks like a function call. All existing state is preserved. Each time it returns from that function call, it resumes as it would with any ordinary visitor.
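In outline the wiring looks something like this. This is a minimal sketch rather than the exact program from the previous mail; the tiny graph, the weights and the visitor name are only illustrative:

// Sketch: dijkstra_shortest_paths() runs on the coroutine's stack and suspends
// at every examined vertex; the consumer sees a lazy range of vertices.
#include <boost/coroutine/asymmetric_coroutine.hpp>
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/dijkstra_shortest_paths.hpp>
#include <iostream>
#include <vector>

using Graph = boost::adjacency_list<boost::vecS, boost::vecS, boost::undirectedS,
                                    boost::no_property,
                                    boost::property<boost::edge_weight_t, int>>;
using Vertex = boost::graph_traits<Graph>::vertex_descriptor;
using Coro = boost::coroutines::asymmetric_coroutine<Vertex>;

// Visitor that hands every examined vertex to the coroutine's sink. Each call
// to mSink(u) suspends the search until the consumer resumes it.
struct examining_visitor : boost::dijkstra_visitor<>
{
    explicit examining_visitor(Coro::push_type& sink) : mSink(sink) {}

    template <class G>
    void examine_vertex(Vertex u, const G&) { mSink(u); }

    Coro::push_type& mSink;
};

int main()
{
    Graph g(4);
    boost::add_edge(0, 1, 1, g);
    boost::add_edge(1, 2, 1, g);
    boost::add_edge(2, 3, 1, g);

    std::vector<int> dist(boost::num_vertices(g));

    // The unmodified algorithm runs inside the coroutine; to it, mSink(u) is
    // just a function call, and all of its state survives each suspend.
    Coro::pull_type from_a([&](Coro::push_type& sink) {
        boost::dijkstra_shortest_paths(
            g, 0,
            boost::visitor(examining_visitor(sink))
                .distance_map(boost::make_iterator_property_map(
                    dist.begin(), boost::get(boost::vertex_index, g))));
    });

    for (Vertex u : from_a)   // lazy range of examined vertices
        std::cout << u << '\n';
}

Each iteration of the loop resumes dijkstra_shortest_paths() until it examines its next vertex; when the algorithm returns, the range simply ends.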
I used the lightweight stackless coroutine of Boost.Asio.
Aha! That would indeed cause you the troubles you describe. This is the great benefit of using *stackful* coroutines a la Boost.Coroutine: you can run a recursive algorithm on one stack, completely independently of the control flow on another stack.
The control flow I need would probably look like this: start the first Dijkstra search; on the first vertex it visits, "jump back out" and start the second Dijkstra visitor. From there on, ping-pong between the two visitors exclusively. And I'm not sure this is possible with the Coroutine approach.
It depends how much you are willing to re-implement. It is possible in principle
It is possible in practice, without any reimplementation at all, as shown in previous mail.
It depends how much you are willing to re-implement. It is possible in principle
It is possible in practice, without any reimplementation at all, as shown in previous mail.
Your solution is beautiful and a great demonstration of Boost Coroutine. For what it's worth, I had also used an earlier incarnation of Boost Coroutine (while it was being reviewed) as a stackful coroutine implementation, but wasn't so clever in the use of visitors and reimplemented/copied the algorithm with added yield statements. That solution was very inefficient and I discarded it. Boost Coroutine has changed (and surely improved) a lot since then. I would be curious to know the computational cost of the inversion of control using your method. Have you tried to quantify it?
On Thu, Jan 14, 2016 at 4:31 AM, alex
Boost Coroutine has changed (and surely improved) a lot since then. I would be curious to know the computational cost of the inversion of control using your method. Have you tried to quantify it?
That's an interesting question. Partly, of course, it depends on what you're comparing it to. Throwing exceptions from within a visitor, and then reconstructing state back down to the point at which the algorithm was last interrupted? Intuition suggests that the coroutine approach beats that handily. Likewise, I'm intuitively certain that using stackful coroutines is significantly cheaper than using threads with locks. The OS doesn't participate in coroutine context switching at all, and since the coroutines are simply passing control back and forth within the same thread, no lock objects are needed. I think you mean: what's the cost of the context switching? Oliver Kowalke (author of the Coroutine library) made this remark in private mail:
I would suggest that you use boost.coroutine2 (from branch develop + context from branch develop) - it is a little bit faster (38ns vs. 46ns) than boost.coroutine.
But of course that's on particular hardware. I don't know whether you find that answer satisfactory. If not, please clarify the question?
From: Nat Goodspeed On Thu, Jan 14, 2016 at 4:31 AM, alex
wrote: Boost Coroutine has changed (and surely improved) a lot since then. I would be curious to know the computational cost of the inversion of control using your method. Have you tried to quantify it?
That's an interesting question. Partly, of course, it depends on what you're comparing it to. Throwing exceptions from within a visitor, and then reconstructing state back down to the point at which the algorithm was last interrupted? Intuition suggests that the coroutine approach beats that handily.
I don't think reconstructing the state should be part of the comparison, because it is possible (with a lot of hassle) to avoid needing to do that.
Likewise, I'm intuitively certain that using stackful coroutines is significantly cheaper than using threads with locks. The OS doesn't participate in coroutine context switching at all, and since the coroutines are simply passing control back and forth within the same thread, no lock objects are needed.
Yes, I don't think this is the comparison to make.
I think you mean: what's the cost of the context switching? Oliver Kowalke (author of the Coroutine library) made this remark in private mail:
I would suggest that you use boost.coroutine2 (from branch develop + context from branch develop) - it is a little bit faster (38ns vs. 46ns) than boost.coroutine.
But of course that's on particular hardware. I don't know whether you find that answer satisfactory. If not, please clarify the question?
The more I think about it, the more I love your solution. I think there are three use-cases to consider:

(1) The algorithm needs to be interrupted and resumed many times, e.g. every time a node is discovered, as in your example.
(2) The algorithm needs to be interrupted and resumed only a few times. This was my original use case: I would first run the Dijkstra algorithm until a specified set of nodes was discovered, then resume it until a second set of nodes was discovered.
(3) The algorithm needs to be interrupted once and not resumed. I suppose this is the most common case; it is the first question in the FAQ (http://www.boost.org/doc/libs/1_60_0/libs/graph/doc/faq.html) and you can find it recurring on the boost mailing list and Stack Overflow.

Why was my original coroutine solution so inefficient? My use-case was (2), so I only needed to leave and re-enter the coroutine a few times. However, the "inversion of control" meant that I left and re-entered the coroutine every time a node was discovered, which is really only appropriate for use-case (1).

Now, since you bind the coroutine to the visitor and not the algorithm, it would be very easy to modify your example to make it appropriate for use-cases (2) and (3): just put the mSink(u) inside an if-statement, roughly as in the sketch below.

It therefore seems to me that FAQ #1 for BGL needs a new answer. Instead of throwing exceptions to escape the algorithm, visitors should bind to a push-coroutine, with the additional advantage that the algorithm can be resumed. The comparison to make to justify this recommendation would be between running dijkstra_shortest_paths in a try-catch block with a visitor that can throw, and running dijkstra_shortest_paths with a visitor that is bound to a coroutine. This would cover use cases (2) and (3). I am not even sure you really need an experiment to answer this question, or whether it can be answered on theoretical grounds. Is there a cost associated with being bound to a coroutine? I.e. is there a cost to *potentially* switching context, or is there only a cost to *actually* switching context?

Then there is use case (1), which is the one you implemented. I would be curious to know how it compares to my stackless-coroutine-based solution. But to be fair, your solution is small, elegant and external to BGL, whereas mine is large, overwrought and reimplements algorithms. So the onus would be on me to show that all this extra effort is actually worthwhile (which I now severely doubt).

Nevertheless, I think it would be useful to compare running dijkstra_shortest_paths as it currently is in BGL to your inverted-control method, i.e. simply do: while(from_a()){}. This would tell us what cost is associated with inverting the control and make it easier to decide whether it is a pattern to use whenever it seems convenient or only when strictly necessary. (With my solution the cost of inversion is about 17% extra computation time.)
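Roughly like this; the visitor below is a reconstruction for illustration (the names selective_visitor and mTargets are made up), not a copy of your example:

// Sketch for use-cases (2) and (3): suspend the running dijkstra_shortest_paths
// only when an "interesting" vertex is examined; otherwise keep going.
#include <boost/coroutine/asymmetric_coroutine.hpp>
#include <boost/graph/dijkstra_shortest_paths.hpp>
#include <unordered_set>
#include <utility>

template <class Vertex>
struct selective_visitor : boost::dijkstra_visitor<>
{
    using coro = boost::coroutines::asymmetric_coroutine<Vertex>;

    selective_visitor(typename coro::push_type& sink,
                      std::unordered_set<Vertex> targets)
        : mSink(sink), mTargets(std::move(targets)) {}

    template <class G>
    void examine_vertex(Vertex u, const G&)
    {
        if (mTargets.count(u))  // yield only at the vertices we care about;
            mSink(u);           // the algorithm just keeps running otherwise
    }

    typename coro::push_type& mSink;
    std::unordered_set<Vertex> mTargets;
};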
Nevertheless, I think it would be useful to compare running dijkstra_shortest_paths as it currently is in BGL to your inverted-control method, i.e. simply do: while(from_a()){}.
This would tell us what cost is associated with inverting the control and make it easier to decide whether it is a pattern to use whenever it seems convenient or only when strictly necessary. (with my solution the cost of inversion is about 17% extra computation time)
I just tried it on a graph with 3 million nodes. With your solution the additional cost is 11%, so I would say it is superior on all accounts. I just wish I had known this earlier.

reading bimba_6M.obj...
converting to boost graph type ...
plain boost dijkstra_shortest_path ...1712 ms
stackless coroutine algorithm (Alex)...2065 ms
boost coroutine visitor (Nat Goodspeed)...1890 ms
2016-01-15 18:46 GMT+01:00 alex
I just tried it on a graph with 3 million nodes. With your solution the additional cost is 11%, so I would say it is superior on all accounts. I just wish I had known this earlier.
boost coroutine visitor (Nat Goodspeed)...1890 ms
Could you test it again with boost.coroutine2 + boost.context from branch develop and post the result?
From: Boost-users [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Oliver Kowalke Sent: 15 January 2016 18:55 2016-01-15 18:46 GMT+01:00 alex
: I just tried it on a graph with 3 million nodes. With your solution the additional cost is 11%, so I would say it is superior on all accounts. I just wish I had known this earlier.
boost coroutine visitor (Nat Goodspeed)...1890 ms
Could you test it again with boost.coroutine2 + boost.context from branch develop and post the result?
Sorry, that is too much work for me now. But I am happy to share the program I used with you.

To get Nat's method to work for me, I used the following:

while (dijkstra_object) { dijkstra_object(); }

where dijkstra_object is a boost::coroutines::asymmetric_coroutine<Vertex>::pull_type. It was somehow disappointing that I couldn't simply do:

while (dijkstra_object()) { }

I can see in the documentation that the constructor enters the coroutine-function, but it is not clear to me why. Would it not have been neater if this was avoided?
On Mon, Jan 18, 2016 at 10:36 AM, alex
To get Nat's method to work for me, I used the following:
while (dijkstra_object) { dijkstra_object(); }
Where dijkstra_object is a boost::coroutines::asymmetric_coroutine<Vertex>::pull_type.
It was somehow disappointing that I couldn't simply do:
while (dijkstra_object() ) { }
You could instead, as in Daniel's example program a couple posts back, use a range 'for' over the dijkstra_object.
I can see in the documentation that the constructor enters the coroutine-function, but it is not clear to me why. Would it not have been neater if this was avoided?
With a pull_type coroutine, the expectation is that every time it suspends (until it exits), it has produced a value for its consumer. If the coroutine constructor didn't enter the coroutine-function, the first invocation would have to be a special case. Even if that were desirable, how could you distinguish the case in which the coroutine produces zero values?

That said, if you want finer-grained control over exactly when the coroutine is entered, you might consider using execution_context instead [0].

[0] http://www.boost.org/doc/libs/1_60_0/libs/context/doc/html/context/econtext....
-----Original Message-----
It was somehow disappointing that I couldn't simply do:
while (dijkstra_object() ) { }
You could instead, as in Daniel's example program a couple posts back, use a range 'for' over the dijkstra_object.
Yes, that would do perfectly.
I can see in the documentation that the constructor enters the coroutine-function, but it is not clear to me why. Would it not have been neater if this was avoided?
With a pull_type coroutine, the expectation is that every time it suspends (until it exits), it has produced a value for its consumer. If the coroutine constructor didn't enter the coroutine-function, the first invocation would have to be a special case.
To me it seems that the first invocation may either exit or produce a value, just as each subsequent invocation. I don't see why the first invocation is special. I was also somehow expecting that the construction of the coroutines would be separate from their invocation, because I would assume that makes it easier to manage their appropriate execution order.
Even if that were desirable, how could you distinguish the case in which the coroutine produces zero values?
That case would just never enter the while loop.
That said, if you want finer-grained control over exactly when the coroutine is entered, you might consider using execution_context instead [0].
No, I don't want that. As a potential user I just thought it would be helpful to report where I found the interface (slightly) counterintuitive.
On Tue, Jan 19, 2016 at 6:02 AM, alex
I can see in the documentation that the constructor enters the coroutine-function, but it is not clear to me why. Would it not have been neater if this was avoided?
With a pull_type coroutine, the expectation is that every time it suspends (until it exits), it has produced a value for its consumer. If the coroutine constructor didn't enter the coroutine-function, the first invocation would have to be a special case.
To me it seems that the first invocation may either exit or produce a value, just as each subsequent invocation. I don't see why the first invocation is special.
Okay, so let's consider this snippet:

boost::coroutines::asymmetric_coroutine<int>::pull_type source(somefunc);
while (source) {
    std::cout << source.get() << std::endl;
    source();
}

Suppose the asymmetric_coroutine<int>::pull_type constructor did not enter somefunc(). How would the operator bool() call invoked by the while statement know whether there's a value?
Even if that were desirable, how could you distinguish the case in which the coroutine produces zero values?
That case would just never enter the while loop.
But again: how do we know?
-----Original Message----- From: Boost-users [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Nat Goodspeed Sent: 19 January 2016 21:25 To: boost-users@lists.boost.org Subject: Re: [Boost-users] Using Coroutines in the Visitor Pattern (use-case: BGL event visitor)
On Tue, Jan 19, 2016 at 6:02 AM, alex
wrote: I can see in the documentation that the constructor enters the coroutine-function, but it is not clear to me why. Would it not have been neater if this was avoided?
With a pull_type coroutine, the expectation is that every time it suspends (until it exits), it has produced a value for its consumer. If the coroutine constructor didn't enter the coroutine-function, the first invocation would have to be a special case.
To me it seems that the first invocation may either exit or produce a value, just as each subsequent invocation. I don't see the why the first invocation is special.
Okay, so let's consider this snippet:

boost::coroutines::asymmetric_coroutine<int>::pull_type source(somefunc);
while (source) {
    std::cout << source.get() << std::endl;
    source();
}

Suppose the asymmetric_coroutine<int>::pull_type constructor did not enter somefunc(). How would the operator bool() call invoked by the while statement know whether there's a value?
OK, if the constructor did not enter the function, I would use it like this:

boost::coroutines::asymmetric_coroutine<int>::pull_type source(somefunc);
while (source()) {
    std::cout << source.get() << std::endl;
}

I expect the following in the evaluation of the while condition:
1. The operator() will enter somefunc().
2. somefunc() will either complete or produce an int; the state of source will reflect this.
3. The operator() returns a reference to source.
4. If the function has completed, the bool() function will cast it to false; if the function has returned an int, the bool() function will cast it to true.
Even if that were desirable, how could you distinguish the case in which the coroutine produces zero values?
That case would just never enter the while loop.
But again: how do we know?
Operator() will cause somefunc() to complete and hence bool() will return false: source.get() will never be called.
-----Original Message----- From: Boost-users [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Nat Goodspeed Sent: 19 January 2016 21:25 To: boost-users@lists.boost.org Subject: Re: [Boost-users] Using Coroutines in the Visitor Pattern (use-case: BGL event visitor)
On Tue, Jan 19, 2016 at 6:02 AM, alex
wrote: I can see in the documentation that the constructor enters the coroutine-function, but it is not clear to me why. Would it not have been neater if this was avoided?
[...]
If the coroutine constructor didn't enter the coroutine-function, the first invocation would have to be a special case.
[...]
I expect the following in the evaluation of the while condition
1. The operator() will enter somefunc().
It is dawning on me now. At the first invocation operator() should enter somefunc(), whereas at all subsequent invocations it has to re-enter somefunc(). I suppose enter and re-enter are different operations, hence the first invocation would be a special case. Thanks for indulging me.
On Wed, Jan 20, 2016 at 4:35 AM, alex
I expect the following in the evaluation of the while condition
1. The operator() will enter somefunc().
It is dawning on me now. At the first invocation operator() should enter somefunc(), whereas at all subsequent invocations it has to re-enter somefunc(). I suppose enter and re-enter are different operations, hence the first invocation would be a special case.
To be honest, I suppose your proposed API would be a viable alternative. I think part of the rationale for the current API design was to permit facile mapping to iterator operations:

iterator comparison to end => operator bool()
iterator operator*() => get()
iterator operator++() => operator()()

Perhaps, having become accustomed to the present API, I'm simply rationalizing its design. I'm sorry if I caused offense.
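P.S. For concreteness, the correspondence with a toy generator (illustrative only):

// The explicit while loop and the range-based for are equivalent ways of
// consuming the same pull-coroutine.
#include <boost/coroutine/asymmetric_coroutine.hpp>
#include <iostream>

using Coro = boost::coroutines::asymmetric_coroutine<int>;

int main()
{
    auto counter = [](Coro::push_type& sink) {
        for (int i = 0; i < 3; ++i)
            sink(i);                       // produce 0, 1, 2
    };

    Coro::pull_type a(counter);
    while (a) {                            // iterator comparison to end
        std::cout << a.get() << '\n';      // iterator operator*()
        a();                               // iterator operator++()
    }

    Coro::pull_type b(counter);
    for (int i : b)                        // the same, via the iterator interface
        std::cout << i << '\n';
}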
To be honest, I suppose your proposed API would be a viable alternative. I think part of the rationale for the current API design was to permit facile mapping to iterator operations:

iterator comparison to end => operator bool()
iterator operator*() => get()
iterator operator++() => operator()()

Perhaps, having become accustomed to the present API, I'm simply rationalizing its design. I'm sorry if I caused offense.
No offense at all. Thank you for putting forward your coroutine + visitor solution.
participants (3)
- alex
- Nat Goodspeed
- Oliver Kowalke