From: Nat Goodspeed

On Thu, Jan 14, 2016 at 4:31 AM, alex wrote:

    Boost Coroutine has changed (and surely improved) a lot since then. I would be curious to know the computational cost of the inversion of control using your method. Have you tried to quantify it?
That's an interesting question. Partly, of course, it depends on what you're comparing it to. Throwing exceptions from within a visitor, and then reconstructing state back down to the point at which the algorithm was last interrupted? Intuition suggests that the coroutine approach beats that handily.
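Just so we're talking about the same baseline: by "throwing exceptions from within a visitor" I mean something along the lines of the sketch below. The found_target exception and the stop_at_target visitor are illustrative names of mine, not library API.

    // Sketch only: the classic "throw from the visitor" escape idiom.
    #include <boost/graph/adjacency_list.hpp>
    #include <boost/graph/dijkstra_shortest_paths.hpp>
    #include <iostream>
    #include <vector>

    struct found_target {};   // thrown to abort the search

    typedef boost::adjacency_list<boost::vecS, boost::vecS, boost::directedS,
        boost::no_property, boost::property<boost::edge_weight_t, int> > Graph;
    typedef boost::graph_traits<Graph>::vertex_descriptor Vertex;

    struct stop_at_target : boost::default_dijkstra_visitor {
        Vertex target;
        explicit stop_at_target(Vertex t) : target(t) {}
        template <typename G>
        void examine_vertex(Vertex u, const G&) const {
            // examine_vertex fires when u's distance is final
            if (u == target) throw found_target();
        }
    };

    int main() {
        Graph g(4);
        boost::add_edge(0, 1, 1, g);
        boost::add_edge(1, 2, 1, g);
        boost::add_edge(2, 3, 1, g);
        std::vector<int> dist(num_vertices(g));
        try {
            boost::dijkstra_shortest_paths(g, 0,
                boost::distance_map(&dist[0]).visitor(stop_at_target(2)));
        } catch (const found_target&) {
            std::cout << "distance to target: " << dist[2] << '\n';
        }
    }

The catch block has whatever distances were finalized before the throw; resuming the search from that point is exactly the part this idiom cannot do.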
I don't think reconstructing the state should be part of the comparison, because it is possible (with a lot of hassle) to avoid needing to do that.
Likewise, I'm intuitively certain that using stackful coroutines is significantly cheaper than using threads with locks. The OS doesn't participate in coroutine context switching at all, and since the coroutines are simply passing control back and forth within the same thread, no lock objects are needed.
Yes, I don't think this is the comparison to make.
I think you mean: what's the cost of the context switching? Oliver Kowalke (author of the Coroutine library) made this remark in private mail:
    "I would suggest that you use boost.coroutine2 (from branch develop + context from branch develop) - it is a little bit faster (38ns vs. 46ns) than boost.coroutine."
But of course that's on particular hardware. I don't know whether you find that answer satisfactory. If not, please clarify the question?
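If you want to see the number on your own hardware, a micro-benchmark along these lines should do. This is only a sketch; kIters is an arbitrary count of mine, not from Oliver's measurement.

    // Sketch only: time the raw resume/yield round trip of Boost.Coroutine2.
    #include <boost/coroutine2/all.hpp>
    #include <chrono>
    #include <iostream>

    int main() {
        typedef boost::coroutines2::coroutine<void> coro_t;
        const long kIters = 1000000;   // arbitrary iteration count

        // The coroutine body just yields forever; it is unwound when
        // 'source' is destroyed at the end of main().
        coro_t::pull_type source([](coro_t::push_type& yield) {
            for (;;) yield();
        });

        auto t0 = std::chrono::steady_clock::now();
        for (long i = 0; i < kIters; ++i)
            source();   // one resume plus one yield per iteration
        auto t1 = std::chrono::steady_clock::now();

        std::cout << std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count()
                         / double(kIters)
                  << " ns per round trip\n";
    }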
The more I think about it, the more I love your solution. I think there are three use cases to consider:

(1) The algorithm needs to be interrupted and resumed many times, e.g. every time a node is discovered, as in your example.

(2) The algorithm needs to be interrupted and resumed only a few times. This was my original use case: I would first run Dijkstra's algorithm until a specified set of nodes was discovered, then resume it until a second set of nodes was discovered.

(3) The algorithm needs to be interrupted once and not resumed. I suppose this is the most common case; it is the first question in the FAQ (http://www.boost.org/doc/libs/1_60_0/libs/graph/doc/faq.html) and you can find it recurring on the Boost mailing list and Stack Overflow.

Why was my original coroutine solution so inefficient? My use case was (2), so I only needed to leave and re-enter the coroutine a few times. However, the "inversion of control" meant that I left and re-entered the coroutine every time a node was discovered, which is really only appropriate for use case (1).

Now, since you bind the coroutine to the visitor and not the algorithm, it would be very easy to modify your example to suit use cases (2) and (3): just put the mSink(u) call inside an if-statement (see the sketch at the end of this message).

It therefore seems to me that FAQ #1 for the BGL needs a new answer: instead of throwing exceptions to escape the algorithm, visitors should bind to a push-coroutine, with the additional advantage that the algorithm can be resumed. The comparison that would justify this recommendation is between running dijkstra_shortest_paths in a try-catch block with a visitor that can throw, and running dijkstra_shortest_paths with a visitor bound to a coroutine. This would cover use cases (2) and (3). I am not even sure whether you need an experiment to answer this question, or whether it can be answered on theoretical grounds: is there a cost to merely being bound to a coroutine, i.e. to *potentially* switching context, or is there only a cost to *actually* switching context?

Then there is use case (1), which is the one you implemented. I would be curious to know how it compares to my stackless-coroutine-based solution. But to be fair, your solution is small, elegant, and external to the BGL, whereas mine is large, overwrought, and reimplements the algorithms, so the onus would be on me to show that all the extra effort is actually worthwhile (which I now severely doubt). Nevertheless, I think it would be useful to compare running dijkstra_shortest_paths as it currently is in the BGL against your inverted-control method, i.e. simply do: while(from_a()){}. This would tell us what cost is associated with inverting the control, and make it easier to decide whether it is a pattern to use whenever it seems convenient, or only when strictly necessary. (With my solution, the cost of inversion is about 17% extra computation time.)
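Here is the sketch of the if-statement modification I mean. To be clear, this is my own rough cut against Boost.Coroutine2 and the BGL named-parameter interface, not your actual code: filtered_visitor, the interesting predicate, and the toy graph are all mine.

    // Sketch only: a Dijkstra visitor that yields to a push-coroutine, but
    // only for vertices the caller declared interesting (use case (2)).
    #include <boost/coroutine2/all.hpp>
    #include <boost/graph/adjacency_list.hpp>
    #include <boost/graph/dijkstra_shortest_paths.hpp>
    #include <functional>
    #include <iostream>
    #include <vector>

    typedef boost::adjacency_list<boost::vecS, boost::vecS, boost::directedS,
        boost::no_property, boost::property<boost::edge_weight_t, int> > Graph;
    typedef boost::graph_traits<Graph>::vertex_descriptor Vertex;
    typedef boost::coroutines2::coroutine<Vertex> coro_t;

    struct filtered_visitor : boost::default_dijkstra_visitor {
        coro_t::push_type& mSink;              // same name as in your example
        std::function<bool(Vertex)> interesting;
        filtered_visitor(coro_t::push_type& sink, std::function<bool(Vertex)> pred)
            : mSink(sink), interesting(std::move(pred)) {}
        template <typename G>
        void discover_vertex(Vertex u, const G&) {
            if (interesting(u))   // the if-statement: yield only when asked to
                mSink(u);
        }
    };

    int main() {
        Graph g(4);
        boost::add_edge(0, 1, 1, g);
        boost::add_edge(1, 2, 1, g);
        boost::add_edge(2, 3, 1, g);
        std::vector<int> dist(num_vertices(g));

        coro_t::pull_type from_a([&](coro_t::push_type& sink) {
            filtered_visitor vis(sink, [](Vertex u) { return u >= 2; });
            boost::dijkstra_shortest_paths(g, 0,
                boost::distance_map(&dist[0]).visitor(vis));
        });

        for (Vertex u : from_a)   // resumes the algorithm once per interesting vertex
            std::cout << "discovered interesting vertex " << u << '\n';
        // To measure the bare cost of the inversion instead, just drain it:
        //     while (from_a()) {}
    }

With a predicate that is rarely true, the coroutine overhead collapses to a handful of switches for use case (2), and to a single one for use case (3); the algorithm itself runs uninterrupted in between.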