Re: [boost] Push/pull parsers & coroutines

14 Oct 2017

      Vinnie Falco wrote:
...
On Fri, Oct 13, 2017 at 11:59 AM, Phil Endecott via Boost
<boost@lists.boost.org> wrote:
...
A "push" parser,
which invokes client callbacks as tokens are processed, is easier to
implement but harder to use as the client has to track its state
between callbacks with e.g. an explicit FSM.  On the other hand, a
"pull parser" (possibly using an iterator interface) is easier for
the client but instead now the parser may need the explicit state
tracking.
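
To make the trade-off concrete, here is a minimal sketch of both styles for a toy "key=value;key=value" grammar. The grammar and every name in it (push_parse, push_client, pull_parser, parse_push, parse_pull) are made up for illustration and are not taken from any library discussed in this thread.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>

// Push style: the parser drives and calls the client back once per token.
// Tokens are separated by '=' or ';' (a toy grammar, not HTTP).
inline void push_parse(const std::string& in,
                       const std::function<void(const std::string&)>& on_token)
{
    std::string tok;
    for (char c : in) {
        if (c == '=' || c == ';') { on_token(tok); tok.clear(); }
        else tok += c;
    }
    if (!tok.empty()) on_token(tok);
}

// Client of the push parser: between callbacks it has to remember
// whether the next token is a key or a value (a two-state FSM).
struct push_client {
    enum class state { expect_key, expect_value };
    state st = state::expect_key;
    std::string key;
    std::map<std::string, std::string> out;

    void operator()(const std::string& tok) {
        if (st == state::expect_key) { key = tok; st = state::expect_value; }
        else { out[key] = tok; st = state::expect_key; }
    }
};

// Pull style: the client drives, so the parser has to store its own
// position between calls.
class pull_parser {
    const std::string& in_;
    std::size_t pos_ = 0;
public:
    explicit pull_parser(const std::string& in) : in_(in) {}

    // Fetches the next token; returns false once the input is exhausted.
    bool next(std::string& tok) {
        if (pos_ >= in_.size()) return false;
        std::size_t end = in_.find_first_of("=;", pos_);
        if (end == std::string::npos) end = in_.size();
        tok = in_.substr(pos_, end - pos_);
        pos_ = end + 1;
        return true;
    }
};

inline std::map<std::string, std::string> parse_push(const std::string& in) {
    push_client c;
    push_parse(in, std::ref(c));   // the client state lives in c
    return c.out;
}

inline std::map<std::string, std::string> parse_pull(const std::string& in) {
    pull_parser p(in);
    std::map<std::string, std::string> out;
    std::string k, v;
    while (p.next(k) && p.next(v)) out[k] = v;   // straight-line client code
    return out;
}
```

Both functions should give the same map for the same input. What differs is where the bookkeeping lives: in push_client::st for the push version, in pull_parser::pos_ for the pull version.
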
That is generally true, and especially true for XML and other
languages that have a similar structure. Specifically, such languages
have opening and closing tags that determine which grammar is valid
next, and they have a recursive structure (like HTML).
But this is not the case for HTTP. There are no opening and closing
tags. There is no need to keep a "stack" of "open tags". It is quite
straightforward. Therefore, when designing an HTTP parser we can place
less emphasis on the style of parser and instead focus that energy
on other considerations (as I described in my previous post, regarding
the separation of concerns for stream algorithms and parser
consumers).
If you look at the Beast parser derived class, you can see that the
state is quite minimal:
    template<bool isRequest, class Body, class Allocator>
    class parser
        : public basic_parser<isRequest, parser<isRequest, Body, Allocator>>
    {
        message<isRequest, Body, basic_fields<Allocator>> m_;
        typename Body::writer wr_;
        bool wr_inited_ = false;
        std::function<...> cb_h_; // for manual chunking
        std::function<...> cb_b_; // for manual chunking
        ...
You still have an explicit state machine, i.e. a state enum and a
switch statement in a loop; I'm looking at impl/basic_parser.ipp for
example.
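
For anyone who hasn't looked at that style recently, this is roughly the shape I mean. It is a made-up sketch for "name: value\n" lines, not Beast's code; header_fsm and its members are hypothetical names, and real HTTP field parsing has to deal with a good deal more (CRLF, optional whitespace, size limits and so on).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical incremental parser for "name: value\n" lines.
// All state that must survive between calls to feed() is held in members.
class header_fsm {
    enum class state { name, space, value };
    state st_ = state::name;
    std::string name_, value_;

    // Handles one character of a field value, or the newline that ends it.
    void on_value_char(char c) {
        if (c == '\n') {
            fields.emplace_back(name_, value_);
            name_.clear();
            value_.clear();
            st_ = state::name;
        } else {
            value_ += c;
        }
    }

public:
    std::vector<std::pair<std::string, std::string>> fields;

    // May be called repeatedly with arbitrary fragments of the input.
    void feed(const std::string& chunk) {
        for (char c : chunk) {
            switch (st_) {
            case state::name:
                if (c == ':') st_ = state::space;
                else name_ += c;
                break;
            case state::space:
                // Skip blanks after the colon, then start the value.
                if (c != ' ') { st_ = state::value; on_value_char(c); }
                break;
            case state::value:
                on_value_char(c);
                break;
            }
        }
    }
};
```

The input can be split anywhere, for example feed("Host: exa") followed by feed("mple.org\n"), because nothing is kept on the call stack between calls. That is exactly the bookkeeping a coroutine would let the compiler do for you.
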

But I don't want to dwell on this particular code.  I'm just considering,
generally, whether this style of code is soon going to look "antique" -
in the way that 15-year-old code full of explicit new and delete looks
antediluvian now that we're all using smart pointers.

I think it's clear that often coroutines can make the code simpler to
write and/or easier to use.  The question is what do we lose.  The
issue of generator<T> providing only input iterators is the most
significant issue I've spotted so far.  This is in some way related
to the whole ASIO "buffer sequence" thing; the code I posted before
read into contiguous buffers, but that was lost before the downstream
code saw it, so it couldn't hope to optimise with e.g. word-sized
copies or compares.  Maybe this could be fixed with some sort of segmented
iterator, or something other than generator<T> as the coroutine type,
or something.  Or maybe it's unfixable.
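
To illustrate the contiguity point without needing a coroutine-capable compiler, here is a rough sketch. byte_source stands in for what a consumer of a generator<char>-style interface gets to see (one character at a time, single pass), while find_segmented assumes the downstream code is allowed to see each contiguous chunk. All of the names are hypothetical, and this is only one possible shape for a "segmented" interface.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Stand-in for data that arrived as several contiguous reads.
using segments = std::vector<std::string>;

// Byte-at-a-time source: the segment boundaries are hidden from the consumer.
class byte_source {
    const segments& segs_;
    std::size_t seg_ = 0, pos_ = 0;
public:
    explicit byte_source(const segments& s) : segs_(s) {}

    // Fetches the next character; returns false at end of input.
    bool next(char& c) {
        while (seg_ < segs_.size() && pos_ == segs_[seg_].size()) {
            ++seg_;
            pos_ = 0;
        }
        if (seg_ == segs_.size()) return false;
        c = segs_[seg_][pos_++];
        return true;
    }
};

// The consumer can only compare one character per step.
inline std::ptrdiff_t find_bytewise(byte_source src, char target) {
    std::ptrdiff_t offset = 0;
    char c = 0;
    while (src.next(c)) {
        if (c == target) return offset;
        ++offset;
    }
    return -1;
}

// The consumer sees each contiguous chunk, so it can hand the whole chunk
// to std::memchr, which the C library is free to implement with wide loads.
inline std::ptrdiff_t find_segmented(const segments& segs, char target) {
    std::ptrdiff_t offset = 0;
    for (const std::string& s : segs) {
        const void* hit = std::memchr(s.data(), target, s.size());
        if (hit)
            return offset + (static_cast<const char*>(hit) - s.data());
        offset += static_cast<std::ptrdiff_t>(s.size());
    }
    return -1;
}
```

Both functions return the same offset (or -1 if the character is absent). Only the second one gives the library a chance to do anything smarter than a byte loop, and that is the information which gets thrown away when everything is funnelled through a single-character input iterator.
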

Do other languages have anything to teach us about this?  What do
users of Boost.Coroutine think?

Regards, Phil.