Vinnie Falco wrote:
On Fri, Oct 13, 2017 at 11:59 AM, Phil Endecott via Boost wrote:
A "push" parser, which invokes client callbacks as tokens are processed, is easier to implement but harder to use, as the client has to track its state between callbacks with e.g. an explicit FSM. On the other hand, a "pull" parser (possibly using an iterator interface) is easier for the client, but now the parser may need the explicit state tracking instead.
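To make the push/pull distinction concrete, here is a minimal sketch that splits an HTTP request line on spaces in both styles. All of the names (push_tokens, pull_tokens, push_client, pull_client, request_line) are hypothetical and written only for this illustration; none of them comes from Beast or any other library.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

struct request_line { std::string method, target, version; };

// Push style: the parser drives and invokes a callback once per token.
// The parser itself is a plain loop with no state kept between calls.
inline void push_tokens(const std::string& input,
                        const std::function<void(const std::string&)>& on_token) {
    std::string tok;
    for (char c : input) {
        if (c == ' ') {
            if (!tok.empty()) { on_token(tok); tok.clear(); }
        } else {
            tok += c;
        }
    }
    if (!tok.empty()) on_token(tok);
}

// The push client has to remember which token it expects next:
// this is the explicit FSM on the client side.
struct push_client {
    enum class expect { method, target, version, done };
    expect next = expect::method;
    request_line result;

    void on_token(const std::string& t) {
        switch (next) {
        case expect::method:  result.method = t;  next = expect::target;  break;
        case expect::target:  result.target = t;  next = expect::version; break;
        case expect::version: result.version = t; next = expect::done;    break;
        case expect::done:    break; // ignore any extra tokens
        }
    }
};

// Pull style: the client drives, and the parser keeps its position
// (its state) between calls to next().
class pull_tokens {
public:
    explicit pull_tokens(std::string input) : input_(std::move(input)) {}

    // Stores the next token in out; returns false when input is exhausted.
    bool next(std::string& out) {
        while (pos_ < input_.size() && input_[pos_] == ' ') ++pos_;
        if (pos_ == input_.size()) return false;
        std::size_t end = input_.find(' ', pos_);
        if (end == std::string::npos) end = input_.size();
        out = input_.substr(pos_, end - pos_);
        pos_ = end;
        return true;
    }

private:
    std::string input_;
    std::size_t pos_ = 0;
};

// The pull client reads as straight-line code; no state enum is needed.
inline bool pull_client(const std::string& line, request_line& out) {
    pull_tokens p(line);
    return p.next(out.method) && p.next(out.target) && p.next(out.version);
}
```

In the push version the bookkeeping sits in push_client::next; in the pull version it sits in pull_tokens::pos_, which is the trade-off described above.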
That is generally true, and especially true for XML and other languages that have a similar structure. Specifically, those languages have opening and closing tags which determine the validity of subsequent grammar, and they have a recursive structure (like HTML).
But this is not the case for HTTP. There are no opening and closing tags. There is no need to keep a "stack" of "open tags". It is quite straightforward. Therefore, when designing an HTTP parser we can place less emphasis on the style of parser and instead focus that energy on other considerations (as I described in my previous post, regarding the separation of concerns for stream algorithms and parser consumers).
If you look at the Beast parser derived class, you can see that the state is quite minimal:
    template<...>
    class parser : public basic_parser<...>
    {
        message<...> m_;
        typename Body::writer wr_;
        bool wr_inited_ = false;
        std::function<...> cb_h_; // for manual chunking
        std::function<...> cb_b_; // for manual chunking
        ...
You still have an explicit state machine, i.e. a state enum and a switch statement in a loop; I'm looking at impl/basic_parser.ipp for example. But I don't want to dwell on this particular code. I'm just considering, generally, whether this style of code is soon going to look "antique", in the way that 15-year-old code full of explicit new and delete looks antediluvian now that we're all using smart pointers.

I think it's clear that coroutines can often make the code simpler to write and/or easier to use. The question is what we lose. The issue of generator<T> providing only input iterators is the most significant one I've spotted so far. This is in some way related to the whole ASIO "buffer sequence" thing; the code I posted before read into contiguous buffers, but that contiguity was lost before the downstream code saw the data, so it couldn't hope to optimise with e.g. word-sized copies or compares.

Maybe this could be fixed with some sort of segmented iterator, or with something other than generator<T> as the coroutine type. Or maybe it's unfixable. Do other languages have anything to teach us about this? What do users of Boost.Coroutine think?

Regards, Phil.
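P.S. For anyone who has not looked at that kind of code, the "state enum and a switch statement in a loop" style looks roughly like the sketch below. This is not Beast's code: header_parser, its states, and its members are hypothetical names invented for illustration, and it only handles simple "Name: value\r\n" lines. The point is that the input can be split at any byte, so everything the parser knows has to live in member variables between calls to put().

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical incremental parser for "Name: value\r\n" lines.
// Illustrates the explicit state machine style only.
class header_parser {
public:
    typedef std::pair<std::string, std::string> field;

    // Feed the next chunk; a chunk boundary may fall anywhere in a line.
    // Returns false as soon as a malformed line is seen.
    bool put(const std::string& chunk) {
        for (char c : chunk) {
            switch (state_) {
            case state::name:
                if (c == ':') state_ = state::space;
                else if (c == '\r' || c == '\n') return false; // no colon
                else name_ += c;
                break;
            case state::space:
                if (c == ' ') break;      // skip spaces after the colon
                state_ = state::value;
                // fall through: c is the first character of the value
            case state::value:
                if (c == '\r') state_ = state::lf;
                else value_ += c;
                break;
            case state::lf:
                if (c != '\n') return false; // bare CR
                fields_.emplace_back(std::move(name_), std::move(value_));
                name_.clear();
                value_.clear();
                state_ = state::name;
                break;
            }
        }
        return true;
    }

    // Fields completed so far.
    const std::vector<field>& fields() const { return fields_; }

private:
    enum class state { name, space, value, lf };
    state state_ = state::name;
    std::string name_;
    std::string value_;
    std::vector<field> fields_;
};
```

Feeding it "Host: exa", then "mple.com\r", then "\nAccept:*/*\r\n" produces the same two fields as feeding the whole input in one call. A coroutine version would keep name_, value_ and the current position as ordinary local variables and simply suspend when it runs out of input.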