[boost] Push/pull parsers & coroutines (Was: Boost.HTTPKit, a new library from the makers of Beast!)

13 Oct 2017

Dear All,

This is related to the ongoing discussion of the Beast HTTP parser.  
I have been thinking in general about how best to implement parser 
APIs in modern and future C++.  Specifically, I've been wondering 
whether the imminent arrival of low-overhead coroutines ought to 
change best practice for this sort of interface.

In the past, I have found that there is a trade-off between parser 
implementation complexity and client code complexity.  A "push" parser, 
which invokes client callbacks as tokens are processed, is easier to 
implement but harder to use as the client has to track its state 
between callbacks with e.g. an explicit FSM.  On the other hand, a 
"pull parser" (possibly using an iterator interface) is easier for 
the client but instead now the parser may need the explicit state 
tracking.
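
To make the trade-off concrete, here is a minimal sketch of the "push" 
side. It assumes a toy "name:value\n" grammar with no ':' inside values; 
push_parse, HeaderCollector and collect_headers are hypothetical names 
invented for illustration, not part of Beast or any other library. The 
parser is a plain loop, but the client needs a small FSM to remember 
whether the next token is a name or a value.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical push parser for a toy "name:value\n" grammar.
// It calls on_token once per name and once per value, and needs
// no saved state of its own beyond the token being accumulated.
void push_parse(const std::string& text,
                const std::function<void(const std::string&)>& on_token)
{
  std::string token;
  for (char c : text) {
    if (c == ':' || c == '\n') {
      on_token(token);
      token.clear();
    } else {
      token += c;
    }
  }
}

// Client side: between callbacks it must remember whether the next
// token is a name or a value, i.e. an explicit two-state machine.
struct HeaderCollector {
  enum class State { ExpectName, ExpectValue };
  State state = State::ExpectName;
  std::string pending_name;
  std::map<std::string, std::string> headers;

  void on_token(const std::string& token) {
    if (state == State::ExpectName) {
      pending_name = token;
      state = State::ExpectValue;
    } else {
      headers[pending_name] = token;
      state = State::ExpectName;
    }
  }
};

// Convenience wrapper that runs the parser and returns the headers.
std::map<std::string, std::string> collect_headers(const std::string& text)
{
  HeaderCollector c;
  push_parse(text, [&c](const std::string& t) { c.on_token(t); });
  return c.headers;
}
```

Even in this two-state case the client logic is split across separate 
calls to on_token; a real HTTP client would have many more states.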

Now, with stackless coroutines due "real soon now", we can avoid 
needing explicit state on either side.  In the parser we can 
co_yield tokens as they are processed and in the client we can 
consume them using input iterators.  The use of coroutines doesn't 
need to be explicit in the API; the parser can be documented as 
returning a range<T>, while actually returning a generator<T>.

Here's a very very rough sketch of what I have in mind, for the case 
of HTTP header parsing; note that I don't even have a compiler that 
supports coroutines yet so this is far from real code:

#include <unistd.h>  // ::read, ssize_t

generator<char> read_input(int fd)
{
  char buf[4096];
  while (1) {
    ssize_t r = ::read(fd,buf,sizeof(buf));
    if (r <= 0) co_return;  // end of file (0) or read error (< 0)
    for (ssize_t i = 0; i < r; ++i) {
      co_yield buf[i];
    }
  }
}

template <typename INPUT_RANGE>
generator< pair<string,string> > parse_header_lines(INPUT_RANGE input)
{
  // The input iterators are single-pass, so we can't std::find the
  // delimiter and then construct a string from the same range.
  // Instead we copy to the string and check for the delimiter at
  // the same time, with a loop.
  auto i = input.begin();
  auto e = input.end();
  while (i != e) {
    string k, v;
    for (; i != e && *i != ':'; ++i) k += *i;
    if (i != e) ++i;  // skip the ':'
    for (; i != e && *i != '\n'; ++i) v += *i;
    if (i != e) ++i;  // skip the '\n'
    co_yield pair<string,string>(k,v);
  }
}

void parse_http_headers(int fd)
{
  map<string,string> headers;
  auto g = parse_header_lines( read_input(fd) );
  for (auto h: g) {
    headers.insert(h);
  }
}

An "exercise for the reader" is to extend that to something that will 
parse headers followed by a body.

Questions: how efficient is this in practice?  Is this really simpler to 
write than a non-coroutine version?  Will all of our code use this style 
in the (near?) future?  How should we be writing code now so that it is 
compatible with this style in the future?
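
As one data point for the "is it simpler" question, here is a sketch of 
a hand-written pull parser without coroutines. HeaderLineReader is a 
hypothetical name, and the sketch assumes the whole input is already in 
memory, which hides most of the difficulty: the only explicit state is 
the position pos_. With input arriving in chunks, the reader would also 
have to remember how far through a line it had got.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical hand-written pull parser over an in-memory buffer.
// pos_ is the explicit state that a coroutine frame would otherwise
// hold implicitly between calls.
class HeaderLineReader {
public:
  explicit HeaderLineReader(std::string text) : text_(std::move(text)) {}

  // Returns the next (name, value) pair, or nullopt at end of input.
  std::optional<std::pair<std::string, std::string>> next() {
    if (pos_ >= text_.size()) return std::nullopt;
    std::size_t colon = text_.find(':', pos_);
    if (colon == std::string::npos) {
      pos_ = text_.size();  // no further headers; stop here
      return std::nullopt;
    }
    std::size_t eol = text_.find('\n', colon + 1);
    if (eol == std::string::npos) eol = text_.size();
    std::pair<std::string, std::string> kv(
        text_.substr(pos_, colon - pos_),
        text_.substr(colon + 1, eol - colon - 1));
    pos_ = eol + 1;  // resume after the newline on the next call
    return kv;
  }

private:
  std::string text_;
  std::size_t pos_ = 0;
};
```

The client calls next() in a loop until it returns nullopt, much as it 
would iterate over a generator.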

Thanks for reading,

Phil.

Phil Endecott