I'm excited about this subject (and the Ranges TS). I believe these changes
will shape how we design parsers in C++ in the future.
However, I can only focus on one project at a time. For now, that project
is this C++03 parser.
2017-10-13 15:59 GMT-03:00 Phil Endecott via Boost
Dear All,
This is related to the ongoing discussion of the Beast HTTP parser. I have been thinking in general about how best to implement parser APIs in modern and future C++. Specifically, I've been wondering whether the imminent arrival of low-overhead coroutines ought to change best practice for this sort of interface.
In the past, I have found that there is a trade-off between parser implementation complexity and client code complexity. A "push" parser, which invokes client callbacks as tokens are processed, is easier to implement but harder to use as the client has to track its state between callbacks with e.g. an explicit FSM. On the other hand, a "pull parser" (possibly using an iterator interface) is easier for the client but instead now the parser may need the explicit state tracking.
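As a rough illustration of that trade-off (not code from Beast or any other library), here is a minimal sketch. Every name in it (push_tokens, pull_tokens, pairs_via_push, pairs_via_pull) is hypothetical. It assumes a toy grammar of space-separated tokens that the client pairs up as key/value.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

typedef std::vector<std::pair<std::string, std::string> > pair_list;

// Push style: the parser owns the loop and calls back once per token.
// The parser itself is just a loop with no state kept between calls.
void push_tokens(const std::string& input,
                 const std::function<void(const std::string&)>& on_token) {
    std::string tok;
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input[i] != ' ') { tok += input[i]; continue; }
        if (!tok.empty()) { on_token(tok); tok.clear(); }
    }
    if (!tok.empty()) on_token(tok);
}

// Pull style: the client owns the loop, so the parser has to remember
// its position (pos_) explicitly between calls to next().
class pull_tokens {
public:
    explicit pull_tokens(const std::string& input) : input_(input), pos_(0) {}

    // Stores the next token in out; returns false at end of input.
    bool next(std::string& out) {
        while (pos_ < input_.size() && input_[pos_] == ' ') ++pos_;
        if (pos_ == input_.size()) return false;
        std::size_t start = pos_;
        while (pos_ < input_.size() && input_[pos_] != ' ') ++pos_;
        out = input_.substr(start, pos_ - start);
        return true;
    }

private:
    std::string input_;
    std::size_t pos_;
};

// Push client: needs a small explicit state machine (have_key) to
// remember across callbacks whether a key is waiting for its value.
pair_list pairs_via_push(const std::string& input) {
    pair_list out;
    std::string pending_key;
    bool have_key = false;
    push_tokens(input, [&](const std::string& tok) {
        if (!have_key) {
            pending_key = tok;
            have_key = true;
        } else {
            out.push_back(std::make_pair(pending_key, tok));
            have_key = false;
        }
    });
    return out;
}

// Pull client: straight-line code, no state beyond local variables.
pair_list pairs_via_pull(const std::string& input) {
    pair_list out;
    pull_tokens parser(input);
    std::string k, v;
    while (parser.next(k) && parser.next(v)) {
        out.push_back(std::make_pair(k, v));
    }
    return out;
}
```

Both clients produce the same pairs; the difference is only in where the bookkeeping ends up (have_key in the push client, pos_ in the pull parser).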
Now, with stackless coroutines due "real soon now", we can avoid needing explicit state on either side. In the parser we can co_yield tokens as they are processed and in the client we can consume them using input iterators. The use of coroutines doesn't need to be explicit in the API; the parser can be documented as returning a range<T> while actually returning a generator<T>.
Here's a very very rough sketch of what I have in mind, for the case of HTTP header parsing; note that I don't even have a compiler that supports coroutines yet so this is far from real code:
  generator<char> read_input(int fd)
  {
    char buf[4096];
    while (1) {
      int r = ::read(fd,buf,4096);
      if (r <= 0) co_return;  // end of input, or a read error
      for (int i = 0; i < r; ++i) {
        co_yield buf[i];
      }
    }
  }

  template <typename INPUT_RANGE>
  generator< pair<string,string> > parse_header_lines(INPUT_RANGE input)
  {
    typedef typename INPUT_RANGE::const_iterator iter_t;
    iter_t i = input.begin(), e = input.end();
    while (i != e) {
      iter_t j = std::find(i,e,':');
      string k(i,j);
      // (That's broken, as iter_t is a single-pass input iterator. We
      // need to copy to the string and check for ':' at the same time.
      // It's trivial with a loop.)
      ++j;
      iter_t l = std::find(j,e,'\n');
      string v(j,l);
      ++l;
      i = l;
      co_yield pair(k,v);
    }
  }

  void parse_http_headers(int fd)
  {
    map<string,string> headers;
    auto g = parse_header_lines( read_input(fd) );
    for (auto h: g) {
      headers.insert(h);
    }
  }

An "exercise for the reader" is to extend that to something that will parse headers followed by a body.
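The comment in the sketch notes that the std::find approach cannot work with single-pass input iterators and that a loop is the fix. One possible shape for that loop is sketched below, without coroutines so that it compiles with current compilers. The names parse_header_lines_single_pass and parse_header_text are hypothetical, and the input format assumed is the simplified "key:value\n" of the sketch rather than full HTTP syntax.

```cpp
#include <iterator>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Reads "key:value\n" lines in a single pass. Each character is examined
// and copied in the same step, so plain input iterators are sufficient.
template <typename InputIt>
std::map<std::string, std::string>
parse_header_lines_single_pass(InputIt i, InputIt e) {
    std::map<std::string, std::string> headers;
    while (i != e) {
        std::string k, v;
        // Copy the key while looking for ':' (or an early end of line).
        for (; i != e && *i != ':' && *i != '\n'; ++i) k += *i;
        if (i != e && *i == ':') {
            ++i;  // skip ':'
            // Copy the value while looking for the end of the line.
            // As in the sketch, leading whitespace is not trimmed.
            for (; i != e && *i != '\n'; ++i) v += *i;
        }
        if (i != e) ++i;  // skip '\n'
        if (!k.empty()) headers.insert(std::make_pair(k, v));
    }
    return headers;
}

// Convenience wrapper that feeds the parser from a stream through
// std::istreambuf_iterator, which is a genuine single-pass iterator.
std::map<std::string, std::string> parse_header_text(const std::string& text) {
    std::istringstream in(text);
    return parse_header_lines_single_pass(std::istreambuf_iterator<char>(in),
                                          std::istreambuf_iterator<char>());
}
```

In a coroutine version, the headers.insert call would presumably become a co_yield of the pair, with the rest of the loop unchanged.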
Questions: how efficient is this in practice? Is this really simpler to write than a non-coroutine version? Will all of our code use this style in the (near?) future? How should we be writing code now so that it is compatible with this style in the future?
Thanks for reading,
Phil.
-- 
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/