Re: [boost] Boost.HTTPKit, a new library from the makers of Beast!
On Sun, Oct 8, 2017 at 5:38 PM, Vinícius dos Santos Oliveira
Now, moving on... given that you have __not__ answered how your parser's design[1] compares to the parser I've developed
I'll try to provide more clarity. `beast::basic_parser` is designed
with standardization in mind, as I intend to eventually propose Beast
for the standard library. Therefore, I have made the interface as
simple as possible, exposing only the minimum necessary to
achieve the goals that the majority of users want:
* Read the header first, if desired
* Feed Asio style buffer sequences into the parser
* Set independent limits on the number of header and body octets
* Optional fine-grained receipt of chunked body data and metadata
The design of basic_parser (the use of CRTP in particular) is meant to
support the case where a user implements their own Fields container,
or if they want a bit more custom handling of the fields (for example,
to avoid storing them).
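As a rough illustration of that shape (a self-contained sketch of the CRTP pattern only; the exact callbacks required of the derived class are spelled out in the documentation, not here):

#include <cstddef>
#include <string_view>

// Sketch of the CRTP pattern, not Beast's actual interface: the base
// class owns the parsing loop and hands each token to the derived class.
template<class Derived>
class parser_base
{
public:
    void put(std::string_view octets)
    {
        // ...tokenization elided; when a header field is recognized,
        // the base hands it to the derived class:
        static_cast<Derived&>(*this).on_field("Host", "example.com");
        (void)octets;
    }
};

// A derived class that inspects fields without storing them.
class counting_fields : public parser_base<counting_fields>
{
    friend class parser_base<counting_fields>;
    std::size_t n_ = 0;

    void on_field(std::string_view, std::string_view)
    {
        ++n_; // observe; don't allocate or copy
    }

public:
    std::size_t count() const { return n_; }
};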
Like all design choices, tradeoffs are made. The details of parsing
are exposed only to the derived class. Complexities are hidden from
the public-facing interface of `basic_parser`. Implementing a stream
algorithm that operates on the parser is a straightforward process:
template<
    class SyncReadStream, class DynamicBuffer,
    bool isRequest, class Derived>
std::size_t read(SyncReadStream& stream, DynamicBuffer& buffer,
    basic_parser<isRequest, Derived>& parser);
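The body of such an algorithm stays small. A rough sketch, assuming basic_parser's documented put()/is_done() members and the http::error::need_more code (abbreviated and untested):

template<
    class SyncReadStream, class DynamicBuffer,
    bool isRequest, class Derived>
std::size_t read(SyncReadStream& stream, DynamicBuffer& buffer,
    basic_parser<isRequest, Derived>& parser, error_code& ec)
{
    std::size_t total = 0;
    ec = {};
    while(! parser.is_done())
    {
        if(buffer.size() == 0 || ec == http::error::need_more)
        {
            // Refill the dynamic buffer from the stream.
            auto const mb = buffer.prepare(4096);
            buffer.commit(stream.read_some(mb, ec));
            if(ec)
                return total;
        }
        // Feed what we have; put() reports how many octets it consumed.
        auto const used = parser.put(buffer.data(), ec);
        buffer.consume(used);
        total += used;
        if(ec && ec != http::error::need_more)
            return total;
    }
    ec = {};
    return total;
}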
[1] at most it hides behind a “multiple people/stakeholders agree with me” __shield__
There's no hiding going on here. One can only measure the relative success of designs based on the feedback from users. They opened issues, and I addressed their use-cases, sometimes with a considerable amount of iteration and back-and-forth, as you have seen in the GitHub issues quoted in the previous message. Now, I don't know if the sampling of users that have participated in Beast's design is representative of the entire C++ community. However, I do know one thing: if GitHub stars are any measure of the sample size of the participants, then Beast is off to a good start. Here's a graph showing the number of stars over time received by boostorg/beast, tufao, and Boost.Http (the last two libraries having you as the author, I believe): http://www.timqian.com/star-history/#boostorg/beast&BoostGSoC14/boost.http&vinipsmaker/tufao (Note that the HTTP+WebSocket version of Beast was released in May 2016.)

We need to be careful interpreting results like this, of course, so perhaps we should look at different metrics. Here are the links to the number of closed issues for Beast, tufao, and Boost.Http:

502 closed issues in Beast: https://github.com/boostorg/beast/issues?q=is%3Aissue+is%3Aclosed

38 closed issues in tufao: https://github.com/vinipsmaker/tufao/issues?q=is%3Aissue+is%3Aclosed

13 closed issues in Boost.Http, from 6 unique users not including the author: https://github.com/BoostGSoC14/boost.http/issues?q=is%3Aissue+is%3Aclosed

Again, we have to be careful interpreting results like this. But it sure looks like there is a lot of user participation in Beast. If approval from a large number of stakeholders is not a compelling design motivator, then what is?
This tutorial is full of “design implications” blocks where I take the time to dissect what it means to go with each choice.
Thus far, no one has asked for more fine grained access to incoming HTTP tokens in the manner of `code::method` and `code::request_target`. If this becomes something that users consistently ask for, it can be done by changing the requirements for the derived class for `basic_parser`. This way, details about HTTP parsing which most people don't care about will not leak into the beast::http:: namespace. Such a change would not affect existing stream algorithms on parsers. Thanks
2017-10-09 22:18 GMT-03:00 Vinnie Falco via Boost
I'll try to provide more clarity. `beast::basic_parser` is designed with standardization in mind, as I intend to eventually propose Beast for the standard library.
A standard library is a library where we don't have the luxury of making mistakes. If you make a mistake, the API will carry the technical debt forever. I have made the interface as [...]
[...] tradeoffs are made. [...] [...] One can only measure the relative success of designs based on the feedback from users. They opened issues [...]
So... it solves the problem for N users... therefore it'll solve the problem for N + Z users too? Do you see how this looks to me? If I take the guess that the problem is the inductive reasoning that I criticized all along in the previous tutorial I've linked, would you think this a wild guess or a reasonable one? Can you understand why I see the situation as such?

However, I'll pretend that I've chosen the optimistic vision this time. I'll pretend that you've started to grasp the criticism and you're here explaining how you've developed the Boost.Beast parser (i.e. the approach used to tackle development) to justify your design. For a moment, let's ignore the “justify the design” part and focus on the approach to the development (because the way you presented it is not a discussion of design). What you're doing is applying a heuristic. I do know and use this heuristic (directly and indirectly). But there are two points that I want to add here.

The first point is: do not be a slave to a single heuristic. I also mentioned one heuristic in the tutorial I've linked previously, Occam's razor: https://vinipsmaker.github.io/asiohttpserver/#_footnote_8 When you say “we need to be careful interpreting results”... how will someone who lacks the tools in their cognitive repertoire interpret these results? Can you describe your other tools to me? I'm not the one who will praise you if you answer quickly. Take your time (people rarely listen to this advice).

The second point is: here, you need to go *beyond* the heuristics. This is a point that depends entirely on you. I cannot explain to you how to go beyond the heuristics. You've just got to do it.

I'll try a new approach here. Given some of the ideas weren't well received by you, I took the liberty to convert the Tufão project that you've mentioned to use the Boost.Beast parser: https://github.com/vinipsmaker/tufao/commit/56e27d3b77d617ad1b4aea377f993592... Would you say you like the new result better? Would you say that I've misused your parser to favour my approach? How would you have done it in this case? Would you go beyond and accept the idea that the spaghetti effect is inherent to the callback-based approach of push parsers? Or maybe would you say that the spaghetti effect is small and acceptable here?

What do you think about the following links?

- https://github.com/google/pulldown-cmark
- https://www.ncameron.org/blog/macros-and-syntax-extensions-and-compiler-plug...
- https://github.com/Marwes/combine

Let's try yet another way to approach the problem (a different perspective again). We talked about heuristics and I begged you to go beyond them. But there is one vision/perspective of this problem that may help you. Reason + logic: wouldn't you agree that `basic_parser<T>::eager(false)` is just a hacky way to implement a pull parser? Why? Remember to pay attention to the *why*. I'm not interested in the yes/no.

[...] no one has asked for [...] If this becomes something that users
consistently ask for, it can be done by [...]
“if you do not give a try to enter in the general problem and insist on a *myopic vision*” — https://vinipsmaker.github.io/asiohttpserver/#_implementing_boost_beast_parser_interface

Why do you only solve the problem that is immediately in front of you? You're not serious about standardization (consult my comment on technical debt right at the beginning of this email) if you're unwilling/afraid to let go of this modus operandi. There is a bright mind hiding behind this robot. Let us see it.

There is a talk that is pure design, “understanding parser combinators - a deep dive”[1]. How do you use *any* of your currently presented lenses/perspectives to judge the ideas of this talk? “Look, the idea of composability here is wrong because he hasn't filled a project with 100 issues at GitHub.” It's just pathetic. I'm not entering into this leads-to-nowhere line of reasoning, so please just stop. What would you use as an example of design discussion? The tutorial I've linked previously[2] or the conversation we're having now?

How wrong was I... I just thought you were ignoring the ideas all along. How wrong was I... You just don't understand the subject at hand.

“There are these two young fish swimming along, and they happen to meet an older fish swimming the other way, who nods at them and says, "Morning, boys, how's the water?" And the two young fish swim on for a bit, and then eventually one of them looks over at the other and goes, "What the hell is water?"”

You don't know water, do you? Nor inductive reasoning, nor myopic vision, nor Occam's razor, and the list goes on... It's so frustrating... you can't imagine. You just have no idea. Sorry about the trouble caused so far. I'll meditate on this matter and try to learn something out of this episode.

“A cat approaches a dog and says “Meow.” The dog looks confused. The cat repeats, “Meow!” The dog still looks confused. The cat repeats, more emphatically, “MEEOW!!!” Finally, the dog ventures, “Bow-wow?” The cat stalks away indignantly, thinking “Dumb dog!””

Thank you for the useful research that you've done. I'll surely use it (and learn from it).

[1] https://vimeo.com/171704565
[2] a tutorial that I've done in a rush and I'm not very proud of, but it still touches design

-- Vinícius dos Santos Oliveira https://vinipsmaker.github.io/
On Thu, Oct 12, 2017 at 6:47 AM, Vinícius dos Santos Oliveira via Boost < boost@lists.boost.org> wrote:
2017-10-09 22:18 GMT-03:00 Vinnie Falco via Boost:
[...] spaghetti effect is inherent to the callback-based approach of push parsers?
Or maybe would you say that the spaghetti effect is small and acceptable
here?
I've done quite a bit of XML processing in the past, using both PUSH (i.e. SAX) and PULL parsers, and I also wrote my own JSON PUSH and PULL parsers, and I *much* prefer PULL parsers. They are simpler to use and lead to nicer client code that's easier to read and follow. Not sure it's relevant to the discussion here, but just in case, I thought I'd share that perspective. --DD
On 12-10-17 06:47, Vinícius dos Santos Oliveira via Boost wrote:
What do you think about the following links?
What is the relevance of the links? They're extremely broad and general. If you are suggesting that the implementation of the parser interface should use parser combinators/generators, for sure. That is not necessarily an interface design concern. What _specific_ interface design concerns do you have in mind when linking these kinds of general-purpose libraries/approaches? Are you proposing a parser combinator library for Boost or the standard? (Would Spirit X3 fit the bill?)

On 12-10-17 06:47, Vinícius dos Santos Oliveira via Boost wrote:
I took the liberty to convert the Tufão project that you've mentioned to use the Boost.Beast parser: https://github.com/vinipsmaker/tufao/commit/56e27d3b77d617ad1b4aea377f993592...
That is nice and tangible. Let's focus on concrete shortcomings, relevant to the library interface. Seth
2017-10-12 5:38 GMT-03:00 Seth via Boost
On 12-10-17 06:47, Vinícius dos Santos Oliveira via Boost wrote:
What do you think about the following links?
What is the relevance of the links? They're extremely broad and general. If you are suggesting that the implementation of the parser interface should use parser combinators/generators, for sure. That is not necessarily an interface design concern.
From my point of view, it was tiresome that the relevance was even missed to start with. Let's do this: I won't give hints of how the future should look like, so neophobia shouldn't attack anyone (but neophobia is a bad term given I'm only talking about old concepts). I'll come back next month with another nice thing.

I coded a new example for you. It took me some time because I wanted to code other features. There was no serious test for the response parser (aside from the test suite), so this was one of the things I've coded.

Anyway, in the tutorial, I've mentioned the problem of composability, a chain of transformations. This is an architecture that resembles GStreamer filter elements. It also resembles iterators/ranges. It's also an architecture which resembles the "is to base our macros on tokens rather than AST nodes" decision from recent Rust developments <https://www.ncameron.org/blog/macros-and-syntax-extensions-and-compiler-plugins-where-are-we-at/>. It's _not_ a toy, it's a popular solution. I'm not _forcing_ this design (you can use the parser ignoring this possibility). I'm merely pointing out that this design might be useful to the user and one of the models is prohibitive of such an option.

Here you can see how you can wrap the parser to force "atomic header fields" and just ignore the possibility where these tokens would be delivered separately (you could do this for any tokens): https://github.com/BoostGSoC14/boost.http/commit/9908fe06d4b2364ce18ea9b4162... Pretty powerful change (see the sketch after this message). In the same example, you still can provide your own wrappers and have yet another transformation happening behind the scenes. I've coded an example: https://github.com/BoostGSoC14/boost.http/blob/master/example/spawn-settings...

I've listed a few applications in the top comment of the example. They are not artificial possibilities. Some were based on user comments (e.g. I want to only store a specific set of headers, I want to reject requests sooner...). The list could go on and on, so it's useless to try to guess what the user would want. This concept is incredibly powerful, and by just allowing my higher layer to have the parser customized, I've got plenty of possibilities. The same would happen to _anyone_ using this parser. And if you ignore completely what higher interface the user is interested in designing... the parser is easy to use. And the parser just got easier with the changes I've pushed a few moments ago (and they were predicted).

You ask me what is the relevance of these links. In this discussion, I had to repeat information over and over using different strategies. At least, I guess you won't ask the relevance of the two other links, which are direct examples of how to design parsers.

What _specific_ interface design concerns do you have in mind when
linking these kinds of general-purpose libraries/approaches? Are you proposing a parser combinator library for Boost or the standard? (Would Spirit X3 fit the bill?).
For now? These: https://vinipsmaker.github.io/asiohttpserver/#_implementing_boost_beast_parser_interface Once we change the road, I can proceed to discuss what the pull parser should look like.

On 12-10-17 06:47, Vinícius dos Santos Oliveira via Boost wrote:
I took the liberty to convert the Tufão project that you've mentioned to use the Boost.Beast parser: https://github.com/vinipsmaker/tufao/commit/56e27d3b77d617ad1b4aea377f993592bc2c0d77
That is nice and tangible. Let's focus on concrete shortcomings, relevant to the library interface.
What is the whole tutorial I've written? Full of “design implication” blocks. Thanks. -- Vinícius dos Santos Oliveira https://vinipsmaker.github.io/
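To make the "atomic header fields" wrapper concrete, here is a self-contained sketch (the token type and next_token() interface below are illustrative, not the exact Boost.Http API):

#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical pull-parser token; real tokens may arrive split across calls.
struct token
{
    enum class kind { field_name, field_value, end_of_headers };
    kind k;
    std::string_view text;
};

// Wraps any token source so a field is only ever observed whole,
// merging split name/value tokens ("atomic header fields").
template<class Source>
class atomic_fields
{
    Source& src_;
    std::string name_, value_;

public:
    explicit atomic_fields(Source& src) : src_(src) {}

    // Returns the next complete (name, value) pair, or nullopt at end.
    std::optional<std::pair<std::string, std::string>> next_field()
    {
        // Source::next_token() is assumed to return std::optional<token>.
        while (auto t = src_.next_token()) {
            switch (t->k) {
            case token::kind::field_name:
                if (!value_.empty()) {
                    // A new field begins: the previous one is complete.
                    auto done = std::make_pair(std::move(name_), std::move(value_));
                    name_.assign(t->text.data(), t->text.size());
                    value_.clear();
                    return done;
                }
                name_.append(t->text.data(), t->text.size());
                break;
            case token::kind::field_value:
                value_.append(t->text.data(), t->text.size());
                break;
            case token::kind::end_of_headers:
                if (!name_.empty())
                    return std::make_pair(std::exchange(name_, {}),
                                          std::exchange(value_, {}));
                return std::nullopt;
            }
        }
        return std::nullopt;
    }
};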
On Wed, Oct 11, 2017 at 9:47 PM, Vinícius dos Santos Oliveira
I took the liberty to convert the Tufão project that you've mentioned to use the Boost.Beast parser: https://github.com/vinipsmaker/tufao/commit/56e27d3b77d617ad1b4aea377f993592...
Would you say you like the new result better?
It seems pretty reasonable to me.
Would you say that I've misused your parser to favour my approach? How would you have done it in this case?
Misused? I don't think so. The only meaningful change I would make is that I would have simply called basic_parser::is_keep_alive() instead of re-implementing the logic for interpreting the Connection header.
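For example (a fragment only, using Beast's documented request_parser alias and is_keep_alive()):

#include <boost/beast/http.hpp>
namespace http = boost::beast::http;

http::request_parser<http::string_body> p;
// ...feed octets until p.is_header_done()...

// One call replaces hand-rolled interpretation of the Connection
// header (including the HTTP/1.0 vs HTTP/1.1 defaults):
bool const keep_alive = p.is_keep_alive();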
Would you go beyond and accept the idea that the spaghetti effect is inherent to the callback-based approach of push parsers?
This is where we are venturing into the world of opinion. It seems
like you have a general aversion to callbacks. But there is a reason
Beast's parser is written this way. Recognize that there are two
primary consumers of the parser:
1. Stream algorithms such as beast::http::read_some
2. Consumers of structured HTTP elements (e.g. fields)
The Beast design separates these concerns. Public member functions of
`basic_parser` provide the interface needed for stream algorithms,
while calls to the derived class provide the structured HTTP elements.
I don't think it is a good idea to combine these into one interface,
which you have done in your parser. The reason is that this
unnecessary coupling pointlessly complicates the writing of the stream
algorithm. Anyone who wants to write an algorithm to feed the parser
from some source of incoming bytes now has to care about tokens. This
is evident from your documentation:
http://boostgsoc14.github.io/boost.http/#parsing_tutorial1
In your example you declare a class `my_socket_consumer`. It has a
single function `on_socket_callback` which is called repeatedly with
incoming data. Not shown in your example is the stream algorithm (the
function which interacts with the actual socket to retrieve the data).
However, we know that this stream algorithm must be aware of the
concrete type `my_socket_consumer` and that it needs to call
`on_socket_callback` with an `asio::buffer`. A signature for this
stream algorithm might look like this:
template<class SyncReadStream>
void read(SyncReadStream& stream, my_socket_consumer& consumer);
Observe that this stream algorithm can only ever work with that
specific consumer type. In your example, `my_socket_consumer` handles
HTTP requests. Therefore, this stream algorithm can now only handle
HTTP requests. In order to receive a response, a new stream algorithm
must be written. Compare this to the equivalent signature of a Beast
styled stream algorithm:
template<class SyncReadStream, bool isRequest, class Derived>
void read(SyncReadStream& stream, basic_parser<isRequest, Derived>& parser);

This allows an author to create a stream algorithm which works not just for requests which store their data as data members in a class (`my_socket_consumer`) but for any parser, thanks to the CRTP design.
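A usage sketch (hypothetical derived types) of what that buys:

// Both satisfy the basic_parser CRTP contract; one algorithm serves both.
my_request_consumer  req;  // derives from basic_parser<true,  my_request_consumer>
my_response_consumer res;  // derives from basic_parser<false, my_response_consumer>

read(stream, req);  // the same read()...
read(stream, res);  // ...handles the other direction, with no new algorithm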
Dear All,
This is related to the ongoing discussion of the Beast HTTP parser.
I have been thinking in general about how best to implement parser
APIs in modern and future C++. Specifically, I've been wondering
whether the imminent arrival of low-overhead coroutines ought to
change best practice for this sort of interface.
In the past, I have found that there is a trade-off between parser
implementation complexity and client code complexity. A "push" parser,
which invokes client callbacks as tokens are processed, is easier to
implement but harder to use as the client has to track its state
between callbacks with e.g. an explicit FSM. On the other hand, a
"pull parser" (possibly using an iterator interface) is easier for
the client but instead now the parser may need the explicit state
tracking.
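A schematic contrast (with made-up interfaces on both sides) of where the state ends up:

#include <string_view>

// Push style: the client's position in the grammar must be stored
// explicitly, because control returns to the parser between tokens.
struct push_client
{
    enum class where { start_line, headers, body } at = where::start_line;

    void on_token(std::string_view tok)  // called back by the parser
    {
        switch (at) {
        case where::start_line: /* handle */ at = where::headers; break;
        case where::headers:    /* handle, maybe -> body */       break;
        case where::body:       /* handle */                      break;
        }
        (void)tok;
    }
};

// Pull style: the client's position in the grammar is its position
// in the code; no enum survives between tokens.
template<class PullParser>
void pull_client(PullParser& p)
{
    auto start = p.next_start_line();                    // hypothetical calls
    while (auto field = p.next_header()) { /* handle */ }
    while (auto chunk = p.next_body_chunk()) { /* handle */ }
    (void)start;
}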
Now, with stackless coroutines due "real soon now", we can avoid
needing explicit state on either side. In the parser we can
co_yield tokens as they are processed and in the client we can
consume them using input iterators. The use of co-routines doesn't
need to be explicit in the API; the parser can be said to return a
range<T>, and then return a generator<T>.
Here's a very very rough sketch of what I have in mind, for the case
of HTTP header parsing; note that I don't even have a compiler that
supports coroutines yet so this is far from real code:
generator<char> read_input(int fd)
{
    char buf[4096];
    while (1) {
        int r = ::read(fd, buf, 4096);
        if (r == 0) co_return;
        for (int i = 0; i < r; ++i) {
            co_yield buf[i];
        }
    }
}
template <typename INPUT_RANGE>
generator<pair<string, string>> parse_header_lines(INPUT_RANGE input)
{
    typedef typename INPUT_RANGE::const_iterator iter_t;
    iter_t i = input.begin(), e = input.end();
    while (i != e) {
        iter_t j = std::find(i, e, ':');
        string k(i, j);
        // (That's broken, as iter_t is a single-pass input iterator. We
        // need to copy to the string and check for ':' at the same time.
        // It's trivial with a loop.)
        ++j;
        iter_t nl = std::find(j, e, '\n');
        string v(j, nl);
        ++nl;
        i = nl;
        co_yield pair(k, v);
    }
}

void parse_http_headers(int fd)
{
    map<string, string> headers;
    auto g = parse_header_lines(read_input(fd));
    for (auto h : g) {
        headers.insert(h);
    }
}

An "exercise for the reader" is to extend that to something that will parse headers followed by a body.

Questions: how efficient is this in practice? Is this really simpler to write than a non-coroutine version? Will all of our code use this style in the (near?) future? How should we be writing code now so that it is compatible with this style in the future?

Thanks for reading,

Phil.
On Fri, Oct 13, 2017 at 11:59 AM, Phil Endecott via Boost
Dear All, A "push" parser, which invokes client callbacks as tokens are processed, is easier to implement but harder to use as the client has to track its state between callbacks with e.g. an explicit FSM. On the other hand, a "pull parser" (possibly using an iterator interface) is easier for the client but instead now the parser may need the explicit state tracking.
That is generally true, and especially true for XML and other
languages that have a similar structure. Specifically, that there are
opening and closing tags which determine the validity of subsequent
grammar, and have a recursive structure (like HTML).
But this is not the case for HTTP. There are no opening and closing
tags. There is no need to keep a "stack" of "open tags". It is quite
straightforward. Therefore, when designing an HTTP parser we can place
less emphasis on the style of parser and instead focus those energies
to other considerations (as I described in my previous post, regarding
the separation of concerns for stream algorithms and parser
consumers).
If you look at the Beast parser derived class, you can see that the
state is quite minimal:
template<bool isRequest, class Body, class Allocator>
class parser
    : public basic_parser<isRequest, parser<isRequest, Body, Allocator>>
{
    message<isRequest, Body, basic_fields<Allocator>> m_;
    typename Body::writer wr_;
    bool wr_inited_ = false;
    std::function<...> cb_h_; // for manual chunking
    std::function<...> cb_b_; // for manual chunking
    ...
Here's a very very rough sketch of what I have in mind, for the case of HTTP header parsing; note that I don't even have a compiler that supports coroutines yet so this is far from real code:
I think it is great that you're providing an example but you have chosen the most simple, regular part of HTTP which is the headers. I suspect that if you try to use the iterator model for the start-line (which is different for requests and responses) and then try to express the message body using iterators you will run into considerable difficulty coming up with a design that is elegant and feature-rich. Especially when you consider the need to transform the chunk-encoding while providing the metadata to the caller. I know this because I went through many iterations before settling on what is in Beast currently. Thanks
Vinnie Falco wrote:
On Fri, Oct 13, 2017 at 11:59 AM, Phil Endecott via Boost
wrote: A "push" parser, which invokes client callbacks as tokens are processed, is easier to implement but harder to use as the client has to track its state between callbacks with e.g. an explicit FSM. On the other hand, a "pull parser" (possibly using an iterator interface) is easier for the client but instead now the parser may need the explicit state tracking.
That is generally true, and especially true for XML and other languages that have a similar structure. Specifically, that there are opening and closing tags which determine the validity of subsequent grammar, and have a recursive structure (like HTML).
But this is not the case for HTTP. There are no opening and closing tags. There is no need to keep a "stack" of "open tags". It is quite straightforward. Therefore, when designing an HTTP parser we can place less emphasis on the style of parser and instead focus those energies to other considerations (as I described in my previous post, regarding the separation of concerns for stream algorithms and parser consumers).
If you look at the Beast parser derived class, you can see that the state is quite minimal:
template<bool isRequest, class Body, class Allocator>
class parser
    : public basic_parser<isRequest, parser<isRequest, Body, Allocator>>
{
    message<isRequest, Body, basic_fields<Allocator>> m_;
    typename Body::writer wr_;
    bool wr_inited_ = false;
    std::function<...> cb_h_; // for manual chunking
    std::function<...> cb_b_; // for manual chunking
    ...
You still have an explicit state machine, i.e. a state enum and a switch statement in a loop; I'm looking at impl/basic_parser.ipp, for example. But I don't want to dwell on this particular code. I'm just considering, generally, whether this style of code is soon going to look "antique" - in the way that 15-year-old code full of explicit new and delete looks antediluvian now that we're all using smart pointers.

I think it's clear that often coroutines can make the code simpler to write and/or easier to use. The question is what we lose. The issue of generator<T> providing only input iterators is the most significant issue I've spotted so far. This is in some way related to the whole ASIO "buffer sequence" thing; the code I posted before read into contiguous buffers, but that was lost before the downstream code saw it, so it couldn't hope to optimise with e.g. word-sized copies or compares. Maybe this could be fixed with some sort of segmented iterator, or something other than generator<T> as the coroutine type, or something. Or maybe it's unfixable.

Do other languages have anything to teach us about this? What do users of Boost.Coroutine think?

Regards, Phil.
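One concrete direction for the contiguity problem (a sketch against the Coroutines TS, reusing the hypothetical generator<> type from my earlier sketch): yield whole buffers instead of chars, so downstream code keeps contiguous ranges to scan.

#include <cstddef>
#include <string_view>
#include <unistd.h>

// Yield contiguous chunks rather than single chars. Each chunk is only
// valid until the generator is resumed, so the consumer must run its
// std::find / memcmp over the chunk before asking for more.
generator<std::string_view> read_chunks(int fd)
{
    char buf[4096];
    for (;;) {
        ssize_t r = ::read(fd, buf, sizeof buf);
        if (r <= 0)
            co_return;
        co_yield std::string_view(buf, static_cast<std::size_t>(r));
    }
}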
On Sat, Oct 14, 2017 at 12:03 PM, Phil Endecott via Boost
The issue of generator<T> providing only input iterators is the most significant issue I've spotted so far. This is in some way related to the whole ASIO "buffer sequence" thing; the code I posted before read into contiguous buffers, but that was lost before the downstream code saw it, so it couldn't hope to optimise with e.g. word-sized copies or compares.
Buffer sequences are not the problem; it is that parsed HTTP data
types are heterogeneous. For example, the series of types generated
when parsing a request looks like this:
1. std::pair
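To make the irregularity visible, here is roughly what a single-token-type pull interface would be forced into (hypothetical token structs, not Beast's types):

#include <cstdint>
#include <string_view>
#include <variant>

// One parse of a request must surface all of these shapes in sequence.
struct request_line { std::string_view method, target; int version; };
struct header_field { std::string_view name, value; };
struct chunk_header { std::uint64_t size; std::string_view extensions; };
struct body_octets  { std::string_view data; };

// An iterator-based parser would funnel everything through a variant
// like this, and every consumer must visit all the alternatives.
using http_token = std::variant<
    request_line, header_field, chunk_header, body_octets>;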
2017-10-14 16:54 GMT-03:00 Vinnie Falco via Boost
Note how the collection of types presented for a header field is different from the request-line. Expressing this irregular stream of different types through an iterator interface is going to be very clumsy. Furthermore, there is metadata generated during the parse which is not easily reflected in an iterator interface.
For example, after the HTTP headers have been parsed, Beast calculates the "keep-alive" semantic as well as the disposition of the Content-Length, which may be in three states: body-to-eof, chunked, or known. The keep-alive semantics are communicated to the caller of the parser through a member function `basic_parser::is_keep_alive`:
<http://www.boost.org/doc/libs/master/libs/beast/doc/html/beast/ref/boost__beast__http__basic_parser/is_keep_alive.html>
The result/state stays in `basic_parser`. In other words, it stays in the parser object. How is this different from any other model? -- Vinícius dos Santos Oliveira https://vinipsmaker.github.io/
2017-10-13 16:24 GMT-03:00 Vinnie Falco via Boost
<https://github.com/boostorg/beast/blob/f09b2d3e1c9d383e5d0f57b1bf889568cf27c39f/include/boost/beast/http/parser.hpp#L45>
Callbacks don't need to store state used by subsequent callbacks to interpret the incoming structured HTTP data, because HTTP is simple compared to XML or HTML.
Half-truth. Indeed, there is no need to store state to interpret incoming structured HTTP data, but we may still need to store state between one call and another. This was just the case in the previous example I gave you. Given the nature of the Tufão project using Qt, I have this thing called "safe signals": http://vinipsmaker.github.io/tufao/ref/1.x/safe-signal.html This safe signal of mine forbids me to access the object after I emit a signal. So I have to parse the whole received data before emitting any signal.

What happens is... in the example I gave earlier, the Boost.Beast parser will force me to have state inside the object itself: https://github.com/vinipsmaker/tufao/blob/56e27d3b77d617ad1b4aea377f993592bc2c0d77/src/httpserverrequest.cpp#L134 If you compare it to the usage of the Boost.Http parser, it's different: https://github.com/vinipsmaker/tufao/blob/1d7a943e4f6aae2284045f94e2b8214a142dea9a/src/httpserverrequest.cpp#L135 (a local variable that only exists when it needs to exist)

You should be more humble[1] about the user needs and wants or you'll limit the potential usefulness of your library.

Therefore, when designing an HTTP parser we can place
less emphasis on the style of parser and instead focus those energies to other considerations
My project Tufão couldn't care less about the other stuff you grab from ASIO that you're trying to turn into the focus of the discussion. It's a Qt project and Qt networking is used. That's just the motivation for your separate submission (users requested, in the review comments, abstractions to parse HTTP without ASIO). And I still find it disappointing that I need to keep showing specific/concrete cases.

[1] https://en.wikipedia.org/wiki/There_are_known_knowns

-- Vinícius dos Santos Oliveira https://vinipsmaker.github.io/
On 13-10-17 20:59, Phil Endecott via Boost wrote:
Specifically, I've been wondering whether the imminent arrival of low-overhead coroutines ought to change best practice for this sort of interface.
That's nice, but it can't inform the design of a library that exists now. Of course, the interface would be best served if it didn't exclude better¹ options in the future.

¹ coroutines are not zero cost

On 13-10-17 20:59, Phil Endecott via Boost wrote:
Now, with stackless coroutines due "real soon now", we can avoid needing explicit state on either side.
Coros have explicit state but with syntactic sugar. The syntactic sugar in this case has runtime overhead.
Questions: how efficient is this in practice?
In practice it should be profiled, but it _will_ have overhead.
Is this really simpler to write than a non-coroutine version?
In all but the most trivial cases I think it's simpler. To write.
Will all of our code use this style in the (near?) future?

Most definitely not (because then we'd not be using C++, the language that exists to eliminate overhead).

How should we be writing code now so that it is compatible with this style in the future?

This is the most relevant question. I applaud it being asked. I don't have the answer yet. Slightly related, in my book, may be the way in which Boost Asio caters for different async patterns (yield_context, use_future or direct handlers). Asio coded the logic into the async_result customization point. (http://www.boost.org/doc/libs/1_65_1/doc/html/boost_asio/reference/async_res...) I suppose we could learn by assimilating a device like that.

Seth
On Sat, Oct 14, 2017 at 8:03 AM, Seth via Boost
¹ coroutines are not zero cost
That depends. I've done some investigation into the Coroutines TS described in n4134. For coroutines whose scope is strictly limited to the calling function, they can be implemented with zero cost (no dynamic allocation and comparable assembly output). The expository code that Phil provided certainly falls into that category. Thanks
Seth wrote:
coroutines are not zero cost
In some cases they can have negative cost. See Gor Nishanov's CppCon 2015 presentation, "C++ Coroutines - a negative overhead abstraction". With coroutines, the state is essentially a program counter value which can be saved and restored with similar cost to a function call or return. When the alternative is something like a state enum and a switch statement, the coroutine is going to win. Regards, Phil.
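For contrast with the straight-line coroutine sketch earlier in the thread, the enum-and-switch equivalent (simplified and hypothetical) looks like this; the enum is the hand-maintained program counter:

#include <string>
#include <utility>
#include <vector>

// Push-style scanner for "name: value\n" lines; the enum records where
// we are, and every resume re-enters through the switch.
struct header_line_fsm
{
    enum class st { name, value } s = st::name;
    std::string name, value;
    std::vector<std::pair<std::string, std::string>> out;

    void feed(char c)
    {
        switch (s) {
        case st::name:
            if (c == ':') s = st::value;
            else name += c;
            break;
        case st::value:
            if (c == '\n') {
                out.emplace_back(std::move(name), std::move(value));
                name.clear();
                value.clear();
                s = st::name;
            } else {
                value += c;
            }
            break;
        }
    }
};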
On 14-10-17 20:04, Phil Endecott via Boost wrote:
In some cases they can have negative cost. See Gor Nishanov's CppCon 2015 presentation, "C++ Coroutines - a negative overhead abstraction".
I'm sorry, I assumed from experience with Boost Coroutine only. This is indeed fantastic stuff, and I had seen that particular vid. Thanks for correcting my memory, Seth
I'm excited about this subject (and the Ranges TS). I believe these changes
will shape how we design parsers in C++ in the future.
However, I can only focus on one project at a time. For now, that project
is this C++03 parser.
2017-10-13 15:59 GMT-03:00 Phil Endecott via Boost
Dear All,
This is related to the ongoing discussion of the Beast HTTP parser. I have been thinking in general about how best to implement parser APIs in modern and future C++. Specifically, I've been wondering whether the imminent arrival of low-overhead coroutines ought to change best practice for this sort of interface.
In the past, I have found that there is a trade-off between parser implementation complexity and client code complexity. A "push" parser, which invokes client callbacks as tokens are processed, is easier to implement but harder to use as the client has to track its state between callbacks with e.g. an explicit FSM. On the other hand, a "pull parser" (possibly using an iterator interface) is easier for the client but instead now the parser may need the explicit state tracking.
Now, with stackless coroutines due "real soon now", we can avoid needing explicit state on either side. In the parser we can co_yield tokens as they are processed and in the client we can consume them using input iterators. The use of co-routines doesn't need to be explicit in the API; the parser can be said to return a range<T>, and then return a generator<T>.
Here's a very very rough sketch of what I have in mind, for the case of HTTP header parsing; note that I don't even have a compiler that supports coroutines yet so this is far from real code:
generator<char> read_input(int fd)
{
    char buf[4096];
    while (1) {
        int r = ::read(fd, buf, 4096);
        if (r == 0) co_return;
        for (int i = 0; i < r; ++i) {
            co_yield buf[i];
        }
    }
}

template <typename INPUT_RANGE>
generator<pair<string, string>> parse_header_lines(INPUT_RANGE input)
{
    typedef typename INPUT_RANGE::const_iterator iter_t;
    iter_t i = input.begin(), e = input.end();
    while (i != e) {
        iter_t j = std::find(i, e, ':');
        string k(i, j);
        // (That's broken, as iter_t is a single-pass input iterator. We
        // need to copy to the string and check for ':' at the same time.
        // It's trivial with a loop.)
        ++j;
        iter_t nl = std::find(j, e, '\n');
        string v(j, nl);
        ++nl;
        i = nl;
        co_yield pair(k, v);
    }
}

void parse_http_headers(int fd)
{
    map<string, string> headers;
    auto g = parse_header_lines(read_input(fd));
    for (auto h : g) {
        headers.insert(h);
    }
}

An "exercise for the reader" is to extend that to something that will parse headers followed by a body.
Questions: how efficient is this in practice? Is this really simpler to write than a non-coroutine version? Will all of our code use this style in the (near?) future? How should we be writing code now so that it is compatible with this style in the future?
Thanks for reading,
Phil.
-- Vinícius dos Santos Oliveira https://vinipsmaker.github.io/
2017-10-12 9:45 GMT-03:00 Vinnie Falco via Boost
On Wed, Oct 11, 2017 at 9:47 PM, Vinícius dos Santos Oliveira
wrote: I took the liberty to convert the Tufão project that you've mentioned to use the Boost.Beast parser: https://github.com/vinipsmaker/tufao/commit/56e27d3b77d617ad1b4aea377f993592bc2c0d77
Would you say you like the new result better?
It seems pretty reasonable to me.
Would you say that I've misused your parser to favour my approach? How would you have done it in this case?
Misused? I don't think so. The only meaningful change I would make is that I would have simply called basic_parser::is_keep_alive() instead of re-implementing the logic for interpreting the Connection header.
Would you go beyond and accept the idea that the spaghetti effect is inherent to the callback-based approach of push parsers?
This is where we are venturing into the world of opinion. It seems like you have a general aversion to callbacks. But there is a reason Beast's parser is written this way. Recognize that there are two primary consumers of the parser:
1. Stream algorithms such as beast::http::read_some
Such as async_read_some, which are useless if you want to use other networking APIs (which is _the_ reason for the desire for a separate parser API to start with).

2. Consumers of structured HTTP elements (e.g. fields)
The Beast design separates these concerns. Public member functions of `basic_parser` provide the interface needed for stream algorithms, while calls to the derived class provide the structured HTTP elements. I don't think it is a good idea to combine these into one interface
Point 1 is _useless_ if you are providing a _parser_ API. You don't know which networking API the user will use. It's that simple. There is no point 1.

which you have done in your parser. The reason is that this
unnecessary coupling pointlessly complicates the writing of the stream algorithm.
Not coupling. Point 1 is not even there. Why would you design a parser with a specific networking API in mind? Message consuming/producing is the work of a higher layer.

Anyone who wants to write an algorithm to feed the parser
from some source of incoming bytes now has to care about tokens. This is evident from your documentation:
http://boostgsoc14.github.io/boost.http/#parsing_tutorial1
In your example you declare a class `my_socket_consumer`. It has a single function `on_socket_callback` which is called repeatedly with incoming data. Not shown in your example is the stream algorithm (the function which interacts with the actual socket to retrieve the data). However, we know that this stream algorithm must be aware of the concrete type `my_socket_consumer` and that it needs to call `on_socket_callback` with an `asio::buffer`. A signature for this stream algorithm might look like this:
template<class SyncReadStream> void read(SyncReadStream& stream, my_socket_consumer& consumer);
The stream algorithms you provide were of no use to convert Tufão to use your parser. And I don't recall writing stream algorithms to use the parser.

Observe that this stream algorithm can only ever work with that
specific consumer type. In your example, `my_socket_consumer` handles HTTP requests. Therefore, this stream algorithm can now only handle HTTP requests.
Same function to handle requests and responses: https://github.com/BoostGSoC14/boost.http/blob/5ea65c64467e689cc9b67b8e66da6... There is a little TMP in the lines above, but you likewise have to use templates if you want to generalize your algorithm over the `isRequest` template parameter.

In order to receive a response, a new stream algorithm
must be written. Compare this to the equivalent signature of a Beast styled stream algorithm:
template<class SyncReadStream, bool isRequest, class Derived>
void read(SyncReadStream& stream, basic_parser<isRequest, Derived>& parser);

This allows an author to create a stream algorithm which works not just for requests which store their data as data members in a class (`my_socket_consumer`) but for any parser, thanks to the CRTP design.
What can your stream algorithms do besides feed data to the parser + consumer? Just a few messages ago I showed concrete examples of how powerful the other model is. In the other model, you could even inject new tokens (in a virtual way, without the need to read the stream twice and inject new HTTP data at the appropriate points).

For example, if I create a parser by subclassing
`beast::http::basic_parser` with an implementation that discards headers I don't care about, then it will work with the stream algorithm described above without requiring modification to that algorithm.
And if you change your networking API, the stream algorithm becomes pretty useless. This element that I've coded, on the other hand, will work okay in any "stream model" that you wish: https://github.com/BoostGSoC14/boost.http/blob/9908fe06d4b2364ce18ea9b416264...

It is interesting to note that your `my_socket_consumer` is
roughly equivalent to the beast::http::parser class (which is derived from beast::http::basic_parser):
<https://github.com/boostorg/beast/blob/f09b2d3e1c9d383e5d0f57b1bf889568cf27c39f/include/boost/beast/http/parser.hpp#L45>
Both of these classes store incoming structured HTTP elements in a container for holding HTTP message data. However note that unlike `beast::http::parser`, `my_socket_consumer` also has to know about buffers:
void on_socket_callback(asio::buffer data)
{
    ....
    buffer.push_back(data);
    request_reader.set_buffer(buffer);
It might not be evident to casual readers but the implementation of `my_socket_consumer` has to know that the parser needs the serialized version of the message to be entirely contained in a single contiguous buffer.
Nope. Go read again.

In my opinion this is a design flaw because it does not
enforce a separation of concerns. The handling of structured HTTP elements should not concern itself with the need to assemble the incoming message into a single contiguous buffer; that responsibility lies with the stream algorithm.
The design decision in Beast is to keep the interfaces used by stream algorithms separate from the interface used by consumers of HTTP tokens. Furthermore the design creates a standard interface so that stream algorithms can work with any instance of `basic_parser`, including both requests and responses, and for any user-defined derived class.
http://www.boost.org/doc/libs/develop/doc/html/boost_asio/reference/AsyncRea... Stream concepts that borrow ASIO concepts... that's so lame. You think the users want access to a parser to implement an ASIO API? They will just use Boost.Beast instead of using a parser directly.

-- Vinícius dos Santos Oliveira https://vinipsmaker.github.io/
participants (5)

- Dominique Devienne
- Phil Endecott
- Seth
- Vinnie Falco
- Vinícius dos Santos Oliveira