Re: [boost] New Lib "Beast", HTTP + WebSocket protocols
On 04/22/2016 07:01 PM, Vinnie Falco wrote:
Main page, with links to GitHub repository, HTML documentation and benchmarks: http://vinniefalco.github.io/
Feedback welcome, the author checks email and Issues on the GitHub repository. This project has been submitted to the Boost incubator.
This looks like a nice library. You have clearly encountered the same pain points as I while working with other HTTP-based libraries, and you seem to have come up with solutions similar to those found in the Boost.Http project [1].
It may be a good idea if we joined the two projects together. Would you consider such an option?
Last year, the Boost.Http project went through a formal Boost review. The summary [2] shows what the Boost community expects from a HTTP library. In order to address one of these expectations, we have a Boost Summer of Code project this year for a HTTP parser.
[1] https://github.com/BoostGSoC14/boost.http
[2] http://lists.boost.org/boost-announce/2015/08/0452.php
On Sun, Apr 24, 2016 at 7:09 AM, Bjorn Reese wrote:
This looks like a nice library...
Thanks for the kind words!
It may be a good idea if we joined the two projects together. Would you consider such an option?
I'm open to collaboration but also cautious. Beast.HTTP was designed
throughout to have a narrow interface. It offers only a universal
model for the HTTP message, and functions to parse, serialize,
deserialize, and send/receive on sockets. It offers both synchronous
and asynchronous functionality. And it accomplishes these goals with
an interface that resembles Boost.Asio as closely as possible, to
eliminate the learning curve for using the library.
It is this author's opinion that the more a library tries to do, the
more controversial it is and the harder it will be to get through the
boost review process. There's a strong need for simple free functions
to send and receive HTTP messages on a socket in as few lines of code
as possible. Beast.HTTP achieves this. Example code:
using namespace beast::http;
boost::asio::ip::tcp::socket sock(ios);
...
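request req({method_t::http_get, "/", 11});
req.headers.replace("Host", "boost.org:80");
req.headers.replace("User-Agent", "Beast.HTTP");
write(sock, req);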
Last year, the Boost.Http project went through a formal Boost review. The summary [2] shows what the Boost community expects from a HTTP library.
I'll quote and address each of the points brought up in [2], the feedback from the review of Boost.Http:
"Boost.Http currently only provides a server-side API, but the reviewers felt that a client-side API would be usable to more users."
Beast.HTTP is completely role-agnostic, and works for building clients and servers.
"There was also a recurring request for Boost.Http to be a header-only library. The HTTP parser currently used is not header-only and that is the main obstacle towards a header-only Boost.Http library."
Agree 100%. Beast.HTTP also uses the NodeJS parser, and that's the only bit of code that is not header only. Thursday I started on a header-only parser, here's what we have so far (please keep in mind, this is a work in progress):
https://github.com/vinniefalco/rippled/blob/my-parser/src/beast/test/http/my...
This parser will not have any dependencies except the standard library so if necessary the code could be pinched for other projects.
"Some of the discussion revolved around what level of abstraction would be appropriate for Boost.Http. The views ranged from wanting higher-level APIs..."
Higher-level APIs are great but there is danger in offering increasing levels of abstraction. At each increase, the target audience diminishes and the chances of design choices made in the abstraction becoming inappropriate for particular use-cases go up. No matter how strong the desire for higher level APIs, they need to be built on a foundation. Beast.HTTP provides the correct foundation; it is that which cannot be broken down further and it is that upon which everything else can be built. As such, it offers library virtue with its current feature set.
"...all the way down to simply wanting a HTTP parser/generator and then leave all the socket and buffer management up to the user."
By design, Beast.HTTP does not perform any buffering or socket management. Such layers can be built on top; one of the examples creates a pipelining stream, see:
https://github.com/vinniefalco/Beast/blob/master/examples/http_stream.h
https://github.com/vinniefalco/Beast/blob/master/examples/http_async_server....
"Boost.Http currently creates an associative array for all header fields. One reviewer explored the idea of using an incremental (push or pull) HTTP parser as part of the API to let users decide which header fields to copy and which to discard."
The Beast.HTTP read algorithm allows customization of the Parser template argument (*), allowing any type that meets the requirements to be used as the implementation for parsing messages.
(*) planned feature
"Some reviewers also felt that HTTP/2 should be part of Boost.Http, partly because that would demonstrate the extensibility of the current design, and partly because the library would be in a stronger position to attract users if it offers more than its competitors."
The IETF adopted as a goal for HTTP/2, to leave the message the same (while changing its wire format). Therefore, Beast.HTTP's message model is already HTTP/2-friendly.
As for the extensibility of the design, free functions to send and receive HTTP messages are fundamentally incompatible with HTTP/2, which requires ongoing state to decompress headers and multiplex streams. And yet, we know that free functions to send and receive HTTP/1 messages are useful and sorely needed.
Furthermore HTTP/2 adds features that don't make sense for HTTP/1. What would you do with the stream ID? It is not part of the message model, because it describes a transport level property (and what meaning would it have for a HTTP/1 message? or someone who is using the message object but not using any sockets at all?) What about the interface to inform the peer of a new receive window size? What about setting stream priorities/weights? How do we sort all this out?
The interface for HTTP/1 should consist of a universal message model, plus functions to send and receive those messages on sockets. The interface for HTTP/2 should be a class template that wraps a socket (similar in style to beast::websocket::stream), deals in the same universal message model, and offers member functions to send and receive those messages with associated stream IDs as well as adjust HTTP/2-specific session parameters.
We respectfully disagree with those reviewers who feel that a single interface should serve the needs of sending and receiving both HTTP/1 and HTTP/2 messages.
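A rough sketch of the split argued for above, for illustration only. Every name here (message, memory_stream, http2_stream, window_size) is hypothetical and not part of Beast; the HTTP/2 class records messages instead of producing real frames or compressed headers, so it shows only the shape of the two interfaces.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One message model shared by both protocol versions (hypothetical).
struct message
{
    std::string start_line;
    std::map<std::string, std::string> headers;
    std::string body;
};

// Stand-in for a socket: anything with write(std::string const&).
struct memory_stream
{
    std::string out;
    void write(std::string const& s) { out += s; }
};

// HTTP/1: no state outlives the call, so a free function is enough.
template<class Stream>
void write(Stream& s, message const& m)
{
    std::string wire = m.start_line + "\r\n";
    for(auto const& h : m.headers)
        wire += h.first + ": " + h.second + "\r\n";
    wire += "\r\n" + m.body;
    s.write(wire);
}

// HTTP/2: a class template that owns the per-connection state.
// Framing and header compression are deliberately omitted.
template<class NextLayer>
class http2_stream
{
    NextLayer next_;
    std::uint32_t next_id_ = 1;     // client-initiated stream IDs are odd
    std::uint32_t window_ = 65535;  // initial flow-control window size
    std::vector<std::pair<std::uint32_t, message>> sent_;

public:
    NextLayer& next_layer() { return next_; }

    // Send a message on a new stream and return the stream ID used.
    std::uint32_t write(message const& m)
    {
        std::uint32_t const id = next_id_;
        next_id_ += 2;
        sent_.emplace_back(id, m); // placeholder for real frame output
        return id;
    }

    // Session parameter with no HTTP/1 equivalent.
    void window_size(std::uint32_t n) { window_ = n; }
    std::uint32_t window_size() const { return window_; }
    std::size_t sent() const { return sent_.size(); }
};
```

The stream ID and window size live on the session object, so the message type stays identical for both protocol versions.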
In order to address one of these expectations we have a Boost Summer of Code project this year for a HTTP parser.
Hopefully this header-only parser will be done before the summer begins. One area where Beast.HTTP could use work is in the examples. Another, bigger project would be to develop a HTTP/2 stream class template for Beast using its message model. These are areas where contributions would be most welcomed and likely the most productive use of external effort.
2016-04-24 10:14 GMT-03:00 Vinnie Falco:
I'm open to collaboration but also cautious.
That's good to know.
From the rest of your reply, I assume Boost.Http isn't ready to be integrated right now.
I wish you could put more details in the documentation, so I can understand Beast better without delving into the code. Maybe I'll bug you more later once Boost.Http advances a little more. And, like Bjorn already stated, nice library.
There's a strong need for simple free functions to send and receive HTTP messages on a socket in as few lines of code as possible. Beast.HTTP achieves this. Example code:
using namespace beast::http;
boost::asio::ip::tcp::socket sock(ios);
...
request req({method_t::http_get, "/", 11});
req.headers.replace("Host", "boost.org:80");
req.headers.replace("User-Agent", "Beast.HTTP");
write(sock, req);
You explained your point quite well. Could you please write an example of how do you imagine the ideal server-side API?
"Boost.Http currently only provides a server-side API, but the reviewers felt that a client-side API would be usable to more users."
Beast.HTTP is completely role-agnostic, and works for building clients and servers.
We hope to develop the client-side API after the parser project is finished. The same abstractions should be used too. However, it may be useful to add client-side only useful functions where we can have specialized get or post methods. Bjorn is wrapping libcurl into a Boost-like interface to gain knowledge: https://github.com/breese/trial.http.curl
"There was also a recurring request for Boost.Http to be a header-only library. The HTTP parser currently used is not header-only and that is the main obstacle towards a header-only Boost.Http library."
Agree 100%. Beast.HTTP also uses the NodeJS parser, and that's the only bit of code that is not header only. Thursday I started on a header-only parser, here's what we have so far (please keep in mind, this is a work in progress):
https://github.com/vinniefalco/rippled/blob/my-parser/src/beast/test/http/my...
I wasn't aware of this effort before. I took a look now that you mentioned. From what I've seen, I assume that this parser has a SAX-like interface where you interact through callbacks.
The Boost.Http parser that will be developed within this summer will be a pull parser. It has a less intrusive design and can be used to build parser with SAX-like interfaces or DOM-like interfaces. It should be useful enough to also be used as is (i.e. without wrapping in another interface). It should also be easy to expose iterators using this parser and allow us to reuse all STL algorithms.
You can take a look at this parser's proposal at https://gist.github.com/vinipsmaker/4998ccfacb971a0dc1bd . You can see an example of a pull parser at https://github.com/google/pulldown-cmark .
--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/
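To make the pull-parser idea in the message above concrete, here is a minimal sketch, not the design from the linked proposal. The names token, pull_parser and collect_fields are hypothetical. The caller asks for one token at a time, and a DOM-like layer is then a short loop on top; a callback (SAX-like) layer could be written the same way. Validation of field names and values is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token kinds for an HTTP/1 header block.
enum class token { start_line, field_name, field_value, end_of_headers, incomplete };

// The caller drives the parser by asking for the next token.
class pull_parser
{
    enum class state { start, name, value, done };

    std::string in_;
    std::size_t pos_ = 0;
    state state_ = state::start;
    std::string value_;

public:
    explicit pull_parser(std::string in) : in_(std::move(in)) {}

    // Text of the most recently returned token.
    std::string const& value() const { return value_; }

    token next()
    {
        if(state_ == state::done)
            return token::end_of_headers;
        if(state_ == state::name)
        {
            // A blank line terminates the header block.
            if(in_.compare(pos_, 2, "\r\n") == 0)
            {
                pos_ += 2;
                state_ = state::done;
                return token::end_of_headers;
            }
            // No validation: a real parser would reject malformed names.
            std::size_t const colon = in_.find(':', pos_);
            if(colon == std::string::npos)
                return token::incomplete;
            value_ = in_.substr(pos_, colon - pos_);
            pos_ = colon + 1;
            state_ = state::value;
            return token::field_name;
        }
        // The start line and field values both run to the end of the line.
        bool const is_start = (state_ == state::start);
        if(!is_start)
            while(pos_ < in_.size() && in_[pos_] == ' ')
                ++pos_;
        std::size_t const eol = in_.find("\r\n", pos_);
        if(eol == std::string::npos)
            return token::incomplete;
        value_ = in_.substr(pos_, eol - pos_);
        pos_ = eol + 2;
        state_ = state::name;
        return is_start ? token::start_line : token::field_value;
    }
};

// DOM-like layer built on the pull parser: collect every field.
inline std::vector<std::pair<std::string, std::string>>
collect_fields(std::string const& in)
{
    std::vector<std::pair<std::string, std::string>> out;
    pull_parser p(in);
    std::string name;
    for(;;)
    {
        token const t = p.next();
        if(t == token::end_of_headers || t == token::incomplete)
            break;
        if(t == token::field_name)
            name = p.value();
        else if(t == token::field_value)
            out.emplace_back(name, p.value());
    }
    return out;
}
```

Because the caller owns the loop, it can stop early or skip fields without the parser knowing anything about that policy.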
On Sun, Apr 24, 2016 at 3:33 PM, Vinícius dos Santos Oliveira wrote:
I wish you could put more details in the documentation, so I can understand Beast better without delving into the code.
Agreed. The HTTP side can definitely use more documentation. Since the announcement I added a new page which explains things step by step: http://vinniefalco.github.io/beast/beast/http/usage.html
like Bjorn already stated, nice library.
Thanks, it is appreciated!
Could you please write an example of how do you imagine the ideal server-side API?
Well, being role-agnostic means the same functions are used by both
clients and servers. The only meaningful difference is that a server
will receive request objects and send response objects instead of the
other way around. The message class template distinguishes requests
and responses using a bool "isRequest" template argument. These
statements each declare a response:
http::message
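As an illustration of the role-agnostic idea, here is a minimal sketch of a message template keyed on a bool. The types below (request_fields, response_fields, role_fields, the aliases and handle) are hypothetical stand-ins, not Beast's actual declarations.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for illustration; not Beast's real definitions.
struct request_fields  { std::string method; std::string url; };
struct response_fields { int status = 0; std::string reason; };

// Select the role-specific members from the bool template argument.
template<bool isRequest> struct role_fields;
template<> struct role_fields<true>  : request_fields {};
template<> struct role_fields<false> : response_fields {};

template<bool isRequest, class Body>
struct message : role_fields<isRequest>
{
    std::map<std::string, std::string> headers;
    Body body;
};

template<class Body> using request  = message<true, Body>;
template<class Body> using response = message<false, Body>;

// A server-side handler: receives a request, produces a response.
inline response<std::string> handle(request<std::string> const& req)
{
    response<std::string> res; // same type as message<false, std::string>
    res.status = (req.url == "/") ? 200 : 404;
    res.reason = (res.status == 200) ? "OK" : "Not Found";
    res.headers["Server"] = "sketch";
    res.body = res.reason;
    return res;
}
```

A client would use the same two types with the roles swapped: it builds a request and reads a response.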
...it may be useful to add client-side only useful functions where we can have specialized get or post methods.
These definitely sound like useful operations, and they should be available at some interface level. But it's not clear that they are sufficiently general purpose as to merit inclusion in a library that tries to satisfy everyone. No matter how specialized the get or post method, there still needs to be a way to package the message up in a first-class type and send or receive it; Beast provides the means to do that.
Thursday I started on a header-only parser,
I wasn't aware of this effort before.
It's a minor effort, likely not worthy of fanfare.
I took a look now that you mentioned. From what I've seen, I assume that this parser has a SAX-like interface where you interact through callbacks.
I'm writing something that functions very similarly in style to the nodejs-http-parser, but updated for C++. The goal here is to eliminate a blemish on Beast, that it is not completely header-only. It inherits the zero-memory / zero-copy design of nodejs-http-parser to retain as much of its performance as possible (although, it does away with architecture-specific branch prediction hints). Some of the design improvements I'm making:
* Users derive from the parser base class using CRTP (Curiously Recurring Template Pattern)
* Callbacks are optional, detected through SFINAE
* Callbacks are made to the derived class
* The callbacks are transparent to the compiler (i.e. no function pointers), allowing inlining
* No macros or dependence on preprocessor directives
* No dependencies except for boost::string_ref; easily reused in other projects
* Random HTTP-message generator for fuzzing
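The first four points in the list above can be sketched as follows. This is a simplified illustration under assumed names (basic_parser, on_start_line, on_field, parse), not the parser from the linked branch. It copies strings for brevity, so it does not show the zero-copy design.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical CRTP base: Derived may define on_start_line and/or on_field.
template<class Derived>
class basic_parser
{
    // Preferred overload: viable only if Derived has on_start_line(string).
    template<class D>
    static auto call_start_line(D& d, std::string const& s, int)
        -> decltype(d.on_start_line(s), void())
    {
        d.on_start_line(s);
    }

    // Fallback when the callback is absent: do nothing.
    template<class D>
    static void call_start_line(D&, std::string const&, long) {}

    template<class D>
    static auto call_field(D& d, std::string const& n,
                           std::string const& v, int)
        -> decltype(d.on_field(n, v), void())
    {
        d.on_field(n, v);
    }

    template<class D>
    static void call_field(D&, std::string const&, std::string const&, long) {}

public:
    // Parse a header block; returns the number of header fields seen.
    std::size_t parse(std::string const& in)
    {
        Derived& d = static_cast<Derived&>(*this);
        std::size_t pos = 0, fields = 0;
        bool first = true;
        for(;;)
        {
            std::size_t const eol = in.find("\r\n", pos);
            if(eol == std::string::npos || eol == pos)
                break; // incomplete input, or the blank line ending the block
            std::string const line = in.substr(pos, eol - pos);
            pos = eol + 2;
            if(first)
            {
                first = false;
                call_start_line(d, line, 0); // 0 prefers the int overload
                continue;
            }
            std::size_t const colon = line.find(':');
            if(colon == std::string::npos)
                continue; // a real parser would report an error here
            std::size_t v = colon + 1;
            while(v < line.size() && line[v] == ' ')
                ++v;
            call_field(d, line.substr(0, colon), line.substr(v), 0);
            ++fields;
        }
        return fields;
    }
};

// Implements only on_field; the start-line callback is skipped.
struct field_counter : basic_parser<field_counter>
{
    int fields = 0;
    void on_field(std::string const&, std::string const&) { ++fields; }
};

// Implements only on_start_line; field callbacks are skipped.
struct start_line_only : basic_parser<start_line_only>
{
    std::string start;
    void on_start_line(std::string const& s) { start = s; }
};
```

The calls resolve at compile time to direct member calls on the derived class, which is what lets the compiler inline them.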
The Boost.Http parser that will be developed within this summer will be a pull parser. It has a less intrusive design and can be used to build parser with SAX-like interfaces or DOM-like interfaces. It should be useful enough to also be used as is (i.e. without wrapping in another interface).
Beast doesn't try to offer a universal or flexible parser; it just offers a parser that gets users reading messages right out of the box, and is sufficiently robust and performant as to make it a competitive choice for implementing production-class servers. Beast's HTTP message reading implementation is general purpose: callers can provide their own Parser template argument that meets the type requirements (*), permitting alternate implementation strategies, for example keeping only the headers you care about, or using a perfect hash function to decode the field name to an enum. This works hand in hand with customizing the Headers parameter in the message class template argument.
(*) planned feature
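A small sketch of the "keep only the headers you care about" strategy, decoding field names to an enum. The names field, to_field and selected_headers are hypothetical, and the lookup is a plain case-insensitive comparison chain rather than a perfect hash; a generated perfect hash could replace to_field without changing the container.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical set of fields this application cares about.
enum class field { host, content_length, unknown };

// HTTP field names are case-insensitive, so compare without regard to case.
inline bool iequals(std::string const& a, char const* b)
{
    std::size_t i = 0;
    for(; i < a.size() && b[i] != '\0'; ++i)
        if(std::tolower(static_cast<unsigned char>(a[i])) !=
           std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    return i == a.size() && b[i] == '\0';
}

// Decode a field name to an enum (simple chain, not a perfect hash).
inline field to_field(std::string const& name)
{
    if(iequals(name, "Host"))
        return field::host;
    if(iequals(name, "Content-Length"))
        return field::content_length;
    return field::unknown;
}

// Hypothetical Headers replacement: stores only the known fields.
class selected_headers
{
    std::string values_[2];
    bool present_[2] = {false, false};

public:
    // Returns true if the field was stored, false if it was discarded.
    bool insert(std::string const& name, std::string const& value)
    {
        field const f = to_field(name);
        if(f == field::unknown)
            return false;
        std::size_t const i = static_cast<std::size_t>(f);
        values_[i] = value;
        present_[i] = true;
        return true;
    }

    bool exists(field f) const
    {
        return f != field::unknown && present_[static_cast<std::size_t>(f)];
    }

    // Returns the stored value, or an empty string if absent.
    std::string value(field f) const
    {
        return exists(f) ? values_[static_cast<std::size_t>(f)] : std::string();
    }
};
```

A parser that feeds such a container never allocates storage for fields the application ignores.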
It should also be easy to expose iterators using this parser and allow us to reuse all STL algorithms.
Since the whole thing is now templated it might be practical to revise the interface to accept a Boost.Range of chars and work with iterators in the fashion you described. That could be the subject of a future improvement.
2016-04-24 18:34 GMT-03:00 Vinnie Falco:
Could you please write an example of how do you imagine the ideal server-side API?
Well, being role-agnostic means the same functions are used by both clients and servers. The only meaningful difference is that a server will receive request objects and send response objects instead of the other way around. The message class template distinguishes requests and responses using a bool "isRequest" template argument. These statements each declare a response: [...]
This makes me think that the two projects are more alike than you might think. The main difference here is just that Boost.Http's Message doesn't carry request-exclusive members (HTTP verb, URI) or response-exclusive members (status code, status message), and the methods that work on them take them separately. I'm open to following a design more similar to yours (I just need to discuss it with Bjorn and other members first, as I remember this decision was taken to follow a more fundamental message-based model).
Another main difference is that Boost.Http's server-side design is really concerned about different HTTP backends. However, this is only observed in details and I really doubt they'll bother you. At most, I need to implement more convenience functions to reduce the amount of boilerplate needed today (already on the TODO list and on the GitHub issue tracker).
I believe the rest of the differences are really minor. Of course I might be wrong and I'll know once more information about the project is exposed (documentation maybe).
"Some reviewers also felt that HTTP/2 should be part of Boost.Http, partly because that would demonstrate the extensibility of the current design, and partly because the library would be in a stronger position to attract users if it offers more than its competitors."
The IETF adopted as a goal for HTTP/2, to leave the message the same (while changing its wire format). Therefore, Beast.HTTP's message model is already HTTP/2-friendly.
As for the extensibility of the design, free functions to send and receive HTTP messages are fundamentally incompatible with HTTP/2, which requires ongoing state to decompress headers and multiplex streams. And yet, we know that free functions to send and receive HTTP/1 messages are useful and sorely needed.
Furthermore HTTP/2 adds features that don't make sense for HTTP/1. What would you do with the stream ID? It is not part of the message model, because it describes a transport level property (and what meaning would it have for a HTTP/1 message? or someone who is using the message object but not using any sockets at all?) What about the interface to inform the peer of a new receive window size? What about setting stream priorities/weights? How do we sort all this out?
The interface for HTTP/1 should consist of a universal message model, plus functions to send and receive those messages on sockets. The interface for HTTP/2 should be a class template that wraps a socket (similar in style to beast::websocket::stream), deals in the same universal message model, and offers member functions to send and receive those messages with associated stream IDs as well as adjust HTTP/2-specific session parameters.
We respectfully disagree with those reviewers who feel that a single interface should serve the needs of sending and receiving both HTTP/1 and HTTP/2 messages.
But the two interfaces could be very similar. We also don't need to force the user to mess with stream IDs. Server push is another story and indeed needs new APIs.
--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/
participants (3)
- Bjorn Reese
- Vinnie Falco
- Vinícius dos Santos Oliveira