On Wed, Aug 12, 2015 at 2:58 PM, Antony Polukhin wrote:
Hi,
This is my formal review for Boost.HTTP library.
1. Should Boost.Http be accepted into Boost?
No. It has some big architectural issues and a lot of issues in code.
2. What is your evaluation of the design?
That's the fatal problem of the library. It seems like the library attempts to get the best from ASIO and cpp-netlib and unfortunately fails. Now let's get into the details:
* ASIO is a low-level abstraction over the protocol that does not take care of memory management of buffers (in most cases it's the user's responsibility to allocate and keep alive the required buffers).
* cpp-netlib takes care of memory management and hides the low-level stuff.
Boost.Http is stuck somewhere in the middle: sometimes it allocates and extends buffers, sometimes not. Because of that the design does not seem solid. Any attempt to hide all the memory management inside the library will end in another implementation of cpp-netlib. So it seems right to remove all the memory management from the library internals and make the library closer to ASIO.
This is one of my complaints in my in-progress review. The documentation doesn't explicitly state how the provided SequenceContainer is used during reads/writes of the body. And SequenceContainer is a lax concept: std::list models SequenceContainer, which means all reads/writes to the body require an internal copy to meet the concept correctly.
Here are my recommendations for separation of flies and dishes:
* Stick to the idea that the http socket must ONLY manage the connection, without extending ASIO's send/receive function signatures. That means that the http1.0 and http1.1 sockets must be almost a typedef for a tcp/ssl socket. The http2.0 socket must be very close to an SSL socket. No additional methods (like async_read_trailers or async_write_response) must be provided.
* It must be the user's responsibility to provide buffers for messages. Never extend buffers in socket-related methods.
* Provide helper classes that parse and generate HTTP.
* Provide out-of-the-box completion conditions for async_read operation: http::completions::full, http::completions::headers, http::completions::body...
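For illustration, such a completion condition could be an ordinary ASIO-style CompletionCondition functor. A minimal, dependency-free sketch of the headers variant follows; the name `headers_complete` is invented for this example, and the `error_code` argument of the real ASIO signature is omitted to keep the sketch self-contained:

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch of http::completions::headers as an ASIO-style
// CompletionCondition. ASIO invokes the condition after every partial
// read; returning 0 means "stop reading", and a non-zero value is the
// maximum number of bytes the next read may fetch.
struct headers_complete {
    const std::string& buffer; // the buffer the user handed to async_read

    std::size_t operator()(std::size_t /*bytes_transferred*/) const {
        // The header block ends at the first empty line (CRLFCRLF).
        return buffer.find("\r\n\r\n") == std::string::npos ? 65536 : 0;
    }
};
```

A body or full-message variant would be similar, except it would also have to consult Content-Length (or the chunked framing) parsed out of the header block.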
Those bullets would allow writing the following code (based on
http://www.boost.org/doc/libs/1_58_0/doc/html/boost_asio/tutorial/tutdaytime... with minimal modifications):
#include <ctime>
#include <iostream>
#include <string>
#include <boost/bind.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/enable_shared_from_this.hpp>
#include <boost/asio.hpp>

using boost::asio::ip::tcp;

std::string make_daytime_string()
{
  using namespace std; // For time_t, time and ctime;
  time_t now = time(0);
  return ctime(&now);
}

class http_connection
  : public boost::enable_shared_from_this<http_connection>
{
public:
  typedef boost::shared_ptr<http_connection> pointer;

  static pointer create(boost::asio::io_service& io_service)
  {
    return pointer(new http_connection(io_service));
  }

  http::socket& socket()
  {
    return socket_;
  }

  void start()
  {
    message_.resize(4 * 1024);

    boost::asio::async_read(socket_, boost::asio::buffer(message_),
        http::completions::full(message_), // read until the whole request is in `message_`
        boost::bind(&http_connection::handle_read, shared_from_this(),
            boost::asio::placeholders::error,
            boost::asio::placeholders::bytes_transferred));
  }

private:
  http_connection(boost::asio::io_service& io_service)
    : socket_(io_service)
  {}

  void handle_write(const boost::system::error_code& /*error*/,
      size_t /*bytes_transferred*/)
  {}

  void handle_read(const boost::system::error_code& error,
      size_t bytes_transferred)
  {
    assert(!error); // for shortness

    // `message_` contains the whole request, including headers and body,
    // because http::completions::full was provided
    {
      boost::system::error_code e;
      http::view v(message_, e);      // represent buffer as http message
      assert(!e);                     // no error occurred during parsing
      assert(v.read_state() == http::read_state::empty);
      std::cerr << v.version();       // "HTTP1.1"
      std::cerr << v["Content-type"]; // "plain/text"
      std::cerr << v.body();          // string_ref("Hello! This is a body content. What time is it?")
      message_.clear();               // `v` is now invalid to use!
    }

    {
      http::generator resp(message_);        // use `message_` for an output
      resp << make_daytime_string();         // append daytime to body
      resp["Content-Type"] = "plain/text";   // set some headers
      resp.version(http::version::HTTP_1_1); // explicitly set HTTP version
      resp.code(200);                        // "200 OK"
      resp.flush(); // explicitly flush to buffer; done automatically in destructor
    }

    boost::asio::async_write(socket_, boost::asio::buffer(message_),
        boost::bind(&http_connection::handle_write, shared_from_this(),
            boost::asio::placeholders::error,
            boost::asio::placeholders::bytes_transferred));
  }

  http::socket socket_; // manages the connection only
  std::string message_;
};

class http_server
{
public:
  http_server(boost::asio::io_service& io_service)
    : acceptor_(io_service, http::endpoint(http::all_versions, tcp::v4(), 80))
  {
    start_accept();
  }

private:
  void start_accept()
  {
    http_connection::pointer new_connection =
        http_connection::create(acceptor_.get_io_service());

    acceptor_.async_accept(new_connection->socket(),
        boost::bind(&http_server::handle_accept, this, new_connection,
            boost::asio::placeholders::error));
  }

  void handle_accept(http_connection::pointer new_connection,
      const boost::system::error_code& error)
  {
    if (!error)
    {
      if (new_connection->socket().version() == http::version::HTTP_2_0)
      {
        // do some more stuff
      }

      new_connection->start();
    }

    start_accept();
  }

  http::acceptor acceptor_;
};

int main()
{
  try
  {
    boost::asio::io_service io_service;
    http_server server(io_service);
    io_service.run();
  }
  catch (std::exception& e)
  {
    std::cerr << e.what() << std::endl;
  }

  return 0;
}
This will also untie Boost.HTTP from ASIO. Now users could use http::generator/http::view to read/generate HTTP data. This significantly extends the use cases of your library (for example on a previous work we had our own network transfer implementation, but we'd gladly use a http parser and generator).
How is your http::stream version asynchronous? It appears it would be blocking, unless there is some other magic occurring. It would be more concerning on the reads (the generation could be pre-computed), because if the underlying stream didn't have the data for .version(), etc., I think it would need to block. Or it would have to read the entire header into one larger buffer and then parse? I guess you could use the `async_read_until(..., , ...)` functionality and keep buffering (up to a max, of course), then throw that to the parser?

Anyway - I was thinking along the same lines at various points. Having a function that pre-generates HTTP messages is very useful IMO, and should likely be included. Designing a good parsing concept would be a bit more work I think, but probably worth it too. I'm not sure how the author intends to swap out parsers in the current design. Having a fixed parser seems acceptable, but the author almost seemed to suggest that it could be selectable somehow.
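A minimal sketch of the kind of pre-generating function mentioned here; the name `make_response` and its whole interface are invented for illustration and imply no library API:

```cpp
#include <map>
#include <sstream>
#include <string>

// Illustrative only: a free function that pre-generates a complete
// HTTP/1.1 response into a string before any socket is involved, so
// the write side never needs to block on generation.
std::string make_response(int status, const std::string& reason,
                          std::map<std::string, std::string> headers,
                          const std::string& body)
{
    std::ostringstream out;
    out << "HTTP/1.1 " << status << ' ' << reason << "\r\n";
    headers["Content-Length"] = std::to_string(body.size());
    for (const auto& h : headers)
        out << h.first << ": " << h.second << "\r\n";
    out << "\r\n" << body; // blank line separates headers from body
    return out.str();
}
```

The resulting string can then be handed to any write primitive (boost::asio::async_write or otherwise), which is exactly what makes such a generator transport-agnostic.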
Such code is also less intrusive: only minimal changes were required to turn the tcp example into the http example. Users will appreciate that, especially in cases when ASIO is already used for HTTP.
BTW, after that you could easily implement http::stream (which must be close to tcp::stream in functionality). http::stream would be a killer feature that would be widely used by utility programs (as I remember, current Boost regression testing requires it, but uses tcp::stream with a bunch of workarounds).
Please implement the HTTP2.0 protocol. There are some open-source solutions that are licensed under the Boost license and are very close to a full implementation of HTTP2.0. Just reuse the code and chat with the authors for a better understanding of the problems.
3. What is your evaluation of the implementation?
Not perfect.
For example
https://github.com/BoostGSoC14/boost.http/blob/master/include/boost/http/soc... It seems that `std::vector<asio::const_buffer> buffers; buffers.reserve(nbuffer_pieces);` is meant to be there. There are more such places...
It seems that some useful features (like custom allocators) are not supported by the library. This had better be fixed.
Some of the implementation issues stem from the library design. For example, requesting multiple async operations (async_read_request + async_read_some + async_read_trailers) is a performance hit. It's better to read all the data at once (just like in the example above).
How much of a penalty do you think there is to this split design? It seems likely that multiple handlers will have to be given to read_some (directly or indirectly) for ASIO to read the header, etc. anyway.
4. What is your evaluation of the documentation?
The first code snippet in the tutorial contains mostly ASIO stuff and almost no HTTP stuff. That's weird... If the bullets from above were implemented, then I'd rather start with parsing and generating simple http messages. Then an example with sockets could be provided.
Also, please do not use coroutines, at least in the first example. Coroutines are a new feature in ASIO; many users use old Boost versions that have no coroutines, and some users are not familiar with them. So if you start with coroutines, the user will have to go to the ASIO docs and dig into coroutines to understand the very first example. An example without coroutines will be more user-friendly.
5. What is your evaluation of the potential usefulness of the library?
The library is very useful and needed. But in its current state it must not be accepted into Boost.
6. Did you try to use the library? With what compiler? Did you have any problems?
Have not tried.
7. How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
A few hours digging into the docs, about 3 hours looking into the sources, about 3 hours formulating the issues and recommending solutions.
8. Are you knowledgeable about the problem domain?
Not an expert, but I have used ASIO a lot, along with cpp-netlib and some other libraries close to the proposed one.
8.3333(3) Review
Please reserve a bigger window when such a complex library is proposed for review. It's hard to review it in 10 days.
8.6666(6) For Reviewers
I urge other reviewers to take a look at cpp-netlib. If you treat Boost.Http as an out-of-the-box solution, then cpp-netlib is more mature, user-friendly, and solid. If you treat Boost.Http as a basic building block for HTTP (just like ASIO is a basic building block for networking), then it's better to stick to ASIO's design and not *partially* manage memory!
P.S.: I hope that this library will be changed according to my recommendations and submitted for a new review.
I agree. More explicit memory management would differentiate this from existing libraries; otherwise Boost should likely just review one of the other major libraries.

Lee
2015-08-12 19:27 GMT-03:00 Lee Clagett
Anyway - I was thinking along the same lines at various points. Having a function that pre-generates HTTP messages is very useful IMO, and should likely be included. Designing a good parsing concept would be a bit more work I think, but probably worth it too. I'm not sure how the author intends to swap out parsers in the current design. Having a fixed parser seems acceptable, but the author almost seemed to suggest that it could be selectable somehow.
A parser doesn't make sense for all communication channels. Currently there is one communication channel: basic_socket<T>. basic_socket<T> is meant for embedded servers (it will be extended for lean HTTP client connections in the future) and is tied to the HTTP wire format, so it uses an HTTP parser. I don't expose the parser because I use a C parser that is limited to the C (lack of) expressiveness. When I replace this parser with a better one, I'll change the situation. I don't plan to allow selectable parsers for basic_socket, but it'd be useful to select parser options (like strict parsing, liberal parsing, max header size and so on...).

--
Vinícius dos Santos Oliveira
https://about.me/vinipsmaker
On Thu, Aug 13, 2015 at 11:43 AM, Vinícius dos Santos Oliveira <vini.ipsmaker@gmail.com> wrote:

A parser doesn't make sense for all communication channels.

Do you have an example of a communication channel where a parser concept wouldn't work? They wouldn't necessarily always provide the same output or behave the same way, but a communication channel has a defined format. Any implementation reading that format generally has _some_ output, which is pretty much a parser IMO. Sorry for the bikeshedding on this, it's not really necessary, but this stuck out for some reason.
Currently there is one communication channel: basic_socket<T>. basic_socket<T> is meant for embedded servers (it will be extended for lean HTTP client connections in the future) and is tied to the HTTP wire format, so it uses an HTTP parser. I don't expose the parser because I use a C parser that is limited to the C (lack of) expressiveness. When I replace this parser with a better one, I'll change the situation. I don't plan to allow selectable parsers for basic_socket, but it'd be useful to select parser options (like strict parsing, liberal parsing, max header size and so on...).

It's possible to expose the C parser through a C++ interface, so I don't see this as a valid argument. My apologies for the last post, I realized you just answered my question here. Your plan is no parser concepts; parsers are tied directly to a http::Socket implementation.

Lee
2015-08-13 22:29 GMT-03:00 Lee Clagett
A parser doesn't make sense for all communication channels.
Do you have an example of a communication channel where a parser concept wouldn't work? They wouldn't necessarily always provide the same output or behave the same way, but a communication channel has a defined format. Any implementation reading that format generally has _some_ output, which is pretty much a parser IMO. Sorry for the bikeshedding on this, it's not really necessary, but this stuck out for some reason.
Well, in my previous answer, I think I ended up focusing on the wrong part of the proposal. Let me fix this issue in this email. And thank you for helping me figure out my mistake (or for "keeping insisting", or for "not losing hope on me", whichever you prefer).

To handle an HTTP request, you read the metadata, progressively download the body and then the trailers. It's wise to avoid reading partial metadata, because the request can only be handled after the whole metadata has been read. However, the body can be handled as it is received.

You're arguing that you always (1) fill a buffer and then (2) parse it. CGI uses environment variables, not a contiguous chunk of memory or a stream of bytes. It still could work, though. The headers would be serialized into the buffer (not nice).

Using the buffer/view approach, messages are always in serialized format. Unless you store/cache the parsing result (doing allocation, or using some fixed size, as you do not know the amounts ahead of time), this will consume more time, as you'll need to reparse every time a piece of information is asked for ("give me header host", "give me header cookie"). If you do store the parsing result, you're just storing the message behind an API that is masked as not message-based when it actually is. And unless your buffer is used to store more info than the real network traffic, the view needs to get information from the socket too, not just from the buffer. It's not a pure parser; there is state not found in the buffer (imagine how you'd handle a progressive download where lots of the traffic was already discarded in order to handle the rest of the message), so the view needs the socket, which is storing information not present in the buffer. It's like wasting much more CPU to avoid some more memory usage. These are just the basic impacts.

On a high-level side of thinking, the buffer/view approach makes all requests immutable by default. You cannot fake or inject data into the headers while you pass the headers along a chain of handlers. There are workarounds. Also, if you cannot forge the HTTP message, you cannot create it and send it to the socket. You always need to use the generator. The problem with the proposed generator is the lack of consideration for other HTTP backends. More care needs to be given to capabilities, like happens in Boost.Http (is chunking available? 100-continue? can I upgrade? ...). Also, like I've stated in one of the previous emails, it's tricky to get interoperability right with these non-explicit generators (is chunking going to be implicitly used?).

I need to think more; maybe I'll send another email with more comments. You can have these meanwhile.
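To illustrate the progressive-download point above in a dependency-free way, here is a toy sketch (all names invented; this is not Boost.Http API) of a consumer that handles each body chunk as it arrives, so memory use stays constant no matter how large the body is:

```cpp
#include <cstddef>
#include <string>

// Toy illustration: the metadata (here just Content-Length) must be
// known before handling starts, but the body itself can be consumed
// chunk by chunk without ever buffering the whole message.
class progressive_body {
public:
    explicit progressive_body(std::size_t content_length)
        : remaining_(content_length) {}

    // Feed one freshly received chunk; returns true once the whole
    // body has been seen.
    bool on_chunk(const std::string& chunk) {
        std::size_t n = chunk.size() < remaining_ ? chunk.size() : remaining_;
        remaining_ -= n;
        received_ += n;
        // ... stream `chunk` into a file, hash, decompressor, etc. ...
        return remaining_ == 0;
    }

    std::size_t received() const { return received_; }

private:
    std::size_t remaining_;
    std::size_t received_ = 0;
};
```

A pure buffer/view design has no natural place for this kind of consumer, because the view assumes the serialized bytes are still around to be re-examined.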
It's possible to expose the C parser through a C++ interface, so I don't see this as a valid argument.
Okay. The C parser I use has a terrible abstraction and I don't want to Boostify that abstraction. I want to define a better interface, a new interface, not a Boostification of the C interface I use internally. The "bad" interface has a great excuse to be the way it is: it's one of the best things you can have when trying to abstract code using the C language.

--
Vinícius dos Santos Oliveira
https://about.me/vinipsmaker
On Fri, Aug 14, 2015 at 1:06 AM, Vinícius dos Santos Oliveira < vini.ipsmaker@gmail.com> wrote:
Sorry I wasn't being more clear on this - I am not advocating the buffer/view approach mentioned in the OP for HTTP; I just happened to also be thinking about protocols/messages/parsers. I thought your initial response indicated that his idea wouldn't work (it should), not that it was inefficient. The buffer/view approach provided above would require a double parse of the HTTP header, since HTTP does not define a length field at a fixed offset from the start of the message. Protocols where the length field is at a fixed offset, indicating the entire size of the message, are better suited for that style of message handling, since the first pass is pretty low CPU.
I _was_ thinking about parsers that provided different artifacts. For instance, a parser could be a Function Object with "expected
2015-08-14 2:55 GMT-03:00 Lee Clagett
Sorry I wasn't being more clear on this - I am not advocating the buffer/view approach mentioned in the OP for HTTP, I just happened to also be thinking about protocol/messages/parsers. I thought your initial response indicated that his idea wouldn't work (it should), not that it was inefficient.
My initial response was against the complete lack of consideration for alternative HTTP backends. I was so focused on this disadvantage that I ended up ignoring the socket+buffer+parser/view suggestion.

In the end, it looks like the OP just wants an HTTP parser. I don't even expose an HTTP parser. It'll be done in the **future**. This library is **not** an HTTP parser. You should **not** judge this library thinking it is an HTTP parser or comparing it to HTTP parsers. However, it is still interesting to compare it to HTTP parsers (now I'm sure I'm looking kind of contradictory). Let me explain further: this discussion made considerations of features, limitations and use cases that showed it useful to define a set of traits that can make Boost.Http more useful. However, if you want an HTTP parser, you do not even care about alternative HTTP backends, and this is one of the major points of Boost.Http. I won't accept features that will hamper the development of alternative HTTP backends. I will, however, try to convert your suggestion into something not so detrimental, if you are willing to discuss.

The buffer/view approach provided above would require a double parse of the HTTP header, since HTTP does not define a length field at a fixed offset from the start of the message. Protocols where the length field is at a fixed offset, indicating the entire size of the message, are better suited for that style of message handling, since the first pass is pretty low CPU.
Or the parsing result could be stored in the socket object, turning the API into just a convoluted way to achieve what the message-oriented approach already provides. OTOH, you need some tricks to avoid allocations in the message-oriented approach.

--
Vinícius dos Santos Oliveira
https://about.me/vinipsmaker
2015-08-14 2:06 GMT-03:00 Vinícius dos Santos Oliveira < vini.ipsmaker@gmail.com>:
Okay, I have thought enough about the proposal. I was thinking about how I could write an implementation for such a proposal. The problem is that you'd have to keep the headers (the whole metadata, actually) in the buffer as long as the current request-response pair isn't finished. If you use a fixed-size buffer, you may never be able to finish the request, even if you progressively download the body. And if the view/parser only reads the buffer, progressive download is already kind of hard and may very well involve a complex API. If the view also takes the socket (not only the buffer) as an argument, this may be partially addressed.

With this approach, the message will have to be reparsed so many times that you may as well just expose an HTTP parser to the user and not try to provide anything else. The parser approach is too low-level, and this approach to turning it into a high-level API kills a lot of HTTP features/use cases, like progressively downloading the body, at least with this low amount of investment.

Also, you mixed two proposals (actually a little more) into one:

* A design closer to tcp::socket.
* The socket+buffer+parser/view approach.

And I may want to criticize a point that is a consequence of only one proposal, but with the mixed solution, I end up criticizing both. I'll trust your judgement to correctly separate which critique goes to which proposal.

--
Vinícius dos Santos Oliveira
https://about.me/vinipsmaker
2015-08-12 15:58 GMT-03:00 Antony Polukhin
That's the fatal problem of the library. It seems like the library attempts to get best from ASIO and cpp-netlib and unfortunately fails. Now let's get into details:
* ASIO is a low level abstraction over protocol that does not take care about memory management of buffers (in most cases it's the responsibility of user to allocate and keep alive the required buffers). * cpp-netlib takes care of memory management, hides low-level stuff.
I'm actually competing with cpp-netlib, but cpp-netlib has more abstractions. I finished a strong core abstraction and submitted it for review. This strong core can already be used to build some high level abstractions (or applications). There is a file server making use of all this low-level abstraction and there are lots of type erasures to allow the use of this library in plugins, so you can change the handlers at runtime without restarting the server. I just expect to provide more algorithms and abstractions on top of this strong core, and anything that is high-level will use the same main players from this core proposal (HTTP messages and sockets). And the generic algorithms that are provided (e.g. the file server) will work on any alternative high-level API or HTTP backend that is added in the future. Boost.Http is stuck somewhere in the middle: sometimes it allocates and
extends buffers, sometimes not. Because of that the design does not seem solid. Any attempt to hide all the memory management inside the library will end in an another implementation of cpp-netlib. So it seems right to remove all the memory management from the library internals and make the library closer to ASIO.
I don't think it's fair to suggest that any attempt I make to improve the high-level side of the library will end up in another way to implement the cpp-netlib API. The cpp-netlib API has several flaws and I avoid them all. The advantage of cpp-netlib is that cpp-netlib has more abstractions. Here are my recommendations for separation of flies and dishes:
* Stick to the idea that http socket must ONLY manage connection without extending the ASIO's send/receive function signatures. That means that http1.0 and http1.1 sockets must be almost a typedef for tcp/ssl socket. Http2.0 socket must be very close to SSL socket. No additional methods (like async_read_trailers or async_write_response) must be provided.
TCP is stream-oriented, UDP is datagram-oriented and HTTP is message-oriented (request-reply, actually). HTTP messages are structured and should remain type-safe. Your application may be using an HTTP backend that doesn't even do network (a shared memory protocol for load-balancing high-load HTTP traffic) and this library will still be useful. This library is sticking to HTTP concepts, not some hacky adaptation of tcp/udp sockets. I don't even provide a parser (I made sure it's completely hidden from the user). Additional methods like async_read_trailers and async_write_response are one of the main reasons why it's possible to abstract HTTP communication channels. You hide communication-specific details inside communication channel code. If you don't want memory exhaustion trying to read a possibly infinite message body that is **very** easy to craft, you should start thinking about not making so many things implicit/magic. From the http::view proposal you outline below, I assume you are considering parsing/destructuring the message at a different point in time, which is just moving the abstraction from one point to another and providing no benefit over the current design at all. Actually, your proposal seems to be highly coupled to the HTTP wire format, which is terrible for different backends and may become the cause of major inefficiencies. * It must be a user's responsibility to provide buffers for messages. Never
extend buffers in socket-related methods.
Implementation is improving: - https://github.com/BoostGSoC14/boost.http/blob/0fc8dd7a594bb5ebb676d2d55621a... - Place for improvement: https://github.com/BoostGSoC14/boost.http/blob/0fc8dd7a594bb5ebb676d2d55621a... message.insert(make_pair("host", "www.example.com")) kind of allocates/extends buffers, but the message type is completely controlled by the user and I was even careful to make sure that a single block of memory could be used for the whole message (headers, body and trailers). * Provide helper classes that parse and generate http
That's actually a good idea. Akin to Asio's async_write, I assume. It wouldn't change any already provided API, though. * Provide out-of-the-box completion conditions for async_read operation:
http::completions::full, http::completions::headers, http::completions::body...
Okay. Those bullets will allow to write following code (based on
http://www.boost.org/doc/libs/1_58_0/doc/html/boost_asio/tutorial/tutdaytime... with minimal modifications):
#include <ctime>
#include <iostream>
#include <string>
#include <boost/bind.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/enable_shared_from_this.hpp>
#include <boost/asio.hpp>

std::string make_daytime_string()
{
  using namespace std; // For time_t, time and ctime;
  time_t now = time(0);
  return ctime(&now);
}

class http_connection : public boost::enable_shared_from_this<http_connection>
{
public:
  typedef boost::shared_ptr<http_connection> pointer;

  static pointer create(boost::asio::io_service& io_service)
  {
    return pointer(new http_connection(io_service));
  }

  http::socket& socket()
  {
    return socket_;
  }

  void start()
  {
    message_.resize(4 * 1024);

    boost::asio::async_read(socket_, boost::asio::buffer(message_),
        http::completions::full(message_), // read until the whole request is in `message_`
        boost::bind(&http_connection::handle_read, shared_from_this(),
          boost::asio::placeholders::error,
          boost::asio::placeholders::bytes_transferred));
  }

private:
  http_connection(boost::asio::io_service& io_service)
    : socket_(io_service)
  {}

  void handle_write(const boost::system::error_code& /*error*/,
      size_t /*bytes_transferred*/)
  {}

  void handle_read(const boost::system::error_code& error,
      size_t bytes_transferred)
  {
    assert(!error); // for shortness

    // `message_` contains the whole request, including headers and body,
    // because http::completions::full was provided
    {
      boost::system::error_code e;
      http::view v(message_, e); // represent buffer as http message
      assert(!e); // no error occurred during parsing
      assert(v.read_state() == http::read_state::empty);
      std::cerr << v.version();       // "HTTP1.1"
      std::cerr << v["Content-type"]; // "plain/text"
      std::cerr << v.body();          // string_ref("Hello! This is a body content. What time is it?")
      message_.clear(); // `v` is now invalid to use!
    }

    {
      http::generator resp(message_);        // use `message_` for an output
      resp << make_daytime_string();         // append daytime to body
      resp["Content-Type"] = "plain/text";   // set some headers
      resp.version(http::version::HTTP_1_1); // explicitly set HTTP version
      resp.code(200);                        // "200 OK"
      resp.flush(); // explicitly flush to buffer. Done automatically in destructor
    }

    boost::asio::async_write(socket_, boost::asio::buffer(message_),
        boost::bind(&http_connection::handle_write, shared_from_this(),
          boost::asio::placeholders::error,
          boost::asio::placeholders::bytes_transferred));
  }

  http::socket socket_; // manages the connection only
  std::string message_;
};

class http_server
{
public:
  http_server(boost::asio::io_service& io_service)
    : acceptor_(io_service,
        http::endpoint(http::all_versions, tcp::v4(), 80))
  {
    start_accept();
  }

private:
  void start_accept()
  {
    http_connection::pointer new_connection =
      http_connection::create(acceptor_.get_io_service());

    acceptor_.async_accept(new_connection->socket(),
        boost::bind(&http_server::handle_accept, this, new_connection,
          boost::asio::placeholders::error));
  }

  void handle_accept(http_connection::pointer new_connection,
      const boost::system::error_code& error)
  {
    if (!error)
    {
      if (new_connection->socket().version() == http::version::HTTP_2_0)
      {
        // do some more stuff
      }

      new_connection->start();
    }

    start_accept();
  }

  http::acceptor acceptor_;
};

int main()
{
  try
  {
    boost::asio::io_service io_service;
    http_server server(io_service);
    io_service.run();
  }
  catch (std::exception& e)
  {
    std::cerr << e.what() << std::endl;
  }

  return 0;
}
This idea is horrible. An HTTP message is not a std::string. HTTP does have a string representation, the HTTP wire format from HTTP 1.1, which is different than HTTP 2.0, which is different than FastCGI, which would be different than a ZeroMQ-based approach and so on and so on.

I'm not proposing an HTTP parser, I'm proposing an HTTP server library. This library will grow and also include a client. The HTTP parser is the least useful thing this library could provide. I'm aiming at a library that is so useful that you'd use it no matter if you're going to write an application to be used on a Raspberry Pi (or even more constrained) or on a cluster that distributes workload among thousands and thousands of nodes and uses different intermediate protocols.

You even assume the HTTP version is present. It may not even make sense and it shouldn't be exposed to the user anyway. The user is interested in features (can I use chunking? can I use X?). It's more portable. You'd suggest a different approach if you had invested time to propose a library that could be used with different backends.

About your http::view proposal: It's not much different from the http::Message concept, but:

- Your http::view reads headers and read_state. This means you're mixing/coupling communication channels and HTTP messages. Bad for different backends. It's bad because you then need to answer questions such as "how do I pass a pool, custom allocator and this and that to my backend?".
- Also, how does your proposal handle progressive download and chunking? v.body() doesn't seem very/any asynchronous at all. The flush method in the response is useful for chunking, but the lack of information about the communication channel's chunking capability leaves no option but to buffer implicitly, which can exhaust memory in a live video stream. And this magic node-like API can very easily hamper interoperability among different implementations (I've already gone down this road[1]).
About the no-buffering approach, I just cannot give this guarantee. Any object fulfilling the ServerSocket concept will have its own guarantees and it is up to the user to check them. At the handling level, you can use custom types fulfilling the message concept that will avoid memory allocation. This possibility is not accidental. Just like you're worried, I'm worried too. This will also untie Boost.HTTP from ASIO. Now users could use
http::generator/http::view to read/generate HTTP data. This significantly extends the use cases of your library (for example on a previous work we had our own network transfer implementation, but we'd gladly use a http parser and generator).
The library concepts are already untied from Asio network code. But the library remains tied to Asio's concepts for asynchronous coding. The library is used to interact with external parties, so I use an asynchronous API to provide scalability. I need some framework for asynchronous foundations and I use Asio. I'm choosing to reuse a Boost component to submit a Boost library. But I'm not using Asio just because it's already in Boost. Asio is flexible enough to allow me to use any architecture I want (single-threaded, one thread per operation, multiple operations per thread and one thread per CPU core...) using any of multiple styles (completion handlers, futures, coroutines, blocking...) and much more. I won't provide this foundation myself in Boost.Http as Asio already does it for me. Such code is also less intrusive: only minimal changes were required to
change a tcp example into the http example. Users will appreciate that, especially in cases when ASIO is already used for HTTP.
I do appreciate the "less intrusive" guideline. But I dislike the attempt to make http::socket as close to tcp::socket (or any other socket) as possible. It'll just remove features, and you're ignoring that some users want alternative HTTP backends. Your proposal is **less** useful because it can be used in **fewer** applications. You need to elaborate more if you want to convince me that this approach will make **more** users happy. BTW, after that you could easily implement http::stream (that must be close
to tcp::stream by functionality). http::stream would be a killer feature that would be widely used by utility programs (as I remember current Boost's regression testing requires it, but uses tcp::stream with a bunch of workarounds).
An http::stream would be an adapter and wouldn't decrease the current API quality, so I'm less reluctant about this point. Anyway, the more I read your review, the more I think you're completely ignoring that network code might not even be involved in passing HTTP messages around. This would explain why you could prefer cpp-netlib, but it is just less useful. Please, implement the HTTP2.0 protocol. There are some open source
solutions that are licensed under the Boost license and are very close to full implementation of HTTP2.0. Just reuse the code and chat with the authors for better understanding of the problems.
I'll try to provide a prototype tonight, but the intent of the prototype is just to prove that this core API is okay. Also please do not use coroutines at least in first example. coroutines is
a new feature in ASIO, many users use old Boost versions that have no coroutines and some users are not familiar with them. So if you start with coroutines user will have to go to ASIO docs and dig into coroutines to understand the very first example. Example without coroutines will be more user friendly.
This is the kind of argument... by the time Boost.Http reaches Boost, Asio users will already be used to coroutines. I don't even depend on the latest Boost release if I'm not mistaken. 7. How much effort did you put into your evaluation? A glance? A quick
reading? In-depth study?
A few hours digging into the docs, about 3 hours looking into the sources, about 3 hours formulating the issues and recommending solutions.
Please read the design choices chapter. You seem to be completely ignoring all use cases where a standalone server (e.g. FastCGI, ZeroMQ...) isn't used. I urge other reviewers to take a look at the cpp-netlib. If you treat
Boost.Http as an out-of-box solution, then cpp-netlib is more mature, user-friendly and solid. If you treat Boost.Http as a basic building block for HTTP (just like ASIO is a basic building block for networking), then it's better to stick to the ASIO's design and do not *partially* manage memory!
I'm puzzled by the meaning of solid. If it's something old, then Boost.Http will never beat cpp-netlib. Anyway, thanks for such time-devoted review, Antony. =) [1] https://github.com/vinipsmaker/tufao/issues/41 -- Vinícius dos Santos Oliveira https://about.me/vinipsmaker
On Thu, Aug 13, 2015 at 11:38 AM, Vinícius dos Santos Oliveira < vini.ipsmaker@gmail.com> wrote:
2015-08-12 15:58 GMT-03:00 Antony Polukhin
: That's the fatal problem of the library. It seems like the library attempts to get best from ASIO and cpp-netlib and unfortunately fails. Now let's get into details:
* ASIO is a low level abstraction over protocol that does not take care about memory management of buffers (in most cases it's the responsibility of user to allocate and keep alive the required buffers). * cpp-netlib takes care of memory management, hides low-level stuff.
[snip]
Those bullets will allow to write following code (based on
with minimal modifications):
[snip example code, quoted in full above]
This idea is horrible. An HTTP message is not a std::string. HTTP does have a string representation, the HTTP wire format from HTTP 1.1, which is different than HTTP 2.0, which is different than FastCGI, which would be different than a ZeroMQ-based approach and so on and so on.
Why would this idea be limited to strings? The generator/parser could easily take another type. And std::strings can store binary network data, if that's what you're suggesting (it wasn't exactly clear to me). I'm not proposing an HTTP parser, I'm proposing an HTTP server library.
This library will grow and also include a client. The HTTP parser is the least useful thing this library could provide. I'm aiming a library that is so useful that you'd use it no matter if you're going to write an application to be used on Raspberry Pi (or even more constrained) or on a cluster of nodes that distribute workload among thousands and thousands of nodes and use different intermediate protocols.
Defining an HTTP parser (specifically a SAX-style parser) is likely the only way to achieve zero allocations, which is exactly what the parser in use by this library is doing. Providing a C++ templated SAX parser _would_ be novel - it would allow for inlined/direct callbacks as opposed to the indirect callbacks provided by the C parser in use. Probably a small reduction in CPU cycles though, since parsing time should dominate. Providing such a library would likely change the current design dramatically.
You even assume HTTP version is present. It may not even make sense and it shouldn't be exposed to the user anyway. The user is interested in features (can I use chunking? can I use X?). It's more portable. You'd suggest a different approach if you had invested time to propose a library that could be used with different backends.
About your http::view proposal: It's not much different from http::Message concept, but:
- Your http::view reads headers and read_state. This means you're mixing/coupling communication channels and HTTP messages. Bad for different backends. It's bad because you then need to answer questions such as "how do I pass a pool, custom allocator and this and that to my backend?". - Also, how does your proposal handle progressive download and chunking? v.body() doesn't seem very/any asynchronous at all. The flush method in the response is useful for chunking, but the lack of information about the communication channel's chunking capability leaves no option but to buffer implicitly, which can exhaust memory in a live video stream. And this magic node-like API can very easily hamper interoperability among different implementations (I've already gone down this road[1]).
About the no-buffering approach, I just cannot give this guarantee. Any object fulfilling the ServerSocket concept will have its own guarantees and it is up to the user to check them. At the handling level, you can use custom types fulfilling the message concept that will avoid memory allocation. This possibility is not accidental. Just like you're worried, I'm worried too.
How could a user provide a message type that satisfies the
SequenceContainer requirement for the body and still avoid memory
allocations? The documentation for reading the body doesn't state how the
SequenceContainer is being used or modified. Can I set a size for the
container that the http::Socket concept never exceeds while reading
(effectively making it a fixed buffer like ASIO)? I've read this several
times (and maybe I'm being thick), but if I were writing an implementation
of this concept, it would seem that it is implementation-defined how the
SequenceContainer is being manipulated. Currently the body is being
automatically resized on reads, with new data being copied in. I _think_
the only way to control how much is being copied in at a time (and thus the
maximum the SequenceContainer is being resized to) is through the buffer
argument to basic_socket, but this is undocumented.
And satisfying the message requirements for the headers/trailers requires
an AssociativeMap
This will also untie Boost.HTTP from ASIO. Now users could use
http::generator/http::view to read/generate HTTP data. This significantly extends the use cases of your library (for example on a previous work we had our own network transfer implementation, but we'd gladly use a http parser and generator).
The library concepts are already untied from Asio network code. But the library remains tied to Asio's concepts for asynchronous coding. The library is used to interact with external parties, so I use an asynchronous API to provide scalability. I need some framework for asynchronous foundations and I use Asio. I'm choosing to reuse a Boost component to submit a Boost library. But I'm not using Asio just because it's already into Boost. Asio is flexible enough to allow me to use any architecture I want (single-threaded, one thread per operation, multiple operations per thread and one thread per CPU core...) using any of multiple styles (completion handlers, futures, coroutines, blocking...) and much more. I won't provide this foundation myself into Boost.Http as Asio already does it for me.
Such code is also less intrusive: only minimal changes were required to
change a tcp example into the http example. Users will appreciate that, especially in cases when ASIO is already used for HTTP.
I do appreciate the "less intrusive" guideline. But I dislike the attempt to make http::socket as close to tcp::socket (or any other socket) as possible. It'll just remove features, and you're ignoring that some users want alternative HTTP backends. Your proposal is **less** useful because it can be used in **fewer** applications. You need to elaborate more if you want to convince me that this approach will make **more** users happy.
I asked this elsewhere: is there a plan to select different HTTP parsers? How would that be achieved in terms of the existing socket concepts? Would a new Parser concept be introduced? Or would the parser be tied to a specific implementation of the existing socket concepts? I.e., basic_socket is always the current parser, while basic_spirit_socket is a spirit-based approach?
BTW, after that you could easily implement http::stream (that must be close
to tcp::stream by functionality). http::stream would be a killer feature that would be widely used by utility programs (as I remember current Boost's regression testing requires it, but uses tcp::stream with a bunch of workarounds).
An http::stream would be an adapter and wouldn't decrease the current API quality, so I'm less reluctant about this point. Anyway, the more I read your review, the more I think you're completely ignoring that network code might not even be involved in passing HTTP messages around. This would explain why you could prefer cpp-netlib, but it is just less useful.
Please, implement the HTTP2.0 protocol. There are some open source
solutions that are licensed under the Boost license and are very close to full implementation of HTTP2.0. Just reuse the code and chat with the authors for better understanding of the problems.
I'll try to provide a prototype tonight, but the intent of the prototype is just to prove that this core API is okay.
Also please do not use coroutines at least in first example. coroutines is
a new feature in ASIO, many users use old Boost versions that have no coroutines and some users are not familiar with them. So if you start with coroutines user will have to go to ASIO docs and dig into coroutines to understand the very first example. Example without coroutines will be more user friendly.
This is the kind of argument... by the time Boost.Http reaches Boost, Asio users will already be used to coroutines. I don't even depend on the latest Boost release if I'm not mistaken.
What about people that want to test the library before any (potential) boost inclusion? Or what about people that want a lower overhead (but more limited) ASIO::coroutine approach - is that a possible combination?
7. How much effort did you put into your evaluation? A glance? A quick
reading? In-depth study?
A few hours digging into the docs, about 3 hours looking into the sources, about 3 hours formulating the issues and recommending solutions.
Please read the design choices chapter. You seem to be completely ignoring all use cases where a standalone server (e.g. FastCGI, ZeroMQ...) isn't used.
I read the design choices section, and I don't understand the relevance. Could you elaborate?
Lee
2015-08-13 22:19 GMT-03:00 Lee Clagett
This idea is horrible. An HTTP message is not a std::string. HTTP does have a string representation, the HTTP wire format from HTTP 1.1, which is different than HTTP 2.0, which is different than FastCGI, which would be different than a ZeroMQ-based approach and so on and so on.
Why would this idea be limited to strings? The generator/parser could easily take another type. And std::strings can store binary network data, if that's what you're suggesting (it wasn't exactly clear to me).
A string, a vector of char or any buffer is not an http message. Conceptually you're writing a stream of the HTTP/1.1 wire format, when one might not even exist (suppose you're using HTTP/1.0, HTTP/2.0, FastCGI or ZeroMQ). Any HTTP parser will suit your desire. This library is a layer above that (and incrementally reaching for more). I'm not proposing an HTTP parser, I'm proposing an HTTP server library.
This library will grow and also include a client. The HTTP parser is the least useful thing this library could provide. I'm aiming a library that is so useful that you'd use it no matter if you're going to write an application to be used on Raspberry Pi (or even more constrained) or on a cluster of nodes that distribute workload among thousands and thousands of nodes and use different intermediate protocols.
Defining an HTTP parser (specifically a SAX-style parser) is likely the only way to achieve zero allocations, which is exactly what the parser in use by this library is doing. Providing a C++ templated SAX parser _would_ be novel - it would allow for inlined/direct callbacks as opposed to the indirect callbacks provided by the C parser in use. Probably a small reduction in CPU cycles though, since parsing time should dominate. Providing such a library would likely change the current design dramatically.
No. That's a way to avoid memory copies. It's not necessary for achieving zero allocations. You can have a custom backend tied to a single type of message that will do the HTTP parsing for you. It could be inefficient by maybe forcing the parsing every time you call headers(), but without an implementation I don't have much to say.
You even assume HTTP version is present. It may not even make sense and it shouldn't be exposed to the user anyway. The user is interested in features (can I use chunking? can I use X?). It's more portable. You'd suggest a different approach if you had invested time to propose a library that could be used with different backends.

About your http::view proposal: It's not much different from http::Message concept, but:

- Your http::view reads headers and read_state. This means you're mixing/coupling communication channels and HTTP messages. Bad for different backends. It's bad because you then need to answer questions such as "how do I pass a pool, custom allocator and this and that to my backend?".
- Also, how does your proposal handle progressive download and chunking? v.body() doesn't seem very/any asynchronous at all. The flush method in the response is useful for chunking, but the lack of information about the communication channel's chunking capability leaves no option but to buffer implicitly, which can exhaust memory in a live video stream. And this magic node-like API can very easily hamper interoperability among different implementations (I've already gone down this road[1]).
About the no-buffering approach, I just cannot give this guarantee. Any object fulfilling the ServerSocket concept will have its own guarantees and it is up to the user to check them. At the handling level, you can use custom types fulfilling the message concept that will avoid memory allocation. This possibility is not accidental. Just like you're worried, I'm worried too.
How could a user provide a message type that satisfies the SequenceContainer requirement for the body and still avoid memory allocations? The documentation for reading the body doesn't state how the SequenceContainer is being used or modified. Can I set a size for the container that the http::Socket concept never exceeds while reading (effectively making it a fixed buffer like ASIO)? I've read this several times (and maybe I'm being thick), but if I were writing an implementation of this concept, it would seem that it is implementation-defined how the SequenceContainer is being manipulated. Currently the body is being automatically resized on reads, with new data being copied in. I _think_ the only way to control how much is being copied in at a time (and thus the maximum the SequenceContainer is being resized to) is through the buffer argument to basic_socket, but this is undocumented.
And satisfying the message requirements for the headers/trailers requires an AssociativeMap. Currently the key/value types are required to meet the string concept (which I do not recall being defined by C++ or Boost), but assuming it means an STL-compatible basic_string, the best way to avoid memory allocation is a small string optimization combined with a flat_map or an intrusive map. This would reduce, but not eliminate, allocations. I'm slightly less concerned about this; I think it's a good tradeoff, but I'm not sure how acceptable these loose constraints are for the embedded situations that are continually mentioned. Any embedded developers following this?
Like you guessed, you pass a buffer to basic_socket. It won't read more bytes than the buffer size. Fair enough. I didn't document this behaviour. I'll gather all these points after the review period to implement these improvements. About the embedded device situation, it'll be improved when I expose the parser options; then you'll be able to set max headers size, max header name size, max header value size and so on. With all upper limits figured out, you can provide a single chunk of memory for all data.
Also, please do not use coroutines, at least in the first example. Coroutines are a new feature in ASIO; many users use old Boost versions that have no coroutines, and some users are not familiar with them. So if you start with coroutines, the user will have to go to the ASIO docs and dig into coroutines to understand the very first example. An example without coroutines will be more user friendly.
This is the kind of argument... by the time Boost.Http reaches Boost, Asio users will already be used to coroutines. I don't even depend on the latest Boost release if I'm not mistaken.
What about people that want to test the library before any (potential) boost inclusion? Or what about people that want a lower overhead (but more limited) ASIO::coroutine approach - is that a possible combination?
It's the tutorial and it's the most readable thing you can ever get. But the tutorial is surely far from a reference on teachability. Unfortunately I'm not skilled enough to improve it. This review is helping to raise points and to see where there are more doubts. And even if I'm a terrible teacher when it comes to writing tutorials, I still can write lots of examples (and write a completion-handler based one).
7. How much effort did you put into your evaluation? A glance? A quick
reading? In-depth study?
A few hours digging into the docs, about 3 hours looking into the sources, about 3 hours formulating the issues and recommending solutions.
Please read the design choices chapter. You seem to be completely ignoring all use cases where a standalone server (e.g. FastCGI, ZeroMQ...) isn't used.
I read the design choices section, and I don't understand the relevance. Could you elaborate?
A simple use case: You're not directly exposing your application to the network. You're using proxies with auto load balancing. You're not using HTTP wire protocol to glue communication among the internal nodes. You're using ZeroMQ. There is no HTTP wire format usage in your application at all and Boost.Http still would be used, with the current API, as is. A different HTTP backend would be provided and you're done. It makes no sense to enforce the use of HTTP wire format in this use case at all. And if you're an advocate of HTTP wire format, keep in mind that the format changed in HTTP 2.0. There is no definitive set-in-stone serialized representation. -- Vinícius dos Santos Oliveira https://about.me/vinipsmaker
On Thu, Aug 13, 2015 at 11:16 PM, Vinícius dos Santos Oliveira < vini.ipsmaker@gmail.com> wrote:
2015-08-13 22:19 GMT-03:00 Lee Clagett: [snip]
This library will grow and also include a client. The HTTP parser is the least useful thing this library could provide. I'm aiming a library that is so useful that you'd use it no matter if you're going to write an application to be used on Raspberry Pi (or even more constrained) or on a cluster of nodes that distribute workload among thousands and thousands of nodes and use different intermediate protocols.
Defining an HTTP parser (specifically a SAX-style parser) is likely the only way to achieve zero allocations, which is exactly what the parser in use by this library is doing. Providing a C++ templated SAX parser _would_ be novel - it would allow for inlined/direct callbacks as opposed to the indirect callbacks provided by the C parser in use. Probably a small reduction in CPU cycles, though, since parsing time should dominate. Providing such a library would likely change the current design dramatically.
No. That's a way to avoid memory copies. It's not necessary for achieving zero allocations.
You can have a custom backend tied to a single type of message that will do the HTTP parsing for you. It could be inefficient by maybe forcing the parsing every time you call headers(), but without an implementation I don't have much to say.
But this would only be able to handle one HTTP message type? Or would it drop some useful information? I think it would be difficult to implement a non-allocating HTTP parser unless it was SAX, or stopped at defined points (essentially notifying you like a SAX parser).
shouldn't be exposed to the user anyway. The user is interested in features (can I use chunking? can I use X?). It's more portable. You'd suggest a different approach if you had invested time to propose a library that could be used with different backends.
About your http::view proposal: It's not much different from http::Message concept, but:
- Your http::view reads headers and read_state. This means you're mixing/coupling communication channels and HTTP messages. Bad for different backends. It's bad because you then need to answer questions such as "how do I pass a pool, custom allocator and this and that to my backend?".
- Also, how does your proposal handle progressive download and chunking? v.body() doesn't seem very/any asynchronous at all. The flush method in response is useful for chunking, but the lack of information around it leaves questions open. You even assume the HTTP version is present; it may not even make sense. [snip]
Like you guessed, you pass a buffer to basic_socket. It won't read more bytes than the buffer size.
But how can this be combined with higher order functions? For example an `async_read_response(Socket, MaxDataSize, void(uint16_t, std::string, vector<...>, error_code))`? However such a utility is defined, it will currently have to be tied to a specific implementation, because there's no way to control the max-read size via the socket concept. Or would such a function omit a max read size (several other libraries don't have one either)? Or would it just overread a _bit_ into the container?
About the embedded device situation, it'll be improved when I expose the parser options [snip]
But what if an implementation wanted to discard some fields to really keep the memory low? I think that was the point of the OP. I think this is difficult to achieve without a notifying parser. It might be overkill for Boost.Http; people under this duress can seek out existing HTTP parsers. [snip]
There is no definitive set-in-stone serialized representation. [snip]
Yes, if HTTP were converted into a different (more efficient) wire format (as I've seen done in various ways - sandstorm/capnproto now does this too), a new implementation of http::ServerSocket could re-read that format and be compatible. It would be useful to state this more clearly in the documentation, unless I missed it (sorry). Lee
On Fri, Aug 14, 2015 at 1:49 AM, Lee Clagett wrote: [snip]
*without a notifying parser
2015-08-14 2:49 GMT-03:00 Lee Clagett
But this would only be able to handle one HTTP message type? [snip]
As type of message, I was referring to the Message concept: https://boostgsoc14.github.io/boost.http/reference/message_concept.html And yes, this implementation would work with only one type. The idea is: the message object is just a buffer with an embedded parser, and the socket will just transfer responsibility to the message. The user API stays the same. A buffer of the same size would still be interesting in the socket to efficiently support HTTP pipelining (we cannot have data from different messages in the same message object, as it might be dropped at any time by the user). I'm not the slightest bit worried about the problem you mention with the parser. I know it's possible. It won't show itself as a problem in the future.
But how can this be combined with higher order functions? [snip]
The problem isn't "how can this be combined with higher order functions?". The problem is "how can this feature be exposed portably among different HTTP backends?" and the answer is "it can't because it might not even make sense in all HTTP backends". Of course this comment is about the hacky solution (use a buffer of limited size in the HTTP backend). On the non-hacky front, some traits exposing extra API could be defined. The basic_socket could implement these traits without hampering the implementation of other backends that have different characteristics.
But what if an implementation wanted to discard some fields to really keep the memory low? [snip]
Filling HTTP headers is the responsibility of the socket. The socket is the communication channel, after all. A blacklist of headers wouldn't always work, as the client can easily use different headers. A whitelist of allowed headers can work better. A solution that is more generic is a predicate. It can go into the parser options later. A trait could be defined to also expose the same API in different HTTP backends that might not need a parser.
Yes, if HTTP were converted into a different (more efficient) wire format, a new implementation of http::ServerSocket could re-read that format and be compatible. [snip]
You can already have any message representation you want: It's the message concept. And it was crafted **very** carefully: https://boostgsoc14.github.io/boost.http/reference/message_concept.html -- Vinícius dos Santos Oliveira https://about.me/vinipsmaker
The problem isn't "how can this be combined with higher order functions?". The problem is "how can this feature be exposed portably among different HTTP backends?" [snip]
Both questions are identical. When a function makes a call to `async_read_some` with a generic http::Socket concept, it has no way of knowing or controlling how many bytes will be read into the container provided by the message concept. It is currently implementation-defined.
On Fri, Aug 14, 2015 at 4:31 PM, Vinícius dos Santos Oliveira <vini.ipsmaker@gmail.com> wrote: On the non-hacky front, some traits exposing extra API could be defined. The basic_socket could implement these traits without hampering the implementation of other backends that have different characteristics.
Ideally the argument to `async_read_some` would just be an ASIO buffer, which implicitly has a maximum read size. This only appears possible if the current C parser is abused a bit (moving states manually). However, I think it's worth providing the best interface, and then doing whatever is necessary to make those details work. And I think accepting just an ASIO buffer would be the best possible interface for `async_read_some`. Adding a size_t maximum read argument should be possible at a minimum. I do not see how this could hamper any possible backends; its only role is to explicitly limit how many bytes are inserted at the back of the container in a single call. With this feature, a client could at least reserve bytes in the container, and prevent further allocations through a max_read argument.
A solution that is more generic is a predicate. It can go into the parser options later. [snip]
A predicate design would either have to buffer the entire field, which would make it an allocating design, or it would have to provide partial values, which would make it similar to a SAX parser but with the confusion of being called a predicate. The only point is that a system that needs ultimate control over memory management would likely need a parser (push or pull) that notifies the client of pre-defined boundaries. I think the design of Boost.Http doesn't provide an interface suitable for zero allocations, because either large amounts of memory are being pre-allocated, or certain _hard_ restrictions need to be placed on the header. Instead Boost.Http leans towards ease-of-use a bit. I think this is an acceptable tradeoff, because environments with extremely strict memory requirements can use other solutions. Boost.Http is unlikely to suit the needs of everyone. But better memory management in a few areas would be helpful (as already mentioned above). The predicate design that you mentioned, which would likely buffer the entire field, is interesting. Rejecting a field would allow for memory re-use by the implementation for the next field. It would be worth investigating how to provide that interface, and the performance benefits. Hopefully it would be a low-effort way to help people who only want to store _some_ HTTP fields. There is an unbelievable number of HTTP fields that generally get ignored anyway.
You can already have any message representation you want: It's the message concept. [snip]
I don't think we are talking about the same thing; the message concept doesn't define the wire format. The implementation of the http::Socket concept certainly does, which is what I thought the discussion was about here. Either way, my suggestion was that it might be worth noting _somewhere_ in the documentation that different wire formats for HTTP can be supported with a different http::Socket concept implementation. Although until a different implementation is actually written (FastCGI seems like a good candidate), it's difficult to say whether the currently defined abstractions are suitable for other (or even most/all) wire formats. So my apologies for the bad suggestion. Lee
2015-08-15 14:32 GMT-03:00 Lee Clagett
Adding a size_t maximum read argument should be possible at a minimum. [snip]
This is not a TCP socket; it's an HTTP socket. An HTTP message is not a stream of bytes. There is already a max read size: it's the buffer size you pass to basic_socket.
A predicate design would either have to buffer the entire field, which would make it an allocating design [snip]
You should also be able to choose a maximum header name size, so it's possible to use a stack-allocated buffer.
I think the design of Boost.Http doesn't provide an interface suitable for zero allocations [snip]
Yes, I think some users are showing a desire to have an HTTP parser, as they are completely ignoring alternative HTTP backends. For these users, only an HTTP parser may suit their needs.
I don't think we are talking about the same thing; the message concept doesn't define the wire format. [snip]
Exactly, it doesn't define a wire format; it defines a standardized way to access the message components. Internally, you can have any representation you want. You could have a Message type that is both a model of the Message concept and an HTTP parser, and the Socket could detect if this Message type is being used, so it'll just hand over a few values here and there. -- Vinícius dos Santos Oliveira https://about.me/vinipsmaker
This is not a TCP socket; it's an HTTP socket. An HTTP message is not a stream of bytes. [snip]
HTTP has two streaming modes: chunked, and "connection: close" with no chunked encoding or defined length. It's not always viewed that way, but certain applications [1] take advantage of these "features" available in HTTP. The problem for many libraries (as you've partially mentioned before) is that if data is being sent in that fashion, they will consume memory without bounds if designed to read until the end of the payload.
On Sat, Aug 15, 2015 at 3:06 PM, Vinícius dos Santos Oliveira <vini.ipsmaker@gmail.com> wrote: There is already a max read size. It's the buffer size you pass to basic_socket.
But this is not defined by the http::Socket concept. It's not possible to write a function that takes any http::Socket concept and limits the number of bytes being pushed into the container. A conforming http::Socket implementation is currently allowed to keep resizing the container as necessary to add data (even until payload end), and I thought the prevention of that scenario was being touted as a benefit of Boost.Http. Adding a size_t parameter or a fixed buffer to `async_read_some` is a strong signal of intent to implementors; a weaker one would be a statement in the documentation that a conforming implementation of the concept can only read/insert an unspecified fixed number of bytes before invoking the callback.
You should also be able to choose a maximum header name size, so it's possible to use a stack-allocated buffer. [snip]
The memory requirements are affected by the max header size. With a push/pull parser it is possible to rip out information in a fixed amount of memory. The HTTP parser this library is using is a good example - it never allocates memory, does not require the client to allocate any memory, and the #define for the max header size does _not_ change the size requirements of its data structures. It keeps the necessary state and information in a fixed amount of space, yet is still able to know whether transfer-encoding: chunked was sent, etc. The initial source of my parser thoughts was how to combine ideas from boost::spirit into an HTTP system. A client could do a POST/PUT/DELETE, and then issue `msg_socket.async_read(http::parse_response_code(), void(uint16_t, error_code))`, which would construct an HTTP parser that extracts the response code from the server, tracks a minimal set of headers (content-length, transfer-encoding, connection), yet still operates in a fixed memory budget even if max header / max payload were size_t::max. I still don't see how this is possible without a notification parser exposed somewhere in the design. Again, I'm not downvoting Boost.Http because it lacks this capability. I'm not sure of the demand for such a complicated library just to manipulate HTTP sockets. [1] http://ajaxpatterns.org/HTTP_Streaming Lee
2015-08-15 23:14 GMT-03:00 Lee Clagett
But this is not defined by the http::Socket concept. [snip]
You can pass a message which will just drop data (a stub push_back function) and it will work with any Socket. Limit the number of bytes being pushed is not a property that any Socket can guarantee. You cannot write that function. The Socket might just not be able to save the excessive read to later delivery and must fill it now. Like I said, a trait could help you with a standardized API for **capable** Sockets. The memory requirements are affected by the max header size. With a
push/pull parser it is possible to rip out information in a fixed amount of memory. The HTTP parser this library is using is a good example - it never allocates memory, does not require the client allocate any memory, and the #define for the max header size does _not_ change the size requirements of its data structures. It keeps necessary state and information in a fixed amount of space, yet is still able to know whether transfer-encoding: chunked was sent, etc.
And this is a very specific parser, it won't work with alternative HTTP backends. The initial source of my parser thoughts were how to combine ideas from
boost::spirit into a HTTP system. A client could do a POST/PUT/DELETE, and then issue `msg_socket.async_read(http::parse_response_code(), void(uint16_t, error_code))` which would construct a HTTP parser that extracts the response code from the server, tracks a minimal set of headers (content-length, transfer-encoding, connection), yet still operates in a fixed memory budget even if max header / max payload were size_t::max. I still don't see how this is possible without a notification parser exposed somewhere in the design. Again, I'm not downvoting Boost.Http because it lacks this capability. I'm not sure of the demand for such a complicated library just to manipulate Http sockets.
I can provide the parser separately in the future, and this parser will be useful only to the HTTP wire format, not useful if you want to target alternative HTTP backends. It's a trade-off the user will have to make. -- Vinícius dos Santos Oliveira https://about.me/vinipsmaker
On Sat, Aug 15, 2015 at 10:26 PM, Vinícius dos Santos Oliveira <vini.ipsmaker@gmail.com> wrote:

> 2015-08-15 23:14 GMT-03:00 Lee Clagett:
>
>> But this is not defined by the http::Socket concept. It's not possible
>> to write a function that takes any http::Socket concept and limits the
>> number of bytes being pushed into the container. [...]
>
> You can pass a message which will just drop data (a stub push_back
> function) and it will work with any Socket.
>
> Limiting the number of bytes being pushed is not a property that every
> Socket can guarantee. You cannot write that function. The Socket might
> just not be able to save the excess read for later delivery and must fill
> it now. Like I said, a trait could help you with a standardized API for
> **capable** Sockets.

A message stub that drops data probably doesn't meet the requirements of a SequenceContainer, but I would have to check.

Why would there be an excessive read? The implementation knows before the read call that only x bytes can be read, so why would it read past that amount? If this interface works for ASIO, it would seem suitable for reading a sequence of bytes from an HTTP payload.

Lee
2015-08-16 0:25 GMT-03:00 Lee Clagett:

> Why would there be an excessive read? The implementation knows before the
> read call that only x bytes can be read, so why would it read past that
> amount? If this interface works for ASIO, it would seem suitable for
> reading a sequence of bytes from an HTTP payload.

Because it might not even use a buffer. You continue to think only about the HTTP wire format and the standalone/embedded server use case. It could be a hypothetical protocol that uses shared memory that gets dropped and must be consumed.

--
Vinícius dos Santos Oliveira
https://about.me/vinipsmaker
On Sun, Aug 16, 2015 at 10:07 AM, Vinícius dos Santos Oliveira <vini.ipsmaker@gmail.com> wrote:

> 2015-08-16 0:25 GMT-03:00 Lee Clagett:
>
>> Why would there be an excessive read? [...]
>
> Because it might not even use a buffer. You continue to think only about
> the HTTP wire format and the standalone/embedded server use case. It
> could be a hypothetical protocol that uses shared memory that gets
> dropped and must be consumed.

A UDP socket or unix domain datagram socket would be more obvious examples, but I would be interested in seeing how this would occur in shared memory (I'm just not aware of such an example). Do you mean something like a (possibly in-order) queue of messages? That actually sounds like a ZeroMQ scenario, most likely using multi-part messages, because it's a pseudo in-order datagram system.

Anyway, things to note:

- The overwhelmingly common case will be reading from a stream, most likely TCP, and I think the focus should be on providing the best interface for this scenario (apparently you feel differently).
- The HTTP payload is either a stream (connection:close), a stream of datagrams (chunked), or a datagram (content-length).
- Datagrams can also be treated like a stream if the datagram is kept across calls (as you mentioned).
- The function is named `async_read_some` after the ASIO buffered_read_stream/buffered_stream concept, so it suggests data is being read from an underlying stream.
- The behavior of `async_read_some` in the Socket concept can be a typical read from a stream interface, OR read an entire message of unspecified size, depending on the implementation.
- When reading datagrams/messages, a fixed buffer allocated by the caller is a common read interface, in which case the extra data is dropped.
- Even users that expect these types of messages cannot reliably limit the Container expansion through the proposed implementation, as they can with UDP, domain sockets, ZeroMQ, etc.

Lee
2015-08-16 14:16 GMT-03:00 Lee Clagett:

> - The overwhelmingly common case will be reading from a stream, most
> likely TCP, and I think the focus should be on providing the best
> interface for this scenario (apparently you feel differently).
> [...]
> - When reading datagrams/messages, a fixed buffer allocated by the caller
> is a common read interface, in which case the extra data is dropped.

How does the trait implementation sound to you? Sockets that **can** provide extra guarantees would implement the max_read trait or alike?

--
Vinícius dos Santos Oliveira
https://about.me/vinipsmaker
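The trait idea floated here could be sketched roughly as below. The names (`set_max_read`, `has_max_read`) are hypothetical, nothing here is existing Boost.Http API: sockets that *can* bound how many bytes a single read delivers opt in by exposing a member, and generic code detects the capability at compile time.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// Detection idiom: has_max_read<S> is true iff S exposes a
// set_max_read(size_t) member (the hypothetical opt-in for sockets
// that can cap the bytes delivered per read).
template <class Socket, class = void>
struct has_max_read : std::false_type {};

template <class Socket>
struct has_max_read<Socket, std::void_t<
    decltype(std::declval<Socket&>().set_max_read(std::size_t{}))>>
    : std::true_type {};

struct bounded_socket {                  // capable: reads can be capped
    void set_max_read(std::size_t) {}
};
struct plain_socket {};                  // not capable: no such guarantee

static_assert(has_max_read<bounded_socket>::value,
              "capable socket detected");
static_assert(!has_max_read<plain_socket>::value,
              "incapable socket rejected");
```

Generic code can then `if constexpr` on the trait: cap the read on capable Sockets, fall back to the drop-excess message stub on the rest.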
2015-08-13 18:38 GMT+03:00 Vinícius dos Santos Oliveira <vini.ipsmaker@gmail.com>:

> This idea is horrible. An HTTP message is not a std::string. HTTP does
> have a string representation, the HTTP wire format from HTTP 1.1, which
> is different than HTTP 2.0, which is different than FastCGI, which would
> be different than a ZeroMQ-based approach and so on and so on.

You've totally misunderstood the example. Let's take a look at it:

1) For simplicity let's assume that we are working with HTTP/1.1 and HTTP/1.0 only. In that case:

namespace http {
    typedef boost::asio::ip::tcp::socket socket;
    typedef boost::asio::ip::tcp::acceptor acceptor;
}

2) Here's the part that takes care of communications only:

#include <ctime>
#include <iostream>
#include <string>
#include
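A hedged sketch of what the proposed `http::completions::full` could look like. The name and semantics come from the proposal above, not from any shipped library; this simplified version understands Content-Length only (no chunked bodies) and is written without Asio so it can run standalone. A real Asio CompletionCondition would return a `size_t` (0 meaning "stop reading") rather than a bool.

```cpp
#include <cstddef>
#include <string>

// Stand-in for the proposed http::completions::full functor: answers
// "is a complete HTTP/1.x message in the buffer yet?". Chunked
// transfer-encoding is deliberately not handled in this sketch.
struct full {
    const std::string& buf;  // the buffer async_read is filling

    bool operator()() const {
        auto head_end = buf.find("\r\n\r\n");
        if (head_end == std::string::npos)
            return false;                   // headers still incomplete
        std::size_t expected = 0;           // body size from Content-Length
        auto pos = buf.find("Content-Length:");
        if (pos != std::string::npos && pos < head_end)
            expected = std::stoul(buf.substr(pos + 15));
        // done once every announced body byte has arrived
        return buf.size() - (head_end + 4) >= expected;
    }
};
```

Under this model `async_read(socket, buffer, full{message_})` would keep issuing reads until the functor reports the message is complete, which is exactly the single-read fast path argued for below when the whole message arrives at once.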
2015-08-14 17:42 GMT-03:00 Antony Polukhin:

> You've totally misunderstood the example.

Yes, I'm sorry. The debate with Lee helped me to realize my mistake: http://article.gmane.org/gmane.comp.lib.boost.devel/262306

> Let's take a look at it:
>
> 1) For simplicity let's assume that we are working with HTTP/1.1 and
> HTTP/1.0 only.

As long as the design doesn't hamper support for alternative HTTP backends.

> In that case:
>
> namespace http {
>     typedef boost::asio::ip::tcp::socket socket;
>     typedef boost::asio::ip::tcp::acceptor acceptor;
> }
>
> 2) Here's the part that takes care of communications only:
>
> [...]
>
> Assuming that http::socket is a tcp::socket, that example is EXACTLY the
> same code that is in the ASIO example at
> http://www.boost.org/doc/libs/1_58_0/doc/html/boost_asio/tutorial/tutdaytime...
You still have to defend why discarding all HTTP characteristics is an advantage for an HTTP library.

> message_ is just a BUFFER for data. You can use std::vector<char>,
> std::vector<unsigned char> or std::array<> here. It's not a string, it's
> the buffer!
>
> http::completions::full(message_) is a functor that returns 'Stop reading
> data' when our buffer (message_) contains the whole HTTP request.

In my design it is "stop reading data when our buffer is full or contains the whole HTTP request". The "growable" thing is the HTTP message, which can be customized and is decoupled from the HTTP server.

> So, what does it mean? It means that when the http_connection::handle_read
> function is called, message_ contains the whole data received from the
> socket.

How do you handle progressive download?

> 3) http::view NEVER downloads data. http::view is a parser that
> represents input data as an HTTP message and provides a user-friendly
> collection of methods to READ data.
>
> It means that it could be used just like this:
> const char data[] =
>     "GET /wiki/HTTP HTTP/1.0\r\n"
>     "Host: ru.wikipedia.org\r\n\r\n";
>
> http::view v(data);
> assert(v.version() == "HTTP/1.0");
>
> You've been misled by the http::view::read_state() function. It does not
> investigate sockets. It determines the state from the message content:
>
> const char data1[] =
>     "GET /wiki/HTTP HTTP/1.0\r\n"
>     "Content-Length: 0\r\n"
>     "Host: ru.wikipedia.org\r\n\r\n";
>
> http::view v1(data1);
> assert(v1.read_state() == http::read_state::empty);
>
> const char data2[] =
>     "GET /wiki/HTTP HTTP/1.0\r\n"
>     "Content-Length: 12312310\r\n"
>     "Host: ru.wikipedia.org\r\n\r\n"
>     "Hello word! This is a part of the message, because it is not totally recei";
>
> http::view v2(data2);
> assert(v2.read_state() != http::read_state::empty);
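The `http::view` idea above can be sketched as a non-owning parser. This is my reconstruction of the proposed semantics, not code from either library: only the request line and Content-Length are understood, and the header section is assumed to be complete.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

namespace sketch {

enum class read_state { empty, partial };

// Non-owning view over an already-received buffer, in the spirit of the
// proposed http::view: it never touches a socket, it only inspects the
// bytes it was given.
struct view {
    std::string_view data;

    // HTTP version token from the request line, e.g. "HTTP/1.0".
    std::string_view version() const {
        auto line = data.substr(0, data.find("\r\n"));
        return line.substr(line.rfind(' ') + 1);
    }

    // empty: every byte announced by Content-Length has arrived;
    // partial: body bytes are still outstanding.
    read_state state() const {
        std::size_t expected = 0;
        auto pos = data.find("Content-Length:");
        if (pos != std::string_view::npos)
            expected = std::stoul(std::string(data.substr(pos + 15)));
        auto head_end = data.find("\r\n\r\n");  // assumed present
        std::size_t have = data.size() - (head_end + 4);
        return have >= expected ? read_state::empty
                                : read_state::partial;
    }
};

}  // namespace sketch
```

This mirrors the asserts quoted above: a message announcing a 12312310-byte body with only a fragment received reports a non-empty read_state, while a Content-Length: 0 message reports empty.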
Your design is very much like cpp-netlib, where the whole request is read at once. Again: how do you handle progressive download? But, forgetting about specifics and focusing on the opposition of socket+message vs socket+buffer+view/parser, I might comment more later today.

> 4) Why this approach is better?
>
> * It explicitly allows the user to manage networking and memory.

That's difficult to achieve with support for multiple HTTP backends. Each HTTP backend might have its own guarantees. If I only need to provide guarantees related to a single backend, I can do this with the current design ideas (it needs updates in the implementation).

> * It separates work with the network and work with the HTTP message.

Can you elaborate further (and also elaborate why it is better and how the current Boost.Http approach is worse)?

> * It reuses the ASIO interface.

By dropping HTTP information/power. I like the http::stream adapter idea better: adapt the HTTP socket to mimic the Asio design. You don't lose features.

> * It does not implicitly allocate memory.

The parser can end up allocating memory. You have to consider support for different HTTP backends. You can give guarantees about one backend (or any one you provide), but not for all of them.

> * It can be used separately from ASIO. http::view and http::generator do
> not use ASIO at all.

The view and generator will be different for each HTTP backend. The only view that I believe will be useful without Asio will be the HTTP wire format parser. I don't expose one yet. I didn't propose a design for an HTTP parser.

> * Your assumption that there is always a requirement to read headers and
> body separately is very wrong. HTTP headers are not so big and usually
> the whole HTTP message can be accepted by a single read. So when you
> force the user to read headers and body separately you're forcing the
> user to have more system calls/locks/context switches. However, a read
> with http::completions::full(message_) in most cases will result in a
> single read inside ASIO and a single call to
> http::completions::full::operator(). This will result in better
> performance.

After issuing a read request, more than one component can be read (hence the need for read_state). It's the same callback that will handle the "excess" of components read. Also, reading headers before the body is important thanks to HTTP 100-continue. To design an HTTP library, you can't rely on your C++ knowledge alone; you have to research a lot about HTTP itself.

> * cpp-netlib is very simple to use. If you always require io_service and
> coroutines then your library is hard to use.

I started with a strong core proposal. High-level abstractions will follow. You have much more flexibility with Boost.Http than with cpp-netlib (and many more libraries out there). For a high-level API, I'd rather prefer something like this: https://github.com/d5/node.native

It supports more HTTP features than cpp-netlib and it is much less coupled/limiting. You could have compared how much more verbose Boost.Http is than node.native, which at least has comparable features.

> * headers/body separate reads is not what is really required

So you'll exhaust the application memory during live video streams. Not gonna change that. But you gave good ideas and a good motivation to make the interface simpler. Some algorithms to execute the whole read for you could be provided.

> * no advantages over cpp-netlib

A few:

- With the same API, you gain support for alternative HTTP backends.
- A modern HTTP API with support for HTTP 100-continue, HTTP chunking, HTTP pipelining and HTTP upgrade.
- It uses the Asio extensible asynchronous model that you can turn into blocking/synchronous easily (cpp-netlib has **two** server implementations).
- It uses an active style, so you have more control. You can, for instance, defer acceptance of new connections during high-load scenarios.
- It doesn't force a thread pool on you. All network traffic (from Boost.Http and any other traffic too) can be handled on the same io_service.
- It has strong support for asynchronous operations. You can progressively download the body, so memory won't be exhausted while an HTTP client submits a live video stream to your server. And you can also respond to messages with live video streaming, checking with a standardized API whether the underlying channel supports streaming.

> How to fix things?
> * Provide more functionality than cpp-netlib:
>
> * Allow users to manipulate memory and networking.

Okay. This could be improved.

> * Untie the library from networking and allow parts of it to be used on
> raw data (http::view/http::generate).

The current library doesn't try to compete with HTTP parsers. An HTTP parser is not exposed.

> * ASIO interface re-usage and simple migration for users that already
> use ASIO for HTTP. (tcp::socket -> http::socket)

No, tcp::socket is stream-oriented. http::socket is HTTP-oriented. HTTP is not stream-oriented (like TCP) nor datagram-oriented (like UDP). HTTP is a specialization (request-reply) of the message approach.

> * HTTP 2.0?

Okay.

> * Simple interface for beginners. If your first example consumes two
> screens of text, while cpp-netlib's example consumes 0.5 screen, you'll
> lose.

Okay.

--
Vinícius dos Santos Oliveira
https://about.me/vinipsmaker
2015-08-15 6:54 GMT-03:00 Vinícius dos Santos Oliveira <vini.ipsmaker@gmail.com>:

> It uses the Asio extensible asynchronous model that you can turn into
> blocking/synchronous easily (cpp-netlib has **two** server
> implementations).

Actually, the Asio extensible model is **very** interesting. You could even handle the initial interaction of the message exchange and later hand the socket over to another abstraction which will continue handling the socket, feeding the body of thousands of clients using the callback model. Not only do you have more options with the same API, you can also change the style at will.

--
Vinícius dos Santos Oliveira
https://about.me/vinipsmaker
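The "same API, different styles" point can be illustrated with a toy operation. This is purely illustrative, not Boost.Http or Asio code (Asio's real mechanism is the completion-token/async_result machinery): one operation completes either through a callback or through a std::future, selected by the token the caller passes.

```cpp
#include <future>
#include <utility>

// Toy completion-token dispatch: one operation, two completion styles.
struct use_future_t {};
constexpr use_future_t use_future{};

// Callback flavour: the handler receives the result directly.
template <class Handler>
void async_answer(Handler&& h) { std::forward<Handler>(h)(42); }

// Future flavour: same operation, selected by the token type (the
// non-template overload is preferred for use_future_t arguments).
inline std::future<int> async_answer(use_future_t) {
    std::promise<int> p;
    auto f = p.get_future();
    async_answer([&p](int v) { p.set_value(v); });
    return f;
}
```

A caller can start with `async_answer(callback)` during the message exchange and later switch to `async_answer(use_future).get()` for a blocking style, without the operation itself changing.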
participants (3):
- Antony Polukhin
- Lee Clagett
- Vinícius dos Santos Oliveira