2015-08-14 2:49 GMT-03:00 Lee Clagett:

No. That's a way to avoid memory copies. It's not necessary for achieving zero allocations.
You can have a custom backend tied to a single type of message that will do the HTTP parsing for you. It could be inefficient, perhaps forcing a re-parse every time you call headers(), but without an implementation I don't have much to say.
But this would only be able to handle one HTTP message type? Or would it drop some useful information? I think it would be difficult to implement a non-allocating HTTP parser unless it was SAX-style, or stopped at defined points (essentially notifying you like a SAX parser).
As type of message, I was referring to the Message concept: https://boostgsoc14.github.io/boost.http/reference/message_concept.html
And yes, this implementation would work with only one type.
The idea is: the message object is just a buffer with an embedded parser, and the socket will just transfer responsibility to the message. The user API stays the same. A buffer of the same size would still be interesting in the socket to efficiently support HTTP pipelining (we cannot have data from different messages in the same message object, as it might be dropped at any time by the user).
I'm not the slightest bit worried about the problem you mention with the parser. I know it's possible. It won't show itself as a problem in the future.
Like you guessed, you pass a buffer to basic_socket. It won't read more bytes than the buffer size.
But how can this be combined with higher-order functions? For example an `async_read_response(Socket, MaxDataSize, void(uint16_t, std::string, vector<...>, error_code))`? However such a utility is defined, it will currently have to be tied to a specific implementation, because there's no way to control the max-read size via the socket concept. Or would such a function omit a max read size (several other libraries don't have one either)? Or would it just overread a _bit_ into the container?

The problem isn't "how can this be combined with higher-order functions?". The problem is "how can this feature be exposed portably among different HTTP backends?", and the answer is "it can't, because it might not even make sense in all HTTP backends". Of course, this comment is about the hacky solution (use a buffer of limited size in the HTTP backend).
Both questions are identical. When a function makes a call to `async_read_some` with a generic http::Socket concept, it has no way of knowing or controlling how many bytes will be read into the container provided by the message concept. It is currently implementation defined.

On Fri, Aug 14, 2015 at 4:31 PM, Vinícius dos Santos Oliveira <vini.ipsmaker@gmail.com> wrote:

On the non-hacky front, some traits exposing extra API could be defined. The basic_socket could implement these traits without hampering the implementation of other backends that have different characteristics.
Ideally the argument to `async_read_some` would just be an ASIO buffer, which implicitly has a maximum read size. This only appears possible if the current C parser is abused a bit (moving states manually). However, I think it's worth providing the best interface, and then doing whatever is necessary to make those details work. And I think accepting just an ASIO buffer would be the best possible interface for `async_read_some`.

At a minimum, adding a size_t maximum-read argument should be possible. I do not see how this could hamper any possible backends; its only role is to explicitly limit how many bytes are inserted at the back of the container in a single call. With this feature, a client could at least reserve bytes in the container, and prevent further allocations through a max_read argument.
About the embedded device situation, it'll be improved when I expose the parser options; then you'll be able to set max headers size, max header name size, max header value size and so on. With all upper limits figured out, you can provide a single chunk of memory for all data.
But what if an implementation wanted to discard some fields to really keep the memory low? I think that was the point of the OP. I think this is difficult to achieve with a notifying parser. It might be overkill for Boost.Http; people under this duress can seek out existing HTTP parsers.
Filling HTTP headers is the responsibility of the socket. The socket is the communication channel, after all. A blacklist of headers wouldn't always work, as the client can easily use different headers. A whitelist of allowed headers can work better. A solution that is more generic is a predicate. It can go into the parser options later.
A predicate design would either have to buffer the entire field, which would make it an allocating design, or it would have to provide partial values, which would make it similar to a SAX parser but with the confusion of being called a predicate. The only point is that a system that needs ultimate control over memory management would likely need a parser (push or pull) that notifies the client of pre-defined boundaries.

I think the design of Boost.Http doesn't provide an interface suitable for zero allocations, because either a large amount of memory is pre-allocated, or certain _hard_ restrictions need to be placed on the header. Instead Boost.Http leans towards ease-of-use a bit. I think this is an acceptable tradeoff, because environments with extremely strict memory requirements can use other solutions. Boost.Http is unlikely to suit the needs of everyone. But better memory management in a few areas would be helpful (as already mentioned above).

The predicate design that you mentioned, which would likely buffer the entire field, is interesting. Rejecting a field would allow for memory re-use by the implementation for the next field. It would be worth investigating how to provide that interface, and the performance benefits. Hopefully it would be a low-effort way to help people who only want to store _some_ HTTP fields. There are an unbelievable number of HTTP fields that generally get ignored anyway.

A trait could be defined to also expose the same API in different HTTP backends that might not need a parser.
A simple use case: You're not directly exposing your application to the network. You're using proxies with auto load balancing. You're not using the HTTP wire protocol to glue communication among the internal nodes. You're using ZeroMQ. There is no HTTP wire format usage in your application at all, and Boost.Http still would be used, with the current API, as is. A different HTTP backend would be provided and you're done.
It makes no sense to enforce the use of HTTP wire format in this use case at all. And if you're an advocate of HTTP wire format, keep in mind that the format changed in HTTP 2.0. There is no definitive set-in-stone serialized representation.
Yes, if HTTP were converted into a different (more efficient) wire format (as I've seen done in various ways - sandstorm/capnproto now does this too), a new implementation of http::ServerSocket could read that format and remain compatible. It would be useful to state this more clearly in the documentation, unless I missed it (sorry).
You can already have any message representation you want: It's the message concept. And it was crafted **very** carefully: https://boostgsoc14.github.io/boost.http/reference/message_concept.html
I don't think we are talking about the same thing; the message concept doesn't define the wire format. The implementation of the http::Socket concept certainly does, which is what I thought the discussion was about here. Either way, my suggestion was that it might be worth noting _somewhere_ in the documentation that different wire formats for HTTP can be supported with a different http::Socket concept implementation. Although until a different implementation is actually written (fastcgi seems like a good candidate), it's difficult to say whether the currently defined abstractions are suitable for other (or even most/all) wire formats. So my apologies for the bad suggestion.

Lee