2015-08-14 2:49 GMT-03:00 Lee Clagett:

No. That's a way to avoid memory copies. It's not necessary for achieving zero allocations.
You can have a custom backend tied to a single type of message that will do the HTTP parsing for you. It could be inefficient, perhaps forcing a re-parse every time you call headers(), but without an implementation I don't have much to say.
But this would only be able to handle one HTTP message type? Or would it drop some useful information? I think it would be difficult to implement a non-allocating HTTP parser unless it was SAX-style, or stopped at defined points (essentially notifying you like a SAX parser).
As type of message, I was referring to the Message concept: https://boostgsoc14.github.io/boost.http/reference/message_concept.html
And yes, this implementation would work with only one type.
The idea is: the message object is just a buffer with an embedded parser, and the socket will just transfer responsibility to the message. The user API stays the same. A buffer of the same size would still be interesting in the socket to efficiently support HTTP pipelining (we cannot have data from different messages in the same message object, as it might be dropped at any time by the user).
I'm not the slightest bit worried about the problem you mention with the parser. I know it's possible. It won't show itself as a problem in the future.
Like you guessed, you pass a buffer to basic_socket. It won't read more bytes than the buffer size.
But how can this be combined with higher-order functions? For example an `async_read_response(Socket, MaxDataSize, void(uint16_t, std::string, vector<...>, error_code))`? However such a utility is defined, it will currently have to be tied to a specific implementation, because there's no way to control the max-read size via the socket concept. Or would such a function omit a max read size (several other libraries don't have one either)? Or would it just overread a _bit_ into the container?

The problem isn't "how can this be combined with higher-order functions?". The problem is "how can this feature be exposed portably among different HTTP backends?", and the answer is "it can't, because it might not even make sense in all HTTP backends". Of course, this comment is about the hacky solution (use a buffer of limited size in the HTTP backend).
Both questions are identical. When a function makes a call to `async_read_some` with a generic http::Socket concept, it has no way of knowing or controlling how many bytes will be read into the container provided by the message concept. It is currently implementation defined.

On Fri, Aug 14, 2015 at 4:31 PM, Vinícius dos Santos Oliveira <vini.ipsmaker@gmail.com> wrote:

On the non-hacky front, some traits exposing extra API could be defined. The basic_socket could implement these traits without hampering the implementation of other backends that have different characteristics.
Ideally the argument to `async_read_some` would just be an ASIO buffer, which implicitly has a maximum read size. This only appears possible if the current C parser is abused a bit (moving states manually). However, I think it's worth providing the best interface, and then doing whatever is necessary to make those details work. And I think accepting just an ASIO buffer would be the best possible interface for `async_read_some`.

At a minimum, adding a size_t maximum-read argument should be possible. I do not see how this could hamper any possible backends; its only role is to explicitly limit how many bytes are inserted at the back of the container in a single call. With this feature, a client could at least reserve bytes in the container, and prevent further allocations through a max_read argument.
About the embedded device situation, it'll be improved when I expose the parser options; then you'll be able to set max headers size, max header name size, max header value size and so on. With all upper limits figured out, you can provide a single chunk of memory for all data.
But what if an implementation wanted to discard some fields to really keep the memory low? I think that was the point of the OP. I think this is difficult to achieve with a notifying parser. It might be overkill for Boost.Http; people under this duress can seek out existing HTTP parsers.
Filling HTTP headers is the responsibility of the socket. The socket is the communication channel, after all. A blacklist of headers wouldn't always work, as the client can easily use different headers. A whitelist of allowed headers can work better. A solution that is more generic is a predicate. It can go into the parser options later.
A predicate design would either have to buffer the entire field, which would make it an allocating design, or it would have to provide partial values, which would make it similar to a SAX parser but with the confusion of being called a predicate. The only point is that a system that needs ultimate control over memory management would likely need a parser (push or pull) that notifies the client of pre-defined boundaries.

I think the design of Boost.Http doesn't provide an interface suitable for zero allocations, because either a large amount of memory is pre-allocated, or certain _hard_ restrictions need to be placed on the header. Instead Boost.Http leans towards ease-of-use a bit. I think this is an acceptable tradeoff, because environments with extremely strict memory requirements can use other solutions. Boost.Http is unlikely to suit the needs of everyone. But better memory management in a few areas would be helpful (as already mentioned above).

The predicate design that you mentioned, which would likely buffer the entire field, is interesting. Rejecting a field would allow for memory re-use by the implementation for the next field. It would be worth investigating how to provide that interface, and the performance benefits. Hopefully it would be a low-effort way to help people who only want to store _some_ HTTP fields. There are an unbelievable number of HTTP fields that generally get ignored anyway.

A trait could be defined to also expose the same API in different HTTP backends that might not need a parser.
A simple use case: You're not directly exposing your application to the network. You're using proxies with auto load balancing. You're not using the HTTP wire protocol to glue communication among the internal nodes. You're using ZeroMQ. There is no HTTP wire format usage in your application at all, and Boost.Http still would be used, with the current API, as is. A different HTTP backend would be provided and you're done.
It makes no sense to enforce the use of HTTP wire format in this use case at all. And if you're an advocate of HTTP wire format, keep in mind that the format changed in HTTP 2.0. There is no definitive set-in-stone serialized representation.
Yes, if HTTP were converted into a different (more efficient) wire format (as I've seen done in various ways - sandstorm/capnproto now does this too), a new implementation of http::ServerSocket could read that format and remain compatible. It would be useful to state this more clearly in the documentation, unless I missed it (sorry).
You can already have any message representation you want: It's the message concept. And it was crafted **very** carefully: https://boostgsoc14.github.io/boost.http/reference/message_concept.html
I don't think we are talking about the same thing; the message concept doesn't define the wire format. The implementation of the http::Socket concept certainly does, which is what I thought the discussion was about here. Either way, my suggestion was that it might be worth noting _somewhere_ in the documentation that different wire formats for HTTP can be supported with a different http::Socket concept implementation. Although until a different implementation is actually written (fastcgi seems like a good candidate), it's difficult to say whether the currently defined abstractions are suitable for other (or even most/all) wire formats. So my apologies for the bad suggestion.

Lee