On Tue, Aug 25, 2015 at 1:43 AM, Vinícius dos Santos Oliveira <vini.ipsmaker@gmail.com> wrote:
Thanks for your review Lee, your comments are surely helpful for me, as I'm going to do several improvements based on them.
2015-08-17 2:01 GMT-03:00 Lee Clagett:

So why is `async_write_response_continue` mandated in the http::ServerSocket concept when a free function can be implemented that uses the other concept functions? If the reasoning is performance (which appears to be the reason), why isn't a more performant method (probably some form of pre-generation) of sending messages added? This would prevent the socket concept from continually absorbing additional special-case messages as higher-order functions are developed, and allow clients to pre-generate error responses.
`async_write_response_continue` is a different concept; it cannot be composed in terms of the other operations. It has different semantics. It changes the read_state status.
Did you mean write_state (if not, update the documentation)? And yes, I missed this state transition when making this suggestion. The purpose of the state appears to be (please correct me otherwise) to prevent consecutive 100-continue responses from being written. I don't think it's worth the extra state transition, because the read calls do not put the write_state into a special state that forces a 100-continue OR error response. So a user of the library already has to know to check for expect-100s in the request, at which point it seems reasonable to expect the user NOT to write consecutive 100-continues. My suggestion would be to either drop the state entirely (allowing the user to "accidentally" write consecutive 100-continues) OR allow a write_state change from a server header read into an expect-100-continue-response state. The latter change may seem unpalatable, but this implementation is already "locking" the write side of the channel during server reads to hide certain error responses from users (bad HTTP version, etc.).
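For what it's worth, the second alternative can be sketched as a tiny state machine. All names below are mine (not the library's actual types): a header read that sees `Expect: 100-continue` moves the write state into a dedicated state, and only that state accepts a 100-continue write, so consecutive 100-continues become a visible error rather than silent protocol breakage.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical sketch, not Boost.Http's actual API: a write_state with an
// explicit state entered when a request carrying "Expect: 100-continue"
// is read.  Writing a 100-continue is only legal from that state, so a
// second consecutive 100-continue is rejected.
enum class write_state { empty, expect_continue };

struct server_channel {
    write_state state = write_state::empty;

    // Called by the read path after parsing a request with
    // "Expect: 100-continue".
    void on_request_with_expect() { state = write_state::expect_continue; }

    void write_100_continue() {
        if (state != write_state::expect_continue)
            throw std::logic_error("no 100-continue expected in this state");
        state = write_state::empty; // back to the normal response flow
    }
};
```

The point is only that the extra state costs one enum value and one branch, while dropping the state entirely costs nothing but permits the duplicate write.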
- `async_read_request` and `async_write_response_metadata` in the http::ServerSocket concept require an entire http::Message concept as an argument, when only the AssociativeContainer concept should be necessary for these functions.
- `async_write` and `async_read_some` in the http::Socket concept require an entire http::Message concept as an argument, when only the SequenceContainer concept should be necessary for these functions.
Just passing the message object is less error-prone (e.g. should I pass headers() or trailers()?). But yes, I could do better (decrease requirements and add a few helper functions/types).
- All data reads / writes for the message body require a copy to be made, because memory from a SequenceContainer concept cannot be given to ASIO/OS directly (std::list<std::uint8_t> is a SequenceContainer). The message body should likely be restricted to the C++1z ContiguousContainer concept instead. I think users would prefer speed here more than flexibility. It's also easier to relax requirements after release than the opposite.
std::vector is used by default. Not sure how helpful it is to restrict the user if he is willing to pay. A parser embedded in the message object would have trouble implementing the ContiguousContainer concept when it comes to chunked entities.
I agree that's easier to relax requirements after release than the opposite.
A trait-dispatched system could be provided if someone really wanted SequenceContainer support. But it's just too odd: std::list<std::uint8_t> and other node-based containers would be really low performing. Again, the ASIO method is more flexible because it allows for contiguous chunks of memory, or a range of contiguous chunks of memory, to be provided. So a std::list<std::uint8_t> should be supported by the ASIO interface actually (oh my!), but obviously a range of 1-byte chunks is also likely poor performing.
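The performance gap is visible from the interfaces alone. A minimal sketch (my own illustrative names, not ASIO's types): anything satisfying ContiguousContainer can be handed to the OS as a single (pointer, length) pair, while a node-based SequenceContainer forces a flattening copy first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <vector>

// Illustrative stand-in for asio::const_buffer: one contiguous region.
struct const_buffer {
    const void* data;
    std::size_t size;
};

// Zero-copy: works for std::vector, std::array, std::string... anything
// satisfying the C++1z ContiguousContainer concept.
const_buffer make_buffer(const std::vector<std::uint8_t>& body) {
    return { body.data(), body.size() };
}

// A node-based SequenceContainer has no single underlying region, so the
// bytes must be copied into one before the OS can see them.
std::vector<std::uint8_t> flatten(const std::list<std::uint8_t>& body) {
    return std::vector<std::uint8_t>(body.begin(), body.end());
}
```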
- Ideally the read / write functions for the payload would have the same requirements as an ASIO buffer. Requiring a container prevents memory locations from other sources (such as directly from a streambuf), and prevents scatter/gather I/O. I'd say this is less critical, as it's something most users are unlikely to use.
It may kill some use cases. If some type is more performant for the user, it should be usable. The default should satisfy most users' wishes.
What use cases does a container provide over an ASIO buffer? The only one I can recall being mentioned is a wire & transport implementation that sends/receives data as messages instead of a stream. I suspect the flexibility of the ASIO buffer would appeal to more people than the ability to read ZeroMQ messages with a slight efficiency advantage. I obviously can't back this claim with any statistics, so I won't push much further on this topic. Are there other advantages or special use cases for the container-based system? Have you considered standalone functions for writing/reading standard HTTP/1.x chunked data directly from an ASIO stream? There might be ways to allow effective usage of the native handle for advanced users (thus keeping the default container implementation).
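To make that last suggestion concrete, here is the kind of standalone helper I have in mind, with the ASIO stream part stubbed out; the framing logic is the point, and a real `async_write_chunk`-style function (hypothetical name) would write these bytes straight to the socket:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Standalone HTTP/1.1 chunked-transfer framing (RFC 7230 section 4.1):
// chunk size in hex, CRLF, data, CRLF.  A real version would hand these
// bytes to an ASIO stream; this sketch only builds them.
std::string make_chunk(const std::string& data) {
    std::ostringstream out;
    out << std::hex << data.size() << "\r\n" << data << "\r\n";
    return out.str();
}

// The terminating zero-size chunk that ends the body (no trailers).
std::string last_chunk() {
    return "0\r\n\r\n";
}
```

Such free functions would compose with any ASIO stream, including a native handle obtained from the socket, without the container-based message type being involved at all.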
- `async_read_trailers` and `async_write_trailers` in the http::Socket concept require an entire http::Message concept as an argument, when only the AssociativeContainer concept should be necessary.
Just passing the message object is less error-prone (e.g. should I pass headers() or trailers()?). But yes, I could do better (decrease requirements and add a few helper functions/types).
- I mentioned this before; client- and server-specific header writing/reading is a shame given the overwhelming similarities. My earlier suggestions were criticized for being stringly typed (or having more states), but I think this was a somewhat weak criticism since the type system can be used more effectively: `async_write_header(http::response{200, "OK"}, ...)` or `async_write_header(http::request{http::request::GET, "/"}, ...)`.
This design would have much more implicit behaviour and it'd lead to more surprises. A ResponseBuilder or the like could be done, separately, in case you don't mind the surprises.
What do you mean by implicit behavior? And what surprises? The name of the type indicates the action being taken ... ?
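To illustrate what I mean by using the type system: with distinct request/response types, overload resolution picks the correct start-line serializer, so nothing is stringly typed and no extra socket state is needed. A rough sketch; these types and the `start_line` function are illustrative, not a proposed API:

```cpp
#include <cassert>
#include <string>

// Illustrative types only: the start-line format is fixed by the static
// type, so writing a response could not be confused with writing a request.
struct response { int status; std::string reason; };
struct request  { std::string method; std::string target; };

std::string start_line(const response& r) {
    return "HTTP/1.1 " + std::to_string(r.status) + " " + r.reason + "\r\n";
}

std::string start_line(const request& r) {
    return r.method + " " + r.target + " HTTP/1.1\r\n";
}
```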
- What is the purpose of the standalone `async_write_response` and `async_write_response_metadata` functions? They only forward to functions with the same name in the http::ServerSocket concept; is this cruft?
They fill the reason phrase for you based on the status code.
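Presumably something along these lines; the table is abbreviated and the function name is mine, not the library's:

```cpp
#include <cassert>
#include <string>

// Default reason phrases per RFC 7231 section 6.1, applied when the
// caller supplies only a status code.  Abbreviated for illustration.
std::string default_reason_phrase(int status) {
    switch (status) {
        case 100: return "Continue";
        case 200: return "OK";
        case 204: return "No Content";
        case 301: return "Moved Permanently";
        case 400: return "Bad Request";
        case 404: return "Not Found";
        case 500: return "Internal Server Error";
        default:  return "";
    }
}
```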
- The current `async_read_some` implementation has an error code path that can initiate a write. This is not documented anywhere, and effectively issuing an `async_read_some` command blocks both the send and receive channels of the underlying ASIO stream. This problem is likely to occur in a few other spots, but I didn't check.
The ServerSocket concept gives guarantees about a producer of HTTP messages (and you, as the user, write the HTTP consumer). A channel to exchange HTTP messages somehow. There were no guarantees about exposing implementation-detail errors to the user.
Imagine a ZeroMQ-based ServerSocket that cannot send "pings" because they were not documented in the general concept. The general ServerSocket concept shouldn't restrict the freedom of HTTP "providers" too much.
The general concept needs to stipulate whether reads or writes can be made by the client after initiating an asynchronous operation. This is the most important thing that must be documented, because the current implementation cannot safely accept a write operation when some (all?) of the read operations are still in progress. I think it's acceptable for an implementation to inject control messages under such restrictions, as long as the user can still issue their own write when the specification allows for it. Such implementations _may_ have to indicate the control-message behavior when documenting access to the "native handle" (and I suspect "may" is going to be "always" in practice), in case a user tries to do some raw writing. Or maybe the documentation just says "never do raw reads/writes" with the native handle.

Anyway, this is a serious documentation error within the library and I need to fix it.
This isn't just a serious documentation error; it's a design problem for delivering lower-latency responses. Since the read call can issue an automatic error message on the write channel, users of the library cannot begin reading the next request (which is likely already in a kernel buffer if the client is pipelining requests) until the prior response has finished sending. If the errors were not automatically written by the implementation, a user could begin reading the next request (potentially even preparing DB calls, etc.) while waiting for the response of the prior request to complete. I do not think this is a fatal issue, but it does make me question the low-level design goals. The design/implementation seems to be halfway between a constricting, easy-to-use high-level design and a performant low-level design. It seems reasonable to expect users of the low-level interface to have more understanding of HTTP, while the high-level design provides a better "safety net" for users. The low-level users would still benefit from an HTTP system that reads/writes headers into/from an associative container, handles receiving/sending chunked data automatically, etc., which is convenient.

Lee