On Thu, Aug 13, 2015 at 11:16 PM, Vinícius dos Santos Oliveira <vini.ipsmaker@gmail.com> wrote:
2015-08-13 22:19 GMT-03:00 Lee Clagett: [snip]
This library will grow and also include a client. The HTTP parser is the least useful thing this library could provide. I'm aiming for a library that is so useful that you'd use it whether you're writing an application to run on a Raspberry Pi (or something even more constrained) or on a cluster that distributes workload among thousands and thousands of nodes and uses different intermediate protocols.
Defining an HTTP parser (specifically a SAX-style parser) is likely the only way to achieve zero allocations, which is exactly what the parser in use by this library is doing. Providing a C++ templated SAX parser _would_ be novel - it would allow for inlined/direct callbacks as opposed to the indirect callbacks provided by the C parser in use. Probably a small reduction in CPU cycles though, since parsing time should dominate. Providing such a library would likely change the current design dramatically.
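To make the inlined-callback point concrete, here is a minimal sketch of what a templated SAX-style parse step could look like. This is not Boost.Http's API; the function name and handler shape are illustrative assumptions. Because the handler is a template parameter, the compiler can inline the call, unlike a C parser's function-pointer callbacks, and the handler only receives views, so no allocation happens in the parser itself:

```cpp
#include <cassert>
#include <string_view>

// Illustrative sketch, not Boost.Http: parse one "Name: value" header
// line and hand the pieces to a caller-supplied handler. The handler is
// a template parameter, so the call is direct and inlinable; the parser
// only hands out string_views into the input, so it never allocates.
template <class Handler>
bool parse_header_line(std::string_view line, Handler&& h)
{
    const auto colon = line.find(':');
    if (colon == std::string_view::npos)
        return false; // malformed header line
    std::string_view name = line.substr(0, colon);
    std::string_view value = line.substr(colon + 1);
    // Trim leading spaces from the value.
    while (!value.empty() && value.front() == ' ')
        value.remove_prefix(1);
    h(name, value); // direct, inlinable callback
    return true;
}
```

A real parser would of course handle folding, validation, and incremental input, but the allocation story is the same: views in, views out.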
No. That's a way to avoid memory copies. It's not necessary in order to achieve zero allocations.
You can have a custom backend tied to a single type of message that will do the HTTP parsing for you. It could be inefficient by maybe forcing the parsing every time you call headers(), but without an implementation I don't have much to say.
But this would only be able to handle one HTTP message type? Or would it drop some useful information? I think it would be difficult to implement a non-allocating HTTP parser unless it was SAX, or stopped at defined points (essentially notifying you like a SAX parser).
shouldn't be exposed to the user anyway. The user is interested in features (can I use chunking? can I use X?). It's more portable. You'd suggest a different approach if you had invested time to propose a library that could be used with different backends.
About your http::view proposal: It's not much different from the http::Message concept, but:

- Your http::view reads headers and read_state. This means you're mixing/coupling communication channels and HTTP messages. Bad for different backends. It's bad because you then need to answer questions such as "how do I pass a pool, a custom allocator, and this and that to my backend?".
- Also, how does your proposal handle progressive download and chunking? v.body() doesn't seem very/any asynchronous at all. The flush method in response is useful for chunking, but the lack of information around the capability of chunking by the communication channel leaves no option but to buffer implicitly, which can exhaust memory in a live video stream.
- You even assume the HTTP version is present. It may not even make sense. And this magic node-like API can very easily hamper interoperability among different implementations (I've already gone down this road[1]).
About the no-buffering approach, I just cannot give this guarantee. Any object fulfilling the ServerSocket concept will have its own guarantees and it is up to the user to check them. At the handling level, you can use custom types fulfilling the message concept that will avoid memory allocation. This possibility is not accidental. Just like you're worried, I'm worried too.
How could a user provide a message concept that satisfies the SequenceContainer requirement for the body and avoids memory allocations? The documentation for reading the body doesn't state how the SequenceContainer is being used or modified. Can I set a size for the container that the http::Socket concept never exceeds while reading (effectively making it a fixed buffer like ASIO)? I've read this several times (and maybe I'm being thick), but if I were writing an implementation of this concept, it would seem that it's implementation-defined how the SequenceContainer is being manipulated. Currently the body is being automatically resized on reads, with new data being copied in. I _think_ the only way to control how much is being copied in at a time (and thus the maximum the SequenceContainer is being resized), is through the buffer argument to basic_socket, but this is undocumented.
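One way a user-supplied body type could guarantee a cap is sketched below. This is a hypothetical container (the name bounded_body and its exact interface are my invention, not Boost.Http's): a fixed-capacity byte container that refuses to grow past N, so any socket implementation appending to it can never allocate or exceed the budget:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical sketch, not Boost.Http API: a fixed-capacity byte
// container a message type could use for its body. All storage is
// inline (std::array), so filling it never touches the heap; an
// attempt to grow past the capacity fails loudly instead of resizing.
template <std::size_t N>
class bounded_body {
    std::array<unsigned char, N> storage_{};
    std::size_t size_ = 0;

public:
    using value_type = unsigned char;

    std::size_t size() const { return size_; }
    static constexpr std::size_t max_size() { return N; }

    void push_back(unsigned char b)
    {
        if (size_ == N)
            throw std::length_error("body buffer full");
        storage_[size_++] = b;
    }

    unsigned char* data() { return storage_.data(); }
    void clear() { size_ = 0; }
};
```

Whether such a type actually satisfies the documented message concept depends on exactly which SequenceContainer operations the socket implementation performs, which is the undocumented part being questioned above.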
And satisfying the message requirements for the headers/trailers requires an AssociativeMap. Currently the key/value types are required to meet the string concept (which I do not recall being defined by C++ or Boost), but assuming it means an STL-compatible basic_string, the best way to avoid memory allocation is a small string optimization combined with a flat_map or an intrusive map. This would reduce, but not eliminate, allocations. I'm slightly less concerned about this; I think it's a good tradeoff, but I'm not sure how acceptable these loose constraints are for the embedded situations that are continually mentioned. Any embedded developers following this?

Like you guessed, you pass a buffer to basic_socket. It won't read more bytes than the buffer size.
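The flat_map-plus-SSO idea mentioned above can be sketched with a sorted vector of pairs (the poor man's flat_map; Boost.Container's flat_map follows the same layout). Short header names and values stay within std::string's small-string optimization, so typical headers cause few heap allocations, though long values still allocate. The helper names here are illustrative, not from any library:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Sketch: headers kept in a sorted vector of (name, value) pairs.
// Contiguous storage, binary-search lookup, and std::string's SSO
// for short names/values - fewer allocations than a node-based map,
// though not zero.
using header_map = std::vector<std::pair<std::string, std::string>>;

void insert_header(header_map& m, std::string name, std::string value)
{
    auto pos = std::lower_bound(m.begin(), m.end(), name,
        [](const std::pair<std::string, std::string>& p, const std::string& k)
        { return p.first < k; });
    m.insert(pos, {std::move(name), std::move(value)}); // keeps m sorted
}

const std::string* find_header(const header_map& m, const std::string& name)
{
    auto pos = std::lower_bound(m.begin(), m.end(), name,
        [](const std::pair<std::string, std::string>& p, const std::string& k)
        { return p.first < k; });
    if (pos != m.end() && pos->first == name)
        return &pos->second;
    return nullptr; // header absent
}
```

An intrusive map over caller-owned storage would go further still, at the cost of a much less container-like interface.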
But how can this be combined with higher-order functions? For example, an `async_read_response(Socket, MaxDataSize, void(uint16_t, std::string, vector…`
Fair enough. I didn't document this behaviour. I'll gather all these points after the review period and implement these improvements.
About the embedded device situation, it'll be improved when I expose the parser options; then you'll be able to set max headers size, max header name size, max header value size and so on. With all upper limits figured out, you can provide a single chunk of memory for all data.
But what if an implementation wanted to discard some fields to really keep the memory low? I think that was the point of the OP. I think this is difficult to achieve without a notifying parser. It might be overkill for Boost.Http; people under this duress can seek out existing HTTP parsers. [snip]
7. How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

A few hours digging into docs, about 3 hours looking into the sources, about 3 hours formulating the issues and recommending solutions.
Please read the design choices chapter. You seem to be completely ignoring all use cases where a standalone server (e.g. FastCGI, ZeroMQ...) isn't used.
I read the design choices section, and I don't understand the relevance. Could you elaborate?
A simple use case: You're not directly exposing your application to the network. You're using proxies with auto load balancing. You're not using HTTP wire protocol to glue communication among the internal nodes. You're using ZeroMQ. There is no HTTP wire format usage in your application at all and Boost.Http still would be used, with the current API, as is. A different HTTP backend would be provided and you're done.
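The backend-independence argument above can be illustrated with a toy stand-in for the socket concept. Nothing here is Boost.Http's actual ServerSocket concept; it is a simplified sketch showing that handler code written against a concept compiles unchanged whether the backend speaks HTTP wire format, ZeroMQ, or nothing at all:

```cpp
#include <cassert>
#include <string>

// Toy backend: records the reply in memory. A real alternative backend
// could equally wrap ZeroMQ or FastCGI - the handler below would not
// change. This is a simplified stand-in, not Boost.Http's concept.
struct in_memory_socket {
    std::string last_reply;

    void write_response(int status, const std::string& body)
    {
        last_reply = std::to_string(status) + " " + body;
    }
};

// Generic handler: written once against the concept, reused across
// every backend that fulfils it.
template <class ServerSocket>
void handle_request(ServerSocket& s)
{
    s.write_response(200, "OK");
}
```

The HTTP wire format only ever appears inside whichever backend chooses to use it, which is exactly why the message API can stay format-agnostic.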
It makes no sense to enforce the use of HTTP wire format in this use case
at all. And if you're an advocate of HTTP wire format, keep in mind that the format changed in HTTP 2.0. There is no definitive set-in-stone serialized representation.
Yes, if HTTP were converted into a different (more efficient) wire format (as I've seen done in various ways; sandstorm/capnproto now does this too), a new implementation of http::ServerSocket could read that format and be compatible. It would be useful to state this more clearly in the documentation, unless I missed it (sorry).

Lee