Re: [boost] [http] Formal Review

14 Aug 2015

2015-08-13 22:29 GMT-03:00 Lee Clagett <forum@leeclagett.com>:
...
On Thu, Aug 13, 2015 at 11:43 AM, Vinícius dos Santos Oliveira <
vini.ipsmaker@gmail.com> wrote:
...
2015-08-12 19:27 GMT-03:00 Lee Clagett <forum@leeclagett.com>:
...
Anyway - I was thinking along the same lines at various points. Having a
function that pre-generates HTTP messages is very useful IMO, and should
likely be included. Designing a good parsing concept would be a bit more
work I think, but probably worth it too. I'm not sure how the author
intends to swap out parsers in the current design. Having a fixed parser
seems acceptable, but the author almost seemed to suggest that it could be
selectable somehow.
A parser doesn't make sense for all communication channels.
Do you have an example of a communication channel where a parser concept
wouldn't work? They wouldn't necessarily always provide the same output or
behave the same way, but a communication channel has a defined format. Any
implementation reading that format generally has _some_ output, which is
pretty much a parser IMO. Sorry for the bikeshedding on this, it's not
really necessary, but this stuck out for some reason.
Well, in my previous answer, I think I ended up focusing on the wrong part
of the proposal. Let me fix that in this email. And thank you for helping
me figure out my mistake (or "keep insisting" or "not losing hope on me",
whichever you prefer).

To handle an HTTP request, you read the metadata, progressively download the
body and then the trailers. It's wise to avoid reading partial metadata
because the request can only be handled after the whole metadata has been
read. However, the body can be handled as it is received.
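
A minimal sketch of that ordering, in case it helps. All names here
(`request_reader`, `on_metadata`, `on_body_chunk`, `on_trailers`) are
hypothetical and are not the Boost.Http API; the point is only that the
metadata arrives as one complete unit while body chunks are consumed as they
come in and need not be retained.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical names throughout; this is an illustration, not a library API.
enum class read_stage { metadata, body, done };

struct request_reader {
    read_stage stage = read_stage::metadata;
    std::map<std::string, std::string> headers;
    std::map<std::string, std::string> trailers;
    std::size_t body_bytes_seen = 0;

    // The metadata is delivered whole: a handler cannot be chosen earlier.
    void on_metadata(std::map<std::string, std::string> h) {
        assert(stage == read_stage::metadata);
        headers = std::move(h);
        stage = read_stage::body;
    }

    // Body chunks are handled as they arrive; only a running count is kept.
    void on_body_chunk(const std::string& chunk) {
        assert(stage == read_stage::body);
        body_bytes_seen += chunk.size();
    }

    // Trailers close the message.
    void on_trailers(std::map<std::string, std::string> t) {
        assert(stage == read_stage::body);
        trailers = std::move(t);
        stage = read_stage::done;
    }
};
```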

You're arguing that you always (1) fill a buffer and then (2) parse it. CGI
uses environment variables, not a contiguous chunk of memory or stream of
bytes. It still could work, though. The headers would be serialized into
the buffer (not nice).
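
To illustrate what "serialized into the buffer" would mean for CGI, here is
a rough sketch. It relies on the CGI convention (RFC 3875) that request
headers show up as `HTTP_*` meta-variables with upper-case names and `_` in
place of `-`. The function names are made up for this email, and the
environment is passed in as a map instead of being read from the process so
the example stays self-contained.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Turn a CGI meta-variable name such as "HTTP_USER_AGENT" into "user-agent".
// Returns an empty string for variables that do not carry a request header.
std::string header_name_from_cgi(const std::string& var) {
    const std::string prefix = "HTTP_";
    if (var.compare(0, prefix.size(), prefix) != 0) return "";
    std::string name = var.substr(prefix.size());
    for (char& c : name) {
        c = (c == '_')
                ? '-'
                : static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return name;
}

// Re-serialize the header-carrying variables into wire-like "name: value"
// lines, which a buffer-based parser would then have to parse again.
std::string serialize_cgi_headers(const std::map<std::string, std::string>& env) {
    std::string out;
    for (const auto& kv : env) {
        const std::string name = header_name_from_cgi(kv.first);
        if (!name.empty()) out += name + ": " + kv.second + "\r\n";
    }
    return out;
}
```

The round trip is the "not nice" part: data that was already split into
name/value pairs gets glued back together only to be split again.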

Using the buffer/view approach, messages are always kept in serialized form.
Unless you store/cache the parsing result (with an allocation or some
fixed-size storage, since you do not know the sizes ahead of time), handling
a message takes more time, because you need to reparse every time a piece of
information is requested ("give me header host", "give me header cookie").
If you do store the parsing result, you are just storing the message behind
an API that pretends not to be message-based. And unless your buffer stores
more than the real network traffic, the view needs to get information from
the socket too, not just from the buffer. It's not a pure parser: there is
state that is not found in the buffer (imagine how you'd handle a
progressive download where much of the traffic was already discarded to make
room for the rest of the message), so the view needs the socket, which holds
the information missing from the buffer. It's like spending much more CPU
time to save some memory. These are just the most basic consequences.
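
A small sketch of the trade-off (C++17, hypothetical function names, and a
deliberately simplified format: CRLF-terminated "name: value" lines,
exact-case names, no folding). `find_header` is the view style and rescans
the serialized block on every call; `parse_headers` is the message style and
pays for allocation once.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <string_view>

// Pop one CRLF-terminated line off the front of `block`.
std::string_view pop_line(std::string_view& block) {
    const std::size_t eol = block.find("\r\n");
    const std::string_view line = block.substr(0, eol);
    block.remove_prefix(eol == std::string_view::npos ? block.size() : eol + 2);
    return line;
}

// Split "name: value" into its parts; returns false if there is no colon.
bool split_field(std::string_view line, std::string_view& name,
                 std::string_view& value) {
    const std::size_t colon = line.find(':');
    if (colon == std::string_view::npos) return false;
    name = line.substr(0, colon);
    value = line.substr(colon + 1);
    while (!value.empty() && value.front() == ' ') value.remove_prefix(1);
    return true;
}

// View style: no allocation, but every lookup rescans the serialized block.
std::optional<std::string_view> find_header(std::string_view block,
                                            std::string_view wanted) {
    std::string_view name, value;
    while (!block.empty()) {
        if (split_field(pop_line(block), name, value) && name == wanted)
            return value;
    }
    return std::nullopt;
}

// Message style: parse once, allocate, then every lookup is a map access.
std::map<std::string, std::string> parse_headers(std::string_view block) {
    std::map<std::string, std::string> fields;
    std::string_view name, value;
    while (!block.empty()) {
        if (split_field(pop_line(block), name, value))
            fields[std::string(name)] = std::string(value);
    }
    return fields;
}
```

Caching the result of `parse_headers` behind a view API gives the second
style under the first style's name.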

At a higher level, the buffer/view approach makes all requests immutable by
default. You cannot fake or inject data in the headers while you pass them
along a chain of handlers. There are workarounds. Also, if you cannot forge
an HTTP message, you cannot create one and send it to the socket. You always
need to use the generator. The problem with the proposed generator is the
lack of consideration for other HTTP backends. More care needs to be given
to capabilities, as happens in Boost.Http (is chunking available?
100-continue? can I upgrade? ...?). Also, as I've stated in one of the
previous emails, it's tricky to get interoperability right with these
non-explicit generators (is chunking going to be used implicitly?).
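
As a sketch of what "explicit" could look like: the generator asks the
backend what it can do and the choice is visible in the code. The struct and
function below are hypothetical names for this email only, not the actual
Boost.Http interface.

```cpp
#include <cassert>

// Hypothetical capability set a backend could report for one exchange.
struct backend_capabilities {
    bool chunking;      // can a body of unknown length be streamed?
    bool continue_100;  // can a "100 Continue" be sent?
    bool upgrade;       // can the connection be upgraded?
};

enum class body_strategy { chunked, buffered_with_length };

// An HTTP/1.0 peer, for example, cannot receive a chunked body, so the
// decision is made explicitly here and not silently inside a generator.
inline body_strategy choose_body_strategy(const backend_capabilities& caps) {
    return caps.chunking ? body_strategy::chunked
                         : body_strategy::buffered_with_length;
}
```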

I need to think more, maybe I'll send another email with more comments. You
can have these in the meantime.
...
Currently there is one communication channel: basic_socket<T>
...
basic_socket<T> is meant for embedded servers (will be extended for lean
HTTP client connections in the future) and is tied to the HTTP wire format,
...
so it uses an HTTP parser. I don't expose the parser because I use a C
...
parser that is limited to the C (lack of) expressiveness. When I replace
It's possible to expose the C parser through a C++ interface, so I don't see
this as a valid argument.
Okay. The C parser I use has a terrible abstraction and I don't want to
Boostify this abstraction. I want to define a better interface, a new
interface, not a Boostification of the C interface I use internally. The
"bad" interface has a great excuse to be the way it is: it's one of the
best things you can get when trying to abstract code in the C language.

-- 
Vinícius dos Santos Oliveira
https://about.me/vinipsmaker