My Beast Review

9 Jul 2017

      Dear All,

This review is not very thorough - I've only looked at the docs and 
some small parts of the code, and read the discussion on the list.  I've 
not tried to use the library.  But I think I've seen enough to express an 
opinion.

I've never used ASIO.  I've studied it in the past, and concluded that 
it offers portability to platforms that I don't use but doesn't work 
well with other platforms that I do use.  It also seems more complicated 
than I would have liked, perhaps in order to achieve some optimised 
level of performance that I don't need (or perhaps for no good reason). 
I have however quite often used HTTP (hasn't everyone?) and I think it's 
an embarrassment that std::c++ can't download something from the 'net 
with one or two lines of code in the way that almost every other language 
can.  So, my first reaction is that it's a disappointment to see that a 
proposed HTTP library is tied to ASIO.

On further investigation, however, I think this is no great loss to me 
as the functionality that this library provides is so limited in scope.  
Parsing HTTP headers, for example, is not much work.  Even my own 
hacked-together-in-an-afternoon HTTP classes do more than this - to 
mention just one thing, parsing and formatting of dates in the correct 
format.  Everyone who uses this library will have to implement this sort 
of thing themselves on top of it.  (And many of them will implement it 
wrongly.)

Vinnie says that he hopes that more functionality will be layered on 
top of this work.  The danger is that the person who comes along wanting 
to implement the other 75% will not like Vinnie's starting point, and 
re-implement it all in their own style.  (I know I would.)

Unless, of course, Vinnie is going to offer this "other 75%" himself 
eventually.  *That would be great*.  If that is the plan, my review is 
"this is a fine start but please remove the ASIO dependency and come 
back when it's finished".

But I think Vinnie has said that the library is to be reviewed "as is". 
In that case, my review is "just add this to ASIO and call it boost.asio.http 
and boost.asio.websocket, no need for a review" (assuming that the ASIO 
developer is happy with that).

Regarding what I have seen of the implementation:

- Security must be one of the most important properties of an HTTP 
library, and I'd really hope to find that such a library had been 
implemented by someone with significant experience in this area.  
I never cease to be amazed by the ingenuity of attackers.

- I fear that there may be much premature optimisation in the parser 
code that I've looked at.  For example:

    static
    bool
    is_digit(char c)
    {
        return static_cast<unsigned char>(c-'0') < 10;
    }

Is that right?  Say c='!' (ASCII 33).  '0' is ASCII 48.  So c-'0' is -15. 
But you cast that to an unsigned char, so -15 = 241, and 241 > 10, so the 
function returns false.  So it is correct - but (1) it needs a comment to 
explain it, (2) it's the sort of thing that makes me worry from a security 
viewpoint, and (3) is it actually an improvement on the more obvious 
c >= '0' && c <= '9'?  Well I've done a quick test and I get the same assembler 
for both - so why bother?

(Except on close inspection actually I don't get exactly the same assembler.  
To get the same assembler I have to replace the constant 10 in your code with 
10U, which changes the final comparison from signed to unsigned.  I don't think 
that matters, but it again makes me wonder whether this security-critical code 
is ideal.)

But hey, why even do c >= '0' && c <= '9' when we have std::isdigit()?  But 
it would be much slower, right?  Wrong, it's actually one instruction shorter; 
somehow it manages to avoid one of the zero-extension instructions.  So this is 
not a premature optimisation, it's a premature pesimisation.  (My test code is 
at the end of this message.)

Now, is_digit is just about the simplest thing in the parser; Vinnie has 
also implemented his own code for parsing numbers, and then we come to 
parse_field.  Please have a look at this (around line 666 of basic_parser.hpp). 
Note that it calls find_fast() near the top of that file, which uses some 
Intel SIMD instruction to, I think, skip forward over the header field name, 
and then over the field value to the CRLF.  Not that there are any comments in 
find_fast to explain what it does, for those of us who don't know what 
_mm_loadu_si128 and _mm_cmpestri do.  Has this code actually been 
benchmarked against a much simpler and more obviously-correct version, and 
found to have a worthwhile performance benefit?

(I believe that some of the discussion has used this "optimised parser" as 
a justification for not making the code more generic, i.e. to work with 
any iterator pairs.  I believe that to be flawed, since (a) I don't think 
this optimisation is useful, and (b) even if it were, the code could 
use it for any contiguous iterators, and fall back to "unoptimised" code 
in other cases.  I think that having the "unoptimised" code present would 
be useful for documenting what the parser is trying to do, in any case.)

Could it be that the reason why the scope of the library seems so limited 
is that Vinnie has spent his time fiddling with this optimisation, rather 
than adding useful features?

My opinion:  having looked at the implementation, reject.

Test for is_digit:

#include <cctype>

bool is_digit_1(char c)
{
  return static_cast<unsigned char>(c-'0') < 10U;
}

bool is_digit_2(char c)
{
  return c >= '0' && c <= '9';
}

bool is_digit_3(char c)
{
  return std::isdigit(c);
}

Aarch64 assembler output from g++ 6.3.0 -O3, with the uninteresting bits 
removed:

_Z10is_digit_1c:
	uxtb	w0, w0
	sub	w0, w0, #48
	uxtb	w0, w0
	cmp	w0, 9
	cset	w0, ls
	ret

_Z10is_digit_2c:
	uxtb	w0, w0
	sub	w0, w0, #48
	uxtb	w0, w0
	cmp	w0, 9
	cset	w0, ls
	ret

_Z10is_digit_3c:
	uxtb	w0, w0
	sub	w0, w0, #48
	cmp	w0, 9
	cset	w0, ls
	ret

Regards, Phil.

Phil Endecott

Vinnie Falco

Seth

Mike Gresens

Andrey Semashev

Vinnie Falco

Nik Bougalis

Phil Endecott

tags

participants (6)