Most users, I should imagine, will therefore build scatter-gather lists on the stack as they'll be thrown away immediately; indeed, I usually feed it curly-braced initializer_lists personally, thus imposing a limitation on the size of the buffer sequence.
The kernel imposes very significant limits on the size of the buffer list anyway: some OSs allow as few as 16 scatter-gather buffers per i/o, and as few as 1024 scatter-gather buffers in flight across the entire OS. So when you initiate an async i/o, you may get a resource temporarily unavailable error for even a single buffer, let alone two. On top of that, even if the OS accepts more, the DMA hardware has a fixed-size buffer list capacity. 64 is not uncommon, and that's after the kernel has split your virtual memory scatter-gather list into physical memory and added its own scatter-gather headers. So 32 buffers is a very realistic limit, and 16 is the portable maximum. (AFIO v2 doesn't involve itself whatsoever with any of this: it sends what you ask for to the OS, and reports back whatever errors the OS returns.)
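Since AFIO v2 just forwards the OS error, handling that "resource temporarily unavailable" case is the caller's job. A hypothetical sketch of one way a caller might cope, retrying with progressively smaller front slices of the gather list (this helper is invented for illustration and is not part of any library's API):

```cpp
#include <cerrno>
#include <sys/uio.h>
#include <unistd.h>

// Hypothetical helper: attempt a gather write; on EAGAIN, retry with
// half as many buffers, down to a single buffer. The caller is
// responsible for resubmitting whatever was not written.
ssize_t gather_write_with_backoff(int fd, const struct iovec *iov, int iovcnt)
{
    for (int n = iovcnt; n > 0; n /= 2)
    {
        ssize_t r = ::writev(fd, iov, n);
        if (r >= 0 || errno != EAGAIN)
            return r;            // success, or a genuine error
    }
    errno = EAGAIN;              // even one buffer was refused
    return -1;
}
```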
I think you are going to have to back up your claim that memory-copying all incoming data is faster, rather than the result of poor implementation techniques for discontiguous storage.
I'll let Kazuho back it up for me since I use his ideas: https://github.com/h2o/picohttpparser
Here's the slide show explaining the techniques: https://www.slideshare.net/kazuho/h2o-20141103pptx
And here is an example of the optimizations possible with linear buffers, which I plan to directly incorporate into Beast in the near future: https://github.com/h2o/picohttpparser/blob/2a16b2365ba30b13c218d15ed99915763...
Ah, I see you're referring to SIMD. I thought you were claiming that linear-buffer-based parsers were significantly faster than forward-only-iterator-based parsers. You solve that problem by doing all i/o in multiples of the SIMD length, and memcpy any tail partial SIMD length at the end of a partial i/o into the next buffer. This avoids copying the bulk of the data, yet keeps SIMD.
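To make the tail-carry idea above concrete, here's a minimal sketch (assuming a 16-byte SSE2 lane width; the helper name is invented for illustration). The only memcpy ever performed is at most simd_width - 1 bytes per i/o, not the whole incoming stream, and the SIMD scan loop only ever sees whole lanes:

```cpp
#include <cstddef>
#include <cstring>

// Assumed SIMD register width: 16 bytes for SSE2 (32 for AVX2).
constexpr std::size_t simd_width = 16;

// Hypothetical helper: copy the partial-lane tail of the previous buffer
// to the front of the next buffer, so the parser's SIMD loop always runs
// over whole 16-byte lanes. Returns the tail length; the next read should
// then deposit data starting at next + tail.
std::size_t carry_tail(const char *prev, std::size_t prev_len, char *next)
{
    std::size_t tail = prev_len % simd_width;        // bytes not filling a lane
    std::memcpy(next, prev + prev_len - tail, tail); // at most 15 bytes copied
    return tail;
}
```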
Of course, if you think you can do better, I would love to see your working parser that operates on multiple discontiguous, high-quality, ring-buffered, page-aligned, DMA-friendly storage iterators so that I might compare the performance. The good news is that you can do so while leveraging Beast's message model to produce objects that people using Beast already understand. Except that you'll be producing them much, much faster (which is a win-win for everyone).
You're the person bringing the library before review, not me. If you have a severe algorithmic flaw in your implementation, reviewers would be right to reject your library. They did so with me for AFIO v1, so it was on me to start AFIO again from scratch.

Niall
--
ned Productions Limited Consulting
http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/