Most users, I should imagine, will therefore build scatter-gather lists on the stack as they'll be thrown away immediately; indeed, I usually feed it curly-braced initializer_lists personally, thus imposing a limitation on the size of the buffer sequence.
The kernel imposes very significant limits on the size of the buffer list anyway: some OSs allow as few as 16 scatter-gather buffers per i/o, and as few as 1024 scatter-gather buffers in flight across the entire OS. So when you initiate an async i/o, you may get a resource temporarily unavailable error for even a single buffer, let alone two. On top of that, even if the OS accepts more, the DMA hardware has a fixed-size buffer list capacity. 64 is not uncommon, and that's after the kernel has split your virtual memory scatter-gather list into physical memory and added its own scatter-gather headers. So 32 buffers is a very realistic limit, and 16 is the portable maximum. (AFIO v2 doesn't involve itself whatsoever with any of this: it sends what you ask for to the OS, and reports back whatever errors the OS returns.)
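Since AFIO v2 just forwards the OS error, handling that "resource temporarily unavailable" case is the caller's job. A hypothetical sketch of one way a caller might cope, retrying with progressively smaller front slices of the gather list (this helper is invented for illustration and is not part of any library's API):

```cpp
#include <cerrno>
#include <sys/uio.h>
#include <unistd.h>

// Hypothetical helper: attempt a gather write; on EAGAIN, retry with
// half as many buffers, down to a single buffer. The caller is
// responsible for resubmitting whatever was not written.
ssize_t gather_write_with_backoff(int fd, const struct iovec *iov, int iovcnt)
{
    for (int n = iovcnt; n > 0; n /= 2)
    {
        ssize_t r = ::writev(fd, iov, n);
        if (r >= 0 || errno != EAGAIN)
            return r;            // success, or a genuine error
    }
    errno = EAGAIN;              // even one buffer was refused
    return -1;
}
```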
I think you are going to have to back up your claim that memory-copying all incoming data is faster, rather than the result of poor implementation techniques for discontiguous storage.
I'll let Kazuho back it up for me since I use his ideas: https://github.com/h2o/picohttpparser
Here's the slide show explaining the techniques: https://www.slideshare.net/kazuho/h2o-20141103pptx
And here is an example of the optimizations possible with linear buffers, which I plan to directly incorporate into Beast in the near future: https://github.com/h2o/picohttpparser/blob/2a16b2365ba30b13c218d15ed99915763...
Ah, I see you're referring to SIMD. I thought you were claiming that linear-buffer-based parsers were significantly faster than forward-only-iterator-based parsers. You solve that problem by doing all i/o in multiples of the SIMD length, and memcpy any tail partial SIMD length at the end of a partial i/o into the next buffer. This avoids copying the bulk of the data, yet keeps SIMD.
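To make the tail-carry idea above concrete, here's a minimal sketch (assuming a 16-byte SSE2 lane width; the helper name is invented for illustration). The only memcpy ever performed is at most simd_width - 1 bytes per i/o, not the whole incoming stream, and the SIMD scan loop only ever sees whole lanes:

```cpp
#include <cstddef>
#include <cstring>

// Assumed SIMD register width: 16 bytes for SSE2 (32 for AVX2).
constexpr std::size_t simd_width = 16;

// Hypothetical helper: copy the partial-lane tail of the previous buffer
// to the front of the next buffer, so the parser's SIMD loop always runs
// over whole 16-byte lanes. Returns the tail length; the next read should
// then deposit data starting at next + tail.
std::size_t carry_tail(const char *prev, std::size_t prev_len, char *next)
{
    std::size_t tail = prev_len % simd_width;        // bytes not filling a lane
    std::memcpy(next, prev + prev_len - tail, tail); // at most 15 bytes copied
    return tail;
}
```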
Of course, if you think you can do better, I would love to see your working parser that operates on multiple discontiguous, high-quality, ring-buffered, page-aligned, DMA-friendly storage iterators so that I might compare the performance. The good news is that you can do so while leveraging Beast's message model to produce objects that people using Beast already understand. Except that you'll be producing them much, much faster (which is a win-win for everyone).
You're the person bringing the library before review, not me. If you have a severe algorithmic flaw in your implementation, reviewers would be right to reject your library. They did so with me for AFIO v1, so it was on me to start AFIO again from scratch.

Niall
--
ned Productions Limited Consulting
http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/