On 23 Aug 2015 at 2:53, Rodrigo Madera wrote:
Is it lower performance than alternatives? Almost certainly, and the documentation takes pains to present AFIO in the worst possible light in any comparative benchmarks presented, so nobody is under any illusions. What matters is correctness, reliability, reusability and portability, and most importantly of all: do NOT lose other people's data.
Sorry, but I have to disagree tremendously.
Performance is the only justification I can give clients (and bosses) for code as complex as asynchronous programming requires. Justifying Boost.ASIO is an exercise in marketing, and the benchmarks are the killer chart that sells the complexity. Were it not for that, Qt would be the chosen path in many non-academic projects.
I even see a use case for AFIO in one of my products, where extreme write performance with minimum i/o is a first priority, and a big effort was made to implement a subsystem for fast writes under that specific environment. An effort so big that the project froze until the writer code was done.
We have to be very careful with terminology here, because what I meant by performance is not what you meant, judging from what you just wrote. I assumed Glen was referring to "peak performance", and on that AFIO will never be as fast as writing custom code directly against the OS APIs. That is what I was referring to. Your statement here, though, suggests you meant "sustainable performance", i.e. writing at the maximum sustained rate of your hardware. That is a very different situation, and I think you will find AFIO is very close to bare metal there.

The reason comes down to relative overheads. If you loop reading one byte from the same location, AFIO looks very slow compared to the host OS, because the overhead of ASIO and the continuations is large compared to reading a single byte from the kernel page cache. If, however, you are working with a cold-cache scenario where there is any wait on storage at all, the overhead of AFIO relative to the storage is minuscule. The v1.3 engine has a latency of about 15 microseconds +/- 0.15 microseconds at a 95% confidence interval. You should be good to go on any magnetic drive or SATA-based SSD and then some. You may find battery-backed RAM drives won't be maxed out by AFIO, but as I mentioned, I'm working on it. Low-hanging fruit first: the number of people using battery-backed RAM drives is few, while the number of people on SATA SSDs is many.
The point of this is that performance should not be a low priority, especially for a Boost library, and even more so for an asynchronous library. To me, citing correctness is not a feature. It's a basic principle: you are assumed to have it.
The importance of correctness is deeply underestimated, particularly on Linux, which historically has had incorrect file system semantics; only in very recent years has there been a change in mentality about that. ext4 remains broken; XFS, however, has added extra internal locking to implement correctness. In other words, you can't assume you always have correctness. FreeBSD is "slower" than Linux in peak performance, but is far faster than Linux in worst-case performance, and it also has perfect correctness. Microsoft Windows also does very well, and is also correct.
As soon as AFIO is really competitive for me to use, I wish to do a full review. For general API usage I'm not sure a full-blown review is needed. The library is not finished, as you said, and a review now would be just like our past review, where a very interesting library was simply not ready yet. And in your case, it doesn't yet perform.
I would be very surprised if anyone finds a performance problem outside synthetic benchmarks.
Reviewing of incomplete library work is not ideal, IMHO.
That being said, I have some questions:
Regarding benchmarks: do you have usable coroutine examples now? The web sample uses futures.
The answer is no. FYI, C++ 1z coroutines are implemented using futures by default. The only toolchain currently implementing C++ 1z coroutines is VS2015 with an extra compiler flag. And you, the library end user, need to annotate your code with the "await" keyword to switch on coroutinisation. AFIO as a library doesn't have to do a thing except mark up its synchronisation types with coroutinisation metadata.
Do you have better performance numbers when using them?
I would expect performance to be lower. Stackful coroutines are not free.
What good measures did you employ to prevent caches from contaminating benchmarks?
Almost all the benchmarks refer to warm-cache scenarios, to paint AFIO in the worst possible light relative to alternatives. I do have a cold-cache benchmark in the "find regex in files" tutorial, and as demonstrated there, you are aiming for the best tradeoff between cold-cache and warm-cache performance. You never get best performance in either extreme scenario; it's always a balanced tradeoff.
Do you believe that performance will improve?
I know performance will improve in synthetic warm cache benchmarks. I doubt any real world benchmarks would see a statistically measurable difference.
Do you know what your bottleneck is?
For the v1.3 engine it's overwhelmingly the ASIO reactor and the hoops AFIO has to jump through to work with it.
Why do other libraries do better? Can you imitate that while still leaving your superior API?
AFIO's API is a set of design tradeoffs between bare metal performance and portability and correctness. libuv, probably its nearest alternative, is a different set of design tradeoffs. To decide which to choose you need to decide what "better" means and what it is for your particular use case. I cannot give a generalised answer. It depends on what your priorities are.
About coroutines: do you really need C++ 1z?
The v1.4 API was built around Gor's coroutines (the C++ 1z design) and Oliver's forthcoming Boost.Fiber. Boost.Fiber currently only needs C++ 14.
Why not Boost.Coroutine emulation for backwards support?
I felt any additional real-world performance gain wasn't worth the significantly more brittle usage. File i/o is many orders of magnitude slower than socket i/o. It genuinely is not important to spend an extra 10,000 CPU cycles on an i/o operation costing 1,000,000 CPU cycles (a 1% overhead) if doing so makes the code easier to maintain.
Why do you use C++11 at all? C++03 is still widespread and a requirement for most projects still.
AFIO was designed to take advantage of C++ 11 from the very beginning of its life. The single biggest assumption in its design is rvalue reference semantics, without which the library is unusably slow due to enormous amounts of memory copying. Lightweight futures make heavy use of C++ 11 constexpr and noexcept. APIBind does not exist without template aliasing and inline namespaces.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/