On 23 Aug 2015 at 2:53, Rodrigo Madera wrote:
Is it lower performance than alternatives? Almost certainly, and the documentation takes pains to present AFIO in the worst possible light in any comparative benchmarks presented, so nobody is under any illusions. What matters is correctness, reliability, reusability and portability, and most importantly of all: do NOT lose other people's data.
Sorry, but I have to disagree tremendously.
Performance is the only justification I can give clients (and bosses) for code as complex as asynchronous programming requires. Justifying Boost.ASIO is an exercise in marketing, and the benchmarks are the killer chart that sells the complexity. Were it not for that, Qt would be the chosen path in many non-academic projects.
I even see a use case for AFIO in one of my products, where extreme write performance with minimum i/o is a first priority, and a big effort was made to implement a subsystem for fast writes under that specific environment. An effort so big that the project froze until the writer code was done.
We have to be very careful with terminology here, because what I meant by performance is not what you meant, judging from what you just wrote. I assumed Glen was referring to "peak performance", and on that AFIO will never be as fast as writing custom code directly against the OS APIs. That is what I was referring to. Your statement here, though, suggests you meant "sustainable performance", i.e. writing at the maximum sustained rate of your hardware. That is a very different situation, and I think you will find AFIO is very close to bare metal there.

The reason comes down to relative overheads. If you loop reading one byte from the same location, AFIO looks very slow compared to the host OS, because the overhead of ASIO and the continuations is large compared to reading a single byte from the kernel page cache. If, however, you are working with a cold-cache scenario where there is any wait on storage at all, the overhead of AFIO relative to the storage is minuscule. The v1.3 engine has a latency of about 15 microseconds +/- 0.15 microseconds at a 95% confidence interval. You should be good to go on any magnetic drive or SATA-based SSD and then some. You may find battery-backed RAM drives won't be maxed out by AFIO, but as I mentioned, I'm working on it. Low-hanging fruit first: the number of people using battery-backed RAM drives is few, while the number of people on SATA SSDs is many.
The point of this is that performance should not be a low priority, especially for a Boost library, and even more so for an asynchronous library. To me, citing correctness is not a feature. It's a basic principle: you are assumed to have it.
The importance of correctness is deeply underestimated, particularly on Linux, which historically has had incorrect file system semantics; only in very recent years has there been a change in mentality about that. ext4 remains broken; XFS, however, has added extra internal locking to implement correctness. In other words, you can't assume you always have correctness. FreeBSD is "slower" than Linux in peak performance, but is far faster than Linux in worst-case performance, and it also has perfect correctness. Microsoft Windows also does very well, and is also correct.
As soon as AFIO is really competitive for me to use, I wish to do a full review. For general API usage I'm not sure a full-blown review is needed. The library is not finished, as you said, and a review now would be just like our past review, where a very interesting library was simply not ready yet. And in your case, it doesn't yet perform.
I would be very surprised if anyone finds a performance problem outside synthetic benchmarks.
Reviewing of incomplete library work is not ideal, IMHO.
That being said, I have some questions:
Regarding benchmarks: do you have usable coroutine examples now? The web sample uses futures.
The answer is no. FYI, C++ 1z coroutines are implemented using futures by default. The only toolchain currently implementing C++ 1z coroutines is VS2015 with an extra compiler flag. And you, the library end user, need to annotate your code with the "await" keyword to switch on coroutinisation. AFIO as a library doesn't have to do a thing except mark up its synchronisation types with coroutinisation metadata.
Do you have better performance numbers when using them?
I would expect performance to be lower. Stackful coroutines are not free.
What good measures did you employ to prevent caches from contaminating benchmarks?
Almost all the benchmarks refer to warm-cache scenarios, to paint AFIO in the worst possible light relative to alternatives. I do have a cold-cache benchmark in the "find regex in files" tutorial, and as demonstrated there, you are aiming for the best tradeoff between cold-cache and warm-cache performance. You never get best performance in either extreme scenario; it's always a balanced tradeoff.
Do you believe that performance will improve?
I know performance will improve in synthetic warm cache benchmarks. I doubt any real world benchmarks would see a statistically measurable difference.
Do you know what your bottleneck is?
For the v1.3 engine it's overwhelmingly the ASIO reactor and the hoops AFIO has to jump through to work with it.
Why do other libraries do better? Can you imitate that while still leaving your superior API?
AFIO's API is a set of design tradeoffs between bare metal performance and portability and correctness. libuv, probably its nearest alternative, is a different set of design tradeoffs. To decide which to choose you need to decide what "better" means and what it is for your particular use case. I cannot give a generalised answer. It depends on what your priorities are.
About coroutines: do you really need C++ 1z?
The v1.4 API was built around Gor's coroutines (the C++ 1z design) and Oliver's forthcoming Boost.Fiber. Boost.Fiber currently only needs C++ 14.
Why not Boost.Coroutine emulation for backwards support?
I felt any additional real-world performance gain wasn't worth the significantly more brittle usage. File i/o is many orders of magnitude slower than socket i/o. It genuinely is not important to spend an extra 10,000 CPU cycles on an i/o operation costing 1,000,000 CPU cycles (a 1% overhead) if doing so makes the code easier to maintain.
Why do you use C++11 at all? C++03 is still widespread and a requirement for most projects still.
AFIO was designed to take advantage of C++ 11 from the very beginning of its life. The single biggest assumption in its design is rvalue reference semantics, without which the library is unusably slow due to enormous amounts of memory copying. Lightweight futures make heavy use of C++ 11 constexpr and noexcept. APIBind does not exist without template aliasing and inline namespaces.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/