Re: [boost] This AFIO review - a modest proposal

24 Aug 2015

On 24 Aug 2015 at 5:49, Glen Fernandes wrote:
...
Niall Douglas wrote:
...
Glen Fernandes wrote:
...
Isn't "I don't understand the point of this library" a valid reason 
for rejection?
In itself, no I don't think so. There are at least ten libraries in Boost
I have no understanding of the point of.
If almost nobody understands the point of a proposed library to the extent
that almost nobody recommends its acceptance: I would think that it isn't
unjustly rejected. I believe niche use case libraries have a place in Boost.
I suspect, though, that an asynchronous file I/O library falls into the
category of something that most people want in Boost.
I think that an asynchronous file i/o library in Boost is something 
that people *think* they want because they erroneously believe it 
will improve performance in a single swoop.

However the file system is one of the biggest performance pain points 
in a computer. It has been optimised *relentlessly* such that it's 
good enough almost all of the time for most people.

That's why I believe a naïve asynchronous file i/o library design, such 
as ASIO's stream implementation, is quite useless in the real world: 
you certainly wouldn't write a database with it because you gain 
nothing over using the host OS APIs directly. It provides no practical 
gain to anyone with real file system performance problems because, 
apart from the async, it offers nothing else useful such as read-write 
ordering guarantees.

Totally separate to the async file i/o is the race free filesystem 
stuff. I would like to believe that people understand why a race free 
filesystem API is important. There is consensus that you need an 
abstracted file handle object, and the race free filesystem API needs 
to hang off of that - indeed I remember reading somewhere that Beman 
cited the difficulty of standardising on an abstracted file handle 
object with respect to STL iostreams as a big reason that the 
Filesystem TS does not attempt to address race free filesystem.
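
To illustrate why race free operations need to hang off an open handle rather than a path, here is a minimal sketch using plain POSIX openat(). It is not AFIO's API; write_in_dir and demo_rename_between_open_and_write are hypothetical names, and the /tmp location is an assumption. The directory is renamed after its handle is opened, yet the new file still lands in the intended directory.

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of AFIO or the Filesystem TS).
// Creates or truncates `leaf` inside the directory referred to by `dirfd`
// and writes `data` to it. Returns true if all bytes were written.
bool write_in_dir(int dirfd, const char *leaf, const std::string &data)
{
    assert(dirfd >= 0);
    // openat() resolves `leaf` relative to the open directory handle, so a
    // concurrent rename of the directory's path cannot redirect the write.
    int fd = ::openat(dirfd, leaf, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return false;
    ssize_t written = ::write(fd, data.data(), data.size());
    ::close(fd);
    return written == static_cast<ssize_t>(data.size());
}

// Demonstration: a third party renames the directory between opening the
// handle and creating the file.
bool demo_rename_between_open_and_write()
{
    char tmpl[] = "/tmp/racefree_XXXXXX";  // assumed scratch location
    if (::mkdtemp(tmpl) == nullptr) return false;
    const std::string old_path = tmpl;
    const std::string new_path = old_path + "_moved";

    int dirfd = ::open(old_path.c_str(), O_RDONLY | O_DIRECTORY);
    if (dirfd < 0) { ::rmdir(old_path.c_str()); return false; }

    // Simulate the race: the path we opened no longer exists after this.
    if (std::rename(old_path.c_str(), new_path.c_str()) != 0) {
        ::close(dirfd);
        ::rmdir(old_path.c_str());
        return false;
    }

    bool ok = write_in_dir(dirfd, "value.txt", "hello");
    ::close(dirfd);

    // The file must exist under the directory's new name.
    const std::string file = new_path + "/value.txt";
    struct stat st;
    bool found = ok && ::stat(file.c_str(), &st) == 0 && st.st_size == 5;

    ::unlink(file.c_str());
    ::rmdir(new_path.c_str());
    return found;
}
```

A path-based open of old_path + "/value.txt" after the rename would have failed or, worse, hit a different directory recreated at the old path.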

In AFIO I have proposed a race free filesystem API and an abstracted 
file handle object. I think that approach is uncontroversial. The 
decision to add asynchronicity on top as a value add *is* controversial, but 
one could call that an internal implementation detail from the 
perspective of synchronous use because if you want to program race 
free filesystem synchronously with AFIO, there is a full suite of 
easy to use 100% synchronous APIs provided.

Should then race free filesystem be split off into a separate purely 
synchronous library away from async file and filesystem? If so, how 
do you design the i/o model, given that you can't use STL iostreams?

This is a very good question, and it is why I am here for review 
before I start the engine rewrite: I need to get feedback on this 
now. (BTW it isn't useful to say yes of course you should split off 
synchronous race free filesystem. It is useful to say how you solve 
the abstracted handle object problem - how should you read and write 
from the handle? Should POSIX read/write atomicity semantics be 
exposed? How should it integrate with STL iostreams and the 
Filesystem TS? If these were easy questions, Beman would have 
designed in a solution in the Filesystem TS already.)
...
I haven't figured out entirely where the disconnect is. It seems like you're
saying "This is the async file I/O library that you need; not the async file
I/O library that you want."
You understand me perfectly. Fundamental design mistakes by me 
notwithstanding (see my response to Thomas' review).
...
You also say that only a tiny fraction of developers have those needs.
Are any of them going to be reviewing this library?
boost-dev isn't exactly full of people programming in this niche. I 
have colleagues in file system communities, indeed I am supposed to 
be writing a white paper on async byte range locking with none other 
than Jeff Layton except this review turned up after C++ Now, so I had 
to shelve the white paper until next year.

Filesystem specialists appear to get quite excited about AFIO, and as 
I mentioned my CppCon talk looks like it will be surprisingly well 
attended considering. The single biggest bone they pick is the 
requirement for C++ 11 as that is years away for most of them. File 
system code is exceptionally conservative, they won't trust C++ 11/14 
until at least 2018.
...
Can you explain what is not straightforward more precisely please?
Sure. With regards to the examples:
- Be more concise,
- Have fewer standard out statements,
- Have fewer comments,
- Have no conditional compilation
  * BOOST_AFIO_USE_LEGACY_FILESYSTEM_SEMANTICS? How can parts of AFIO be
    legacy?
That is not caused by AFIO. Boost.Filesystem still doesn't match the 
Filesystem TS. The macro BOOST_AFIO_USE_LEGACY_FILESYSTEM_SEMANTICS 
has AFIO use workarounds specific to Boost.Filesystem. As soon as 
Boost.Filesystem gets fixed, I will be more than pleased to remove 
the workarounds.
...
  * #if 0? In an example?
  * No platform specifics
I think I've either logged issues or explained myself about all of 
the above in other threads. Thanks though for the list.
...
The other advice I have is that you may want to omit a comment like "This
section was not finished in time for the beginning of the Boost peer review
due a hard drive failure induced by the testing of AFIO-based key-value
stores in this workshop tutorial (sigh!)" in the documentation. You don't
want prospective reviewers to wonder if they should back up their hard drive
before trying AFIO. :-)
That comment was purely for you guys to explain the missing final 
example. Which currently is over 1000 lines long, and growing - I 
only got the direct-from-mmap dense hash map working late last night. 
I personally think almost nobody here will be interested in studying 
the code - maybe Tony van Eerd. That's it.

Lock free filesystem programming is like lock free atomic programming 
- excellent progression guarantees and sometimes great performance. 
But the implementation code hurts the head, especially as you 
sometimes deliberately use races as part of your algorithm which is 
to my knowledge not common in atomic memory lock free programming.
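
One familiar case of letting contenders race and having the filesystem pick the winner is exclusive file creation. The sketch below is an assumption about the kind of technique meant, not a description of AFIO's algorithms; try_claim, the claim enum and the /tmp location are hypothetical. The demonstration runs the two contenders one after the other to keep it deterministic, but the same call is safe when they run concurrently.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <string>

enum class claim { won, lost, error };

// Hypothetical helper: every contender calls this with the same path and
// no lock is taken. O_CREAT | O_EXCL makes the create atomic, so exactly
// one caller succeeds and the rest see EEXIST.
claim try_claim(const std::string &path)
{
    assert(!path.empty());
    int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd >= 0) { ::close(fd); return claim::won; }
    return errno == EEXIST ? claim::lost : claim::error;
}

// Two contenders for the same marker file: the first wins, the second
// loses without blocking.
bool demo_two_contenders()
{
    char tmpl[] = "/tmp/claim_XXXXXX";  // assumed scratch location
    if (::mkdtemp(tmpl) == nullptr) return false;
    const std::string marker = std::string(tmpl) + "/owner";

    claim first = try_claim(marker);
    claim second = try_claim(marker);

    ::unlink(marker.c_str());
    ::rmdir(tmpl);
    return first == claim::won && second == claim::lost;
}
```

No contender ever waits on another, which is where the progression guarantee comes from; the cost is that every caller must be written to handle losing.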

So why end the key-value store tutorial with such a complex final 
design? Because that's the whole point: people's assumption that 
"async i/o makes your code quicker" is flawed.

*IF* you are willing to turn your entire design, approach and 
methodology to file system programming completely on its head, you can 
get *spectacular* results, as you will see when you compare the 
benchmarks for the one-file-per-key design to a more sophisticated 
design based on how file systems actually work, not how the average 
programmer thinks they work. And a library like AFIO makes doing that 
much, much easier than without.
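
For reference, a one-file-per-key store of the conventional kind might look like the minimal sketch below, written with standard iostreams only. This is not the code from the AFIO tutorial; naive_store, its put/get members and the /tmp location are hypothetical, and keys are assumed to be valid file names.

```cpp
#include <unistd.h>
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <iterator>
#include <string>
#include <utility>

// Hypothetical naive key-value store: each key is a file in `dir_`, and
// each value is that file's entire contents.
class naive_store {
    std::string dir_;
    std::string path_for(const std::string &key) const
    {
        // Assumption: keys are plain file names without path separators.
        assert(!key.empty() && key.find('/') == std::string::npos);
        return dir_ + "/" + key;
    }
public:
    explicit naive_store(std::string dir) : dir_(std::move(dir)) {}

    // Overwrites the whole file for `key` with `value`.
    bool put(const std::string &key, const std::string &value) const
    {
        std::ofstream out(path_for(key), std::ios::binary | std::ios::trunc);
        out.write(value.data(), static_cast<std::streamsize>(value.size()));
        return static_cast<bool>(out.flush());
    }

    // Reads the whole file for `key`; returns false if it cannot be opened.
    bool get(const std::string &key, std::string &value) const
    {
        std::ifstream in(path_for(key), std::ios::binary);
        if (!in) return false;
        value.assign(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
        return true;
    }
};

// Usage: store one value, read it back, and look up a missing key.
bool demo_naive_store()
{
    char tmpl[] = "/tmp/kvstore_XXXXXX";  // assumed scratch location
    if (::mkdtemp(tmpl) == nullptr) return false;
    naive_store store(tmpl);

    std::string v;
    bool ok = store.put("apple", "red") && store.get("apple", v)
              && v == "red" && !store.get("missing", v);

    ::unlink((std::string(tmpl) + "/apple").c_str());
    ::rmdir(tmpl);
    return ok;
}
```

Every put and get here costs a file open, a data transfer and a close, and nothing protects a reader from seeing a half-written value; making those calls asynchronous would not change either property.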

If on the other hand you think sprinkling some async on top of your 
conventional file system approach and algorithms will make it go 
quicker, you are probably incorrect. That's the conceptual hill the 
tutorial tries to get people to climb. I suspect that is the cause of 
much of the disconnect you mentioned, and could well mean that AFIO 
will never be accepted into Boost as it's the wrong audience.

And I have no problem if AFIO never enters Boost. I would think it a 
shame and a wasted opportunity as it solves a ton of pain points for 
those with such needs, but I am not working on AFIO for the good of 
my health. I specifically need AFIO for a new kind of database 
product I have in mind from which I hope to retire and never have to 
work again. If it is accepted into Boost, then I'll press for it to 
be standardised into ISO at WG21. If it is not accepted, I'll try one 
more time and then I'll move on as I have better things to do. It's 
no problem either way.

Niall

-- 
ned Productions Limited Consulting
http://www.nedproductions.biz/ 
http://ie.linkedin.com/in/nialldouglas/