My main beef with the design you chose is that it implies shared ownership where that does not seem to be necessary.
Let us assume that file descriptors must be reference counted in user space because they are a precious resource, and, as I mentioned, you need to tie the lifetimes of internal workaround objects to other things. So let's assume the need for std::shared_ptr<handle> has been agreed and move on to the meat on the bone, which is the choice of shared future semantics over unique future semantics for the std::shared_ptr<handle> transported by afio::future.
Why isn't your 'handle' reference counted on its own, then? The user would not even have to know/see whether that was the case. That would also get rid of this handle-to-a-handle-to-a-handle design, which is so confusing and probably creates a lot of inefficiency. A simple boost::future<handle> as the return value would fit the bill (note, I said boost::future as it already implements all of the Concurrency TS). That would also eliminate the whole monad/afio::future mess you're in. Those do not belong in AFIO.
First things first: shared future vs unique future isn't helpful terminology. For AFIO both have equal overhead. What we're really talking about is whether future.get() can be called exactly once (future) or many times (shared future). From now on, I will therefore talk in terms of consuming futures (future) vs non-consuming futures (shared future).
First, the problem lies not in whether the implementations are equivalent, but in the fact that shared_future and (unique_)future convey a design choice (FWIW, future and shared_future in HPX share their implementation as well, mostly). From this, the terminology is not only helpful, but absolutely essential for good design. Second, the possibility to call .get() more than once on a shared_future is a _consequence_ of the design, not the _rationale_ for it. Third, the terminology 'consuming' and 'non-consuming' future does not make any sense (to me). More importantly it is by no means Standard-terminology - thus reasoning in terms of those makes it much more difficult for us to talk about the same thing.
I've been trying to think of an API for asynchronous filesystem and file i/o which (a) uses consuming futures instead of non-consuming futures and (b) is invariant to the concurrency engine under the bonnet, so end user code doesn't need to be customised for the concurrency runtime.
If you're interested in a concurrency-safe API, then returning (unique_)future would be the proper choice. That conveys unique ownership (i.e. no concurrency problems are possible) and makes it explicit that if the user decides to assign it to a shared_future, then he/she is responsible for dealing with the resulting concurrency issues.
Ok, let's revisit the original pattern code I mentioned:
EXAMPLE A:
shared_future h = async_file("niall.txt");
// Perform these in any order the OS thinks best
for(size_t n = 0; n < 100; n++)
  async_read(h, buffer[n], 1, n*4096);
This expands into these continuations:
EXAMPLE B:
shared_future h = async_file("niall.txt");
// Call these continuations when h becomes ready
for(size_t n = 0; n < 100; n++)
  // Each of these initiates an async read, so queue depth = 100
  h.then(detail::async_read(buffer[n], 1, n*4096));
Here we explicitly say we don't care about ordering for the async_reads with respect to one another and the OS can choose any ordering it thinks best, but we do insist that no async_reads occur before the async_file (which opens "niall.txt" for reading).
Ok, that's possible just fine if you return boost::future from async_file (shared_future has a non-implicit conversion constructor taking a future).
Let's imagine that with value-consuming semantics:
EXAMPLE C:
future h = async_file("niall.txt");
// Call these continuations when h becomes ready
for(size_t n = 0; n < 100; n++)
  // Each of these initiates an async read, so queue depth = 100
  h.then(detail::async_read(buffer[n], 1, n*4096));
On the second h.then() executed, i.e. when n==1, you should see a future_errc::no_state exception thrown, because the first .then() is considered to have consumed the future's state. This is because future.get() can be called exactly once.
I don't see a reason why anybody would do this. If you know (as a user) you need shared ownership you just make it explicit by assigning the boost::future to a boost::shared_future (as you showed in Example A).
So, to have it not throw an exception, you need to do this:
EXAMPLE D:
future h = async_file("niall.txt");
shared_future h2(std::move(h));
// Call these continuations when h becomes ready
for(size_t n = 0; n < 100; n++)
  // Each of these initiates an async read, so queue depth = 100
  h2.then(detail::async_read(buffer[n], 1, n*4096));
Sure, but Example A is just fine, as said.
Now we consume the consuming future into a non-consuming future, and non-consuming futures are not consumed by their continuations, so you can add as many continuations as you like (in this case 100).
This makes my head spin! What is consuming what? Or do you mean 'consummate'? Can we stick to Standard-terminology, please? We have unique ownership and shared ownership with regard to the value represented by the future - that's all I know.
Where I was coming from in the presented AFIO API design was that (a) you, the programmer, are always trying to give as much opportunity to parallelise (i.e. no ordering) as possible to the OS, hence the pattern of adding more than one continuation to a future is very common; that would throw an exception if the programmer forgot to first convert the consuming future into a non-consuming future, and (b) I can't see any way of adding type safety to enforce single consumption of consuming futures such that Example C refuses to compile.
If the programmer 'forgot' to use a shared_future when shared ownership is needed then he/she gets what he/she deserves.
However, this discussion makes me wonder if I'm missing a trick. The big value in consuming futures is that they force you to think about better design. I'd like not to lose that if possible. So perhaps Example A could be rewritten as:
EXAMPLE E:
future h = async_file("niall.txt");
// Perform this after h
future r = async_read(h, buffer0, 1, 0);
// Perform these in any order the OS thinks best after r
for(size_t n = 0; n < 100; n++)
  parallel_async_read(r, buffer[n], 1, n*4096);
In other words, for the case where you issue multiple continuations onto a consuming future, you must use an API with a parallel_ prefix; otherwise it will throw on the second continuation added.
There are probably a hundred ways to express explicit data dependencies in your code. Why not stick to the one specified in the Concurrency TS? Why invent something new - again?

A general note. My main issue with the way you have designed things in AFIO is that you verbally strive for conformance, but then for no (or little) reason decide to change some minor semantics here and there (maybe nobody will notice), such that the result still appears to be conforming but is nevertheless different in subtle ways. That makes it _very_ difficult to reason about your design, as one never knows whether you really mean what you say. Before going off to design your own, you may consider first fully understanding the design and rationale of existing Standard solutions. And only if you really can't make them work, build something of your own. But never, really never, take a Standard design and _change_ it (adding things is fine) without giving it a new name!

Regards Hartmut
---------------
http://boost-spirit.com
http://stellar.cct.lsu.edu