On 8/30/2015 1:01 PM, Niall Douglas wrote:
> I appreciate that from your perspective, it's a question of good design
> principles, and splashing shared_ptr all over the place is not considered
> good design. For the record, I *agree* where the overhead of a shared_ptr
> *could* be important - an *excellent* example of that case is
> std::future<T> which it is just plain stupid that those use memory
> allocation at all, and I have a non memory allocating implementation
> which proves it in Boost.Outcome. But for AFIO, where the cost of a
> shared_ptr will always be utterly irrelevant compared to the operation
> cost, this isn't an issue.
Let's get this memory allocation concern out of the way. One just can't have a conforming implementation of `std::future` that does not allocate memory. Assume that you could, by embedding the storage for the result (value-or-exception) inside either of the `future`/`promise`:

1) Allocator support: `std::future::share` transfers ownership of the (unique) future into a shared future, and thus necessarily requires allocation [see below]. This allocation ought to be done with the allocator/pmr supplied to the `std::promise` constructor. You then have a few options:

a) Keep this allocator around so that `std::future::share` can use it; this is the standard-conforming option. It means type-erasure in some way or another, which amounts to doing memory allocation whenever the size of the allocator is greater than that of some hard-coded small buffer (properly aligned, etc.).

b) Ditch allocator support, which is an acceptable option for the majority of the population but not for the standard, and resort to `std::allocator`. You now have a problem, because you have no control over the allocation process, and thus you cannot mitigate its cost by using pools, stack buffers, etc. However, `std::shared_future` usage should be rare, so this might not be that big of a deal.

c) Try to change the standard so that it is `std::future::share` that takes an allocator, and guarantee no memory allocation anywhere else. This would be a reasonable approach under certain conditions.

The reason `std::shared_future` cannot make use of embedded storage, thus necessarily requiring allocation, has to do with lifetime and thread-safety. `std::shared_future::get` returns a reference to the resulting value, which is guaranteed to be valid for as long as there is at least one instance of `std::shared_future` around. If embedded storage were used, it would imply moving the location of the resulting value when the instance holding it goes away. This can happen in a separate thread, as `std::shared_future` and `std::shared_future::get` are thread-safe. All in all, it would lead to the following scenario:

    std::shared_future<T> s = get_a_shared_future_somehow();
    T const& r = s.get();
    std::cout << r; // potentially UB, potentially a race

2) Type requirements: The standard places very few restrictions on which types can be used with asynchronous results. Those are (besides Destructible):

- `std::future<T>::get` requires `T` to be move-constructible,
- `std::promise<T>::set_value` requires `T` to be copy/move-constructible (some individuals are considering proposing `emplace_value`, which would lift this restriction),
- `std::shared_future<T>::get` requires nothing.

The use of embedded storage increases those restrictions:

a) `T` has to be move-constructible, which is fine today as it is already required implicitly by `std::promise`, `std::packaged_task`, etc. I'm only mentioning this because there is interest in dropping this requirement, to increase consistency with regard to emplace-construction support.

b) `T` has to be nothrow-move-constructible, as moving any of `std::promise`, `std::future`, `std::shared_future` is `noexcept`.

c) If synchronization is required when moving the result from one embedded storage to the other, `T` has to be trivially-move-constructible, as executing user code under that synchronization could potentially lead to a deadlock. This might be tractable by using atomics; the atomic experts would know (I am not one of them).
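For what it's worth, here is a minimal sketch of those restrictions written down as a trait. `can_embed_result` is a name of my own invention, purely illustrative and not taken from any actual implementation:

    #include <string>
    #include <type_traits>

    // Conditions (a)-(c) above, under which a hypothetical embedded-storage
    // shared state could hold a T directly inside the future/promise objects.
    template <typename T>
    struct can_embed_result
        : std::integral_constant<bool,
              // (a) the result is transferred by moving it
              std::is_move_constructible<T>::value
              // (b) moving future/promise/shared_future is noexcept
              && std::is_nothrow_move_constructible<T>::value
              // (c) relocation may happen under synchronization, so no user
              //     code (i.e. a non-trivial move) may run at that point
              && std::is_trivially_move_constructible<T>::value>
    {};

    static_assert(can_embed_result<int>::value,
                  "covers the basic std::future<int> case");
    static_assert(!can_embed_result<std::string>::value,
                  "a non-trivial move constructor already disqualifies T");

Anything with a non-trivial move constructor (strings, containers, most user-defined types) falls outside of it, which already hints at how narrow the population is.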
The synchronization in (c) could also be addressed by transactional memory, but this would only further increase the restrictions on the types that could be used (although I am not a TM expert either).

So far we know for sure that a standard-conforming non-allocating `std::promise<T>`/`std::future<T>` pair can be implemented as long as:

- `T` is trivially-move-constructible,
- the allocator used to construct the promise is `std::allocator`, or it is a viable candidate for small-buffer optimization.

Such an implementation would use embedded storage under those partly-runtime conditions, which is quite a restricted population, but still promising as it covers the basic `std::future<int>` scenario. But, as usually happens, there is a tradeoff: such an implementation would have to incur synchronization overhead every time either of the `std::future`/`std::promise` is moved, in the case where the `std::future` is retrieved before the value is ready, which in my experience comprises the majority of the use cases. But for completeness, let's analyze the possible scenarios.

It always starts with a `std::promise`, which is the one responsible for creating the shared state. Then either of these could happen:

I) The shared state is made ready by providing the value-or-exception to the `std::promise`.
II) The `std::future` is retrieved from the `std::promise`.

In the case where (I) happens before (II), no extra synchronization is needed, since the `std::promise` can simply transfer the result to the `std::future` during (II). Once the result has been provided, there is no further communication between `std::promise` and `std::future`. This represents the following scenario:

    std::promise<int> p;
    p.set_value(42);
    std::future<int> f = p.get_future();

which is nothing but a long-winded, overhead-riddled way of saying:

    std::future<int> f = std::make_ready_future(42);

In the case where (II) happens before (I), every time either one of the `std::future` or `std::promise` moves, it has to notify the other one that it has moved to a different location, in case the other needs to contact it. Again, this happens for as long as the shared state is not made ready, and represents the following scenario:

    std::promise<int> p;
    std::future<int> f = p.get_future();
    std::thread t([p = std::move(p)]() mutable { p.set_value(42); });

which is the poster-child example of using `std::promise`/`std::future`.

Finally, in Lenexa SG1 decided to accept LWG2412 as a defect, which allows (I) and (II) to happen concurrently (previously undefined behavior). This appears not to have been moved forward by LWG yet. It represents the following scenario:

    std::promise<int> p;
    std::thread t([&] { p.set_value(42); });
    std::future<int> f = p.get_future();

which is in reality no different from the previous scenario, but which an embedded-storage `std::promise` implementation needs to address with more synchronization.

Why is this synchronization worth mentioning at all? Because it hurts concurrency. Unless you are in complete control of every piece of code that touches these objects, and you arrange it so that no moves happen, you are going to see the effects of threads accessing memory of other threads, with all that it implies. But today's `std::future` and `std::promise` are assumed to be cheaply movable (just a pointer swap).
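To make that last point concrete, here is a deliberately simplified sketch of the extra work per move. Everything in it (the `linked` type, the single global mutex) is hypothetical and only stands in for the kind of per-pair synchronization a real embedded-storage design would need:

    #include <mutex>
    #include <utility>

    // When two objects embed their shared data, each must keep a pointer
    // into the other, and every move has to update the peer under
    // synchronization. A single global mutex stands in here for whatever
    // per-pair scheme a real implementation would use.
    std::mutex link_mutex;

    struct linked
    {
        linked* peer = nullptr;

        linked() = default;

        linked(linked&& other) noexcept
        {
            std::lock_guard<std::mutex> lock(link_mutex);
            peer = other.peer;
            other.peer = nullptr;
            if (peer)
                peer->peer = this; // tell the peer we changed address
        }

        ~linked()
        {
            std::lock_guard<std::mutex> lock(link_mutex);
            if (peer)
                peer->peer = nullptr; // unlink so the peer won't dangle
        }
    };

    // Link a freshly created pair, as promise::get_future would.
    void link(linked& a, linked& b)
    {
        std::lock_guard<std::mutex> lock(link_mutex);
        a.peer = &b;
        b.peer = &a;
    }

    int main()
    {
        linked f, p; // stand-ins for a not-yet-ready future and its promise
        link(f, p);
        linked p2 = std::move(p); // f.peer now points at p2, under the lock
    }

    // Contrast: with a heap-allocated shared state the move constructor is
    // just `shared = other.shared; other.shared = nullptr;`, no locking at
    // all, because the state never changes address.

Every move and destruction now takes a lock that another thread may be contending for, which is exactly the cost that a plain pointer-swap move avoids.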
You could try to protect against these cross-thread accesses by making `std::future` and `std::promise` as big as a cache line, or even by simply using dynamic memory allocation for them, together with an appropriate allocator specifically designed to aid whatever use case you have where allocation time is a constraint.

And finally, let's not forget that the Concurrency TS (or actually its section on future continuations) complicates matters even more. The addition of `.then` requires implementations to store an arbitrary Callable around until the future to which it was attached becomes ready. Arguably, this Callable has to be stored regardless of whether the future is already ready, but I'm checking the final wording and it appears that you can as-if run the continuation in the calling thread, despite not being required to (and at least discouraged from doing so in an initial phase). Similar to the earlier allocator case, this Callable can be anything, so it involves type-erasure in some way or another, which will require memory allocation whenever it doesn't fit within a dedicated small buffer.

To sum things up (based on my experience and that of others with whom I have had a chance to discuss the subject), a non-allocating quasi-conformant `std::future`/`std::promise` implementation would cater only to a very limited set of types in highly constrained scenarios where synchronization overhead is not a concern. In real-world scenarios, and especially those that rely heavily on futures due to the use of continuations, time is better spent focusing on memory allocation schemes (the real concern after all) by using the standard mechanism devised to tend to exactly those needs: allocators.

I'll be interested in hearing your findings during your work on the subject. And should you want me to have a look at your implementation and come up with ways to "break it" (which is what I do best), you just have to contact me.

Regards,

-- 
Agustín K-ballo Bergé.-
http://talesofcpp.fusionfenix.com