On 8/30/2015 1:01 PM, Niall Douglas wrote:
> I appreciate that from your perspective, it's a question of good design
> principles, and splashing shared_ptr all over the place is not considered
> good design. For the record, I *agree* where the overhead of a shared_ptr
> *could* be important - an *excellent* example of that case is
> std::future<T> which it is just plain stupid that those use memory
> allocation at all, and I have a non memory allocating implementation
> which proves it in Boost.Outcome. But for AFIO, where the cost of a
> shared_ptr will always be utterly irrelevant compared to the operation
> cost, this isn't an issue.
Let's get this memory allocation concern out of the way. One just can't have a conforming implementation of `std::future` that does not allocate memory. Assume that you could, by embedding the storage for the result (value-or-exception) inside either of the `future`/`promise`:

1) Allocator support: `std::future::share` transfers ownership of the (unique) future into a shared future, and thus necessarily requires allocation [see below]. This allocation ought to be done with the allocator/pmr supplied to the `std::promise` constructor. You then have a few options:

a) Keep this allocator around so that `std::future::share` can use it; this is the standard-conforming option. It means type-erasure in some way or another, which amounts to doing memory allocation whenever the size of the allocator is greater than that of some hard-coded small buffer (properly aligned, etc.).

b) Ditch allocator support, which is an acceptable option for the majority of the population but not for the standard, and resort to `std::allocator`. You now have a problem, because you have no control over the allocation process, and thus you cannot mitigate its cost by using pools, stack buffers, etc. However, `std::shared_future` usage should be rare, so this might not be that big of a deal.

c) Try to change the standard so that it is `std::future::share` that takes an allocator, and guarantee no memory allocation anywhere else. This would be a reasonable approach under certain conditions.

The reason `std::shared_future` cannot make use of embedded storage, thus necessarily requiring allocation, has to do with lifetime and thread-safety. `std::shared_future::get` returns a reference to the resulting value, which is guaranteed to be valid for as long as there is at least one instance of `std::shared_future` around. If embedded storage were used, it would imply moving the location of the resulting value when the instance holding it goes away. This can happen in a separate thread, as `std::shared_future` and `std::shared_future::get` are thread-safe. All in all, it would lead to the following scenario:

    std::shared_future<T> s = get_a_shared_future_somehow();
    T const& r = s.get();
    std::cout << r; // potentially UB, potentially a race

2) Type requirements: The standard places very few restrictions on which types can be used with asynchronous results. Those are (besides Destructible):

- `std::future<T>::get` requires `T` to be move-constructible,
- `std::promise<T>::set_value` requires `T` to be copy/move-constructible (some individuals are considering proposing `emplace_value`, which would lift this restriction),
- `std::shared_future<T>::get` requires nothing.

The use of embedded storage increases those restrictions:

a) `T` has to be move-constructible, which is fine today as it is already required implicitly by `std::promise`, `std::packaged_task`, etc. I'm only mentioning this because there is interest in dropping this requirement, to increase consistency with regard to emplace-construction support.

b) `T` has to be nothrow-move-constructible, as moving any of `std::promise`, `std::future`, `std::shared_future` is `noexcept`.

c) If synchronization is required when moving the result from one embedded storage to the other, `T` has to be trivially-move-constructible, as executing user code under that synchronization could potentially lead to a deadlock. This might be tractable by using atomics; the atomic experts would know (I am not one of them).
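For what it's worth, here is a minimal sketch of those restrictions written down as a trait. `can_embed_result` is a name of my own invention, purely illustrative and not taken from any actual implementation:

    #include <string>
    #include <type_traits>

    // Conditions (a)-(c) above, under which a hypothetical embedded-storage
    // shared state could hold a T directly inside the future/promise objects.
    template <typename T>
    struct can_embed_result
        : std::integral_constant<bool,
              // (a) the result is transferred by moving it
              std::is_move_constructible<T>::value
              // (b) moving future/promise/shared_future is noexcept
              && std::is_nothrow_move_constructible<T>::value
              // (c) relocation may happen under synchronization, so no user
              //     code (i.e. a non-trivial move) may run at that point
              && std::is_trivially_move_constructible<T>::value>
    {};

    static_assert(can_embed_result<int>::value,
                  "covers the basic std::future<int> case");
    static_assert(!can_embed_result<std::string>::value,
                  "a non-trivial move constructor already disqualifies T");

Anything with a non-trivial move constructor (strings, containers, most user-defined types) falls outside of it, which already hints at how narrow the population is.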
The synchronization in (c) could also be addressed by transactional memory, but this would only further increase the restrictions on the types that could be used (although I am not a TM expert either).

So far we know for sure that a standard-conforming non-allocating `std::promise<T>`/`std::future<T>` pair can be implemented as long as:

- `T` is trivially-move-constructible,
- the allocator used to construct the promise is `std::allocator`, or it is a viable candidate for small-buffer optimization.

Such an implementation would use embedded storage under those partly-runtime conditions, which is quite a restricted population, but still promising as it covers the basic `std::future<int>` scenario. But, as usually happens, there is a tradeoff: such an implementation would have to incur synchronization overhead every time either of the `std::future`/`std::promise` is moved, in the case where the `std::future` is retrieved before the value is ready, which in my experience comprises the majority of the use cases. But for completeness, let's analyze the possible scenarios.

It always starts with a `std::promise`, which is the one responsible for creating the shared state. Then either of these could happen:

I) The shared state is made ready by providing the value-or-exception to the `std::promise`.
II) The `std::future` is retrieved from the `std::promise`.

In the case where (I) happens before (II), no extra synchronization is needed, since the `std::promise` can simply transfer the result to the `std::future` during (II). Once the result has been provided, there is no further communication between `std::promise` and `std::future`. This represents the following scenario:

    std::promise<int> p;
    p.set_value(42);
    std::future<int> f = p.get_future();

which is nothing but a long-winded, overhead-riddled way of saying:

    std::future<int> f = std::make_ready_future(42);

In the case where (II) happens before (I), every time either one of the `std::future` or `std::promise` moves, it has to notify the other one that it has moved to a different location, in case the other needs to contact it. Again, this happens for as long as the shared state is not made ready, and represents the following scenario:

    std::promise<int> p;
    std::future<int> f = p.get_future();
    std::thread t([p = std::move(p)]() mutable { p.set_value(42); });

which is the poster-child example of using `std::promise`/`std::future`.

Finally, in Lenexa SG1 decided to accept LWG2412 as a defect, which allows (I) and (II) to happen concurrently (previously undefined behavior). This appears not to have been moved forward by LWG yet. It represents the following scenario:

    std::promise<int> p;
    std::thread t([&] { p.set_value(42); });
    std::future<int> f = p.get_future();

which is in reality no different from the previous scenario, but which an embedded-storage `std::promise` implementation needs to address with more synchronization.

Why is this synchronization worth mentioning at all? Because it hurts concurrency. Unless you are in complete control of every piece of code that touches these objects, and you arrange it so that no moves happen, you are going to see the effects of threads accessing memory of other threads, with all that it implies. But today's `std::future` and `std::promise` are assumed to be cheaply movable (just a pointer swap).
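To make that last point concrete, here is a deliberately simplified sketch of the extra work per move. Everything in it (the `linked` type, the single global mutex) is hypothetical and only stands in for the kind of per-pair synchronization a real embedded-storage design would need:

    #include <mutex>
    #include <utility>

    // When two objects embed their shared data, each must keep a pointer
    // into the other, and every move has to update the peer under
    // synchronization. A single global mutex stands in here for whatever
    // per-pair scheme a real implementation would use.
    std::mutex link_mutex;

    struct linked
    {
        linked* peer = nullptr;

        linked() = default;

        linked(linked&& other) noexcept
        {
            std::lock_guard<std::mutex> lock(link_mutex);
            peer = other.peer;
            other.peer = nullptr;
            if (peer)
                peer->peer = this; // tell the peer we changed address
        }

        ~linked()
        {
            std::lock_guard<std::mutex> lock(link_mutex);
            if (peer)
                peer->peer = nullptr; // unlink so the peer won't dangle
        }
    };

    // Link a freshly created pair, as promise::get_future would.
    void link(linked& a, linked& b)
    {
        std::lock_guard<std::mutex> lock(link_mutex);
        a.peer = &b;
        b.peer = &a;
    }

    int main()
    {
        linked f, p; // stand-ins for a not-yet-ready future and its promise
        link(f, p);
        linked p2 = std::move(p); // f.peer now points at p2, under the lock
    }

    // Contrast: with a heap-allocated shared state the move constructor is
    // just `shared = other.shared; other.shared = nullptr;`, no locking at
    // all, because the state never changes address.

Every move and destruction now takes a lock that another thread may be contending for, which is exactly the cost that a plain pointer-swap move avoids.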
You could try to protect against these cross-thread accesses by making `std::future` and `std::promise` as big as a cache line, or even by simply using dynamic memory allocation for them, together with an appropriate allocator specifically designed to aid whatever use case you have where allocation time is a constraint.

And finally, let's not forget that the Concurrency TS (or actually its section on future continuations) complicates matters even more. The addition of `.then` requires implementations to store an arbitrary Callable around until the future to which it was attached becomes ready. Arguably, this Callable has to be stored regardless of whether the future is already ready, but I'm checking the final wording and it appears that you can as-if run the continuation in the calling thread, despite not being required to (and at least discouraged from doing so in an initial phase). Similar to the earlier allocator case, this Callable can be anything, so it involves type-erasure in some way or another, which will require memory allocation whenever it doesn't fit within a dedicated small buffer.

To sum things up (based on my experience and that of others with whom I have had a chance to discuss the subject), a non-allocating quasi-conformant `std::future`/`std::promise` implementation would cater only to a very limited set of types in highly constrained scenarios where synchronization overhead is not a concern. In real-world scenarios, and especially those that rely heavily on futures due to the use of continuations, time is better spent focusing on memory allocation schemes (the real concern after all) by using the standard mechanism devised to tend to exactly those needs: allocators.

I'll be interested in hearing your findings during your work on the subject. And should you want me to have a look at your implementation and come up with ways to "break it" (which is what I do best), you just have to contact me.

Regards,

-- 
Agustín K-ballo Bergé.-
http://talesofcpp.fusionfenix.com