On 30 Aug 2015 at 15:05, Agustín K-ballo Bergé wrote:
On 8/30/2015 1:01 PM, Niall Douglas wrote:
I appreciate that from your perspective, it's a question of good design principles, and splashing shared_ptr all over the place is not considered good design. For the record, I *agree* where the overhead of a shared_ptr *could* be important - an *excellent* example of that case is std::future<T>, where it is just plain stupid that implementations use memory allocation at all, and I have a non-memory-allocating implementation in Boost.Outcome which proves it. But for AFIO, where the cost of a shared_ptr will always be utterly irrelevant compared to the operation cost, this isn't an issue.
Let's get this memory allocation concern out of the way. One just can't have a conforming implementation of `std::future` that does not allocate memory. Assume that you could, by embedding the storage for the result (value-or-exception) inside either of the `future/promise`:
Firstly I just wanted to say this is a really comprehensive and well written summary of the issues involved. One wouldn't have thought future<T> to be such a large tapestry, but as you demonstrate very well it is. I'll just limit my comments on your text to what my Boost.Outcome library does, if that's okay.

I should stress before I begin that I would not expect my non-allocating futures to be a total replacement for STL futures, but rather a complement to them (they are in fact dependent on STL futures because they use them internally) which might be useful as a future quality-of-implementation optimisation if and only if certain constraints are satisfied. My non-allocating futures are only useful in these circumstances:

1. You are not using an allocator for the T in future<T>.
2. Your type T has a move constructor or a copy constructor or both.
3. The cost of T's move (or copy) constructor is low.
4. Your type T is not the error_type (typically std::error_code) nor the exception_type (typically std::exception_ptr).
5. sizeof(T) is small.
6. If you want your futures to have noexcept move constructors and assignment, your T needs the same.
7. future.wait() very rarely blocks in your use scenario, i.e. most if not nearly all of the time the future is ready. If you are blocking, the cost of any thread sleep will always dwarf the cost of any future.

These circumstances are common enough in low latency applications such as ASIO, and using them is a big win in ASIO type applications over STL futures. These circumstances are not common in general purpose C++ code, and probably deliver little benefit there except maybe a portable continuations implementation on an older STL. All the above is in my documentation to warn people away from using them with the wrong expectations.
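To make circumstance 7 concrete, here is a minimal self-contained illustration of the intended fast path using plain std::future; the zero-timeout wait_for probe stands in for the is_ready() query a lightweight future offers:

    #include <chrono>
    #include <future>
    #include <iostream>

    int main()
    {
        std::promise<int> p;
        std::future<int> f = p.get_future();
        p.set_value(42);  // in the targeted use cases the result has usually already arrived

        // Fast path: the future is already ready, so get() never blocks and
        // the cost of the future machinery itself is all that is paid.
        if (f.wait_for(std::chrono::seconds(0)) == std::future_status::ready)
            std::cout << f.get() << std::endl;
    }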
The reason `std::shared_future` cannot make use of embedded storage, thus necessarily requiring allocation, has to do with lifetime and thread-safety. `std::shared_future::get` returns a reference to the resulting value, which is guaranteed to be valid for as long as there is at least one instance of `std::shared_future` around. If embedded storage were to be used, it would imply moving the location of the resulting value when the instance holding it goes away. This can happen in a separate thread, as `std::shared_future` and `std::shared_future::get` are thread-safe. All in all it would lead to the following scenario:
    std::shared_future<T> s = get_a_shared_future_somehow();
    T const& r = s.get();
    std::cout << r; // potentially UB, potentially race
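To spell the race out, consider this self-contained variant. It is well defined today precisely because the shared state lives on the heap; with embedded storage, the destruction of the second owner (possibly on another thread) would have to relocate the value and leave `r` dangling:

    #include <future>
    #include <iostream>
    #include <thread>
    #include <utility>

    int main()
    {
        std::promise<int> p;
        std::shared_future<int> s = p.get_future().share();
        std::shared_future<int> other = s;  // a second owner, e.g. held by another thread

        p.set_value(42);
        int const& r = s.get();  // reference into the shared state

        // Fine with heap-allocated shared state: r stays valid while any
        // shared_future exists. Under embedded storage, this destruction on
        // another thread would have to move the value out from under r.
        std::thread t([&] { std::shared_future<int> dying = std::move(other); });
        t.join();
        std::cout << r << std::endl;
    }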
Boost.Outcome implements its std::shared_future equivalent using a wrap of std::shared_ptr for all the reasons you just outlined. For shared futures as defined by the standard it cannot be avoided, particularly given that get() must behave a certain way. You could, however, implement a non-consuming future without unique storage using Boost.Outcome's framework, i.e. future.get() returns a value, not a const lvalue ref, and you can call future.get() as many times as you like. This is how I was planning to implement afio::future<>::get_handle(): as that returns a shared_ptr, its storage moving around is not a problem.
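A minimal sketch of what such a non-consuming future could look like (invented names, not my actual API). Because get() returns by value, no caller ever holds a reference into storage that might relocate:

    #include <future>

    template <class T>
    class nonconsuming_future
    {
        std::shared_future<T> impl_;  // shared storage, as described above

    public:
        explicit nonconsuming_future(std::future<T> f) : impl_(f.share()) {}

        // Returns a copy rather than a const lvalue ref, so it may be called
        // as many times as you like and nobody depends on storage location.
        T get() const { return impl_.get(); }

        bool valid() const noexcept { return impl_.valid(); }
    };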
Such an implementation would use embedded storage under those partly-runtime conditions, which is quite a restricted population but still promising as it covers the basic `std::future<int>` scenario. But as it usually happens, it is a tradeoff, as such an implementation would have to incur synchronization overhead every time either of the `std::future/promise` is moved for the case where the `std::future` is retrieved before the value is ready, which in my experience comprises the majority of the use cases.
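To sketch where that synchronization comes from (purely illustrative, not any real implementation): with embedded storage the promise and future must point at each other, so relocating either side has to be made visible to the other, possibly across threads:

    #include <mutex>

    template <class T> struct embedded_future;

    template <class T>
    struct embedded_promise
    {
        std::mutex lock;            // arbitrates against the paired future
        embedded_future<T>* other;  // where set_value() must write
        // set_value() takes 'lock' so it never writes into a future that is
        // simultaneously being moved out from under it.
    };

    template <class T>
    struct embedded_future
    {
        embedded_promise<T>* other;
        alignas(T) unsigned char storage[sizeof(T)];  // the embedded result

        embedded_future(embedded_future&& o)
        {
            // Every move pays this synchronization, ready or not
            // (null/detached checks elided for brevity):
            std::lock_guard<std::mutex> g(o.other->lock);
            other = o.other;
            other->other = this;  // repoint the promise at our storage
            o.other = nullptr;
            // ...relocation of any already-stored value elided...
        }
    };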
I've not found this in my synthetic benchmarks. In fact, because the entirety of a future<T> or a promise<T> fits into a single cache line (where sizeof(T) is small), performance under contention (which can only occur between promise::get_future() and promise::set_value(), after which the pair detaches) is excellent. As for real world benchmarks, I haven't tried these yet; I'll find out soon. It could be that these show a penalty.
Finally, in Lenexa SG1 decided to accept LWG2412 as a defect, which allows (I) and (II) to happen concurrently (previously undefined behavior). This appears not to have been moved forward by LWG yet. It represents the following scenario:
    std::promise<int> p;
    std::thread t([&] { p.set_value(42); });
    std::future<int> f = p.get_future();
which is in reality no different from the previous scenario, but which an embedded-storage `std::promise` implementation needs to address with more synchronization.
My implementation implements this defect resolution.
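Roughly speaking (an illustrative sketch, not my actual code), the extra handshake amounts to a single atomic word arbitrating whether get_future() or set_value() got there first:

    #include <atomic>

    enum class pair_state : unsigned { empty, has_future, has_value };

    struct promise_core
    {
        std::atomic<pair_state> state{pair_state::empty};

        // Called by set_value(): if we win the race no future exists yet, so
        // the value is parked inside the promise; if we lose, a future is
        // attached and we must write into its storage under the pair's lock.
        bool try_claim_for_value()
        {
            pair_state expected = pair_state::empty;
            return state.compare_exchange_strong(expected, pair_state::has_value);
        }
    };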
Why is this synchronization worth mentioning at all? Because it hurts concurrency. Unless you are in complete control of every piece of code that touches them, and devise it so that no moves happen, you are going to see the effects of threads accessing memory of other threads, with all that it implies. Yet today's `std::future` and `std::promise` are assumed to be cheaply movable (just a pointer swap). You could try to protect against this by padding `std::future` and `std::promise` to a full cache line, or even by simply using dynamic memory allocation for them together with an appropriate allocator specifically designed to aid whatever use case you have where allocation time is a constraint.
And finally, let's not forget that the Concurrency TS (or actually its futures continuation section) complicates matters even more. The addition of `.then` requires implementations to store an arbitrary Callable around until the future to which it was attached becomes ready. Arguably, this Callable has to be stored regardless of whether the future is already ready, but I'm checking the final wording and it appears that you may, as-if, run the continuation in the calling thread, despite that not being required (and it was at least discouraged in an initial phase).
I read the postconditions as meaning:

    if (future.is_ready())
        callable(future);
    else
        store_in_promise_for_later(callable);

... which is what I've implemented. I *do* allocate memory for continuations, one malloc per continuation added.
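In self-contained form, that dispatch rule looks roughly like this (names invented for illustration, not Boost.Outcome's API); the deferred branch pays the one type-erased allocation per continuation just mentioned:

    #include <chrono>
    #include <functional>
    #include <future>
    #include <utility>

    template <class T>
    struct then_state
    {
        std::shared_future<T> f;
        std::function<void(std::shared_future<T>)> deferred;  // run later by set_value()

        template <class F>
        void then(F&& callable)
        {
            if (f.wait_for(std::chrono::seconds(0)) == std::future_status::ready)
                callable(f);                           // already ready: run here and now
            else
                deferred = std::forward<F>(callable);  // store for later: one allocation
        }
    };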
Similar to the earlier allocator case, this Callable can be whatever so it involves type-erasure in some way or another, which will require memory allocation whenever it doesn't fit within a dedicated small buffer object.
Correct. In my case, I have sixteen bytes available, which isn't enough for a small buffer object, hence the always-allocate.
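For readers unfamiliar with the technique, a minimal sketch of the small buffer idea (illustrative only; with just sixteen bytes the inline case would almost never fire, hence my always-allocate):

    #include <cstddef>
    #include <new>
    #include <utility>

    class erased_callable
    {
        static const std::size_t sbo_size = 16;  // the sixteen bytes in question
        alignas(std::max_align_t) unsigned char buf[sbo_size];
        void* obj;
        void (*invoke)(void*);

        template <class F>
        static void do_invoke(void* p) { (*static_cast<F*>(p))(); }

    public:
        template <class F>
        explicit erased_callable(F f)
        {
            if (sizeof(F) <= sbo_size && alignof(F) <= alignof(std::max_align_t))
                obj = new (buf) F(std::move(f));  // fits: construct in place, no malloc
            else
                obj = new F(std::move(f));        // too big: fall back to the heap
            invoke = &do_invoke<F>;
        }

        void operator()() { invoke(obj); }

        // Destructor and move support omitted for brevity; a real
        // implementation must remember which case was taken.
    };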
To sum things up (based on my experience and that of others with whom I had a chance to discuss the subject), a non-allocating quasi-conformant `std::future/promise` implementation would cater only to a very limited set of types in highly constrained scenarios where synchronization overhead is not a concern. In real world scenarios, and especially those that rely heavily on futures due to the use of continuations, time is better spent focusing on memory allocation schemes (the actual concern, after all) by using the standard mechanism devised to tend to exactly those needs: allocators.
I concur.
I'll be interested in hearing your findings during your work on the subject. And should you want me to have a look at your implementation and come up with ways to "break it" (which is what I do best), you have only to contact me.
A number of people have complained by private email to me, asking why I don't ship lightweight futures in the next few weeks as "they look ready". You've just summarised perfectly why not.

In my case, my gut instinct is that these lightweight futures will be a major boon to the AFIO engine - my first step is a direct replacement of all the STL futures with lightweight futures, touching nothing else. My gut instinct is that I'll gain maybe 5% on maximum dispatch. But equally I might see a regression, and it could even turn out to be a terrible idea, in which case I'll return to the drawing board. Until the numbers are in, I won't decide one way or another.

BTW I welcome any help in breaking them once they are ready for that, which I currently expect will be early 2016, assuming I don't find they are a terrible idea. I need to take a few months off after the pace of the last seven months. My thanks in advance to you for the offer of help; I'll definitely take it when the time comes.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/