On 19 Mar 2015 19:51, "Niall Douglas" wrote:
On 19 Mar 2015 at 18:05, Giovanni Piero Deretta wrote:
Your future still allocates memory, and is therefore costing about 1000 CPU cycles.
1000 clock cycles seems excessive with a good malloc implementation.
Going to main memory on a cache line miss costs about 250 clock cycles, so no, it isn't excessive. Obviously slower processors spin fewer cycles on a cache line miss.
Why would a memory allocation necessarily imply a cache miss? And you are even assuming an L3 miss; that must be a poor allocator!
Anyway, the plan is to add support for a custom allocator. I do not think you can realistically have a non-allocating future *in the general case* (you might optimise some cases, of course).
We disagree. They are not just feasible but straightforward, though if you try doing a composed wait on them then yes, they will need to be converted to shared state. Tony van Eerd did a presentation a few C++Now conferences ago on non-allocating futures. I did not steal his idea subconsciously one little bit! :)
I am aware of that solution. My issue with that design is that it requires an expensive RMW (atomic read-modify-write) for every move. Do a few moves and they will quickly dwarf the cost of an allocation, especially considering that an out-of-order CPU will happily overlap computation with a cache miss, while the required memory barrier will stall the pipeline on current CPUs (I'm thinking of x86, of course). That might change in the near future, though.
I understand what you are aiming at, but I think that the elidability is orthogonal. Right now I'm focusing on making the actual synchronisation fast and composable in the scenario where the program has committed to making a computation async.
This is fine until your compiler supports resumable functions.
This is funny :). A couple of months ago I was arguing with Gor Nishanov (author of the MS resumable functions paper) that heap-allocating the resumable function by default is unacceptable, and here I am arguing the other side :). OK, my compromise is to not allocate while the async operation is merely deferred but can still be executed synchronously, and to lazily convert to a heap allocation only when the operation needs to be executed truly asynchronously, basically not until you actually create the promise (at that point the cost of the async setup will provably dwarf the allocation; and even in this case the allocation can be skipped if we know we will sync before a move, since then it is safe to allocate the shared state on the stack). This should allow the compiler to remove the abstraction completely if it can prove it safe. Still working on it; I should have something in a few days. I guess I'm converging to your design.
Exactly as my C11 permit object is. Except mine allows C code and C++ code to interoperate and compose waits together.
Not at all. I admit not having studied permit in detail (the doc size is pretty daunting), but as far as I can tell the waiting thread will block in the kernel.
It can spin or sleep or toggle a file descriptor or HANDLE.
It provides a variety of ways to block, but the user can't add more.
It provides a hook API with filter C functions which can, to a limited extent, provide some custom functionality. Indeed the file descriptor/HANDLE toggling is implemented that way. There is only so much genericity which can be done with C.
I believe my design is much simpler and more flexible; then again, it is trying to solve a different and narrower problem than full-scale synchronization of arbitrary threads. -- gpd