On 19 Mar 2015 at 18:05, Giovanni Piero Deretta wrote:
Your future still allocates memory, and therefore costs about 1000 CPU cycles.
1000 clock cycles seems excessive with a good malloc implementation.
Going to main memory on a cache line miss costs around 250 clock cycles, so no, it isn't: an allocating future only has to touch a handful of cold cache lines to reach that figure. Obviously slower processors spin fewer cycles for a cache line miss.
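As a rough sanity check of that figure, here is a minimal x86-only sketch (not from the original discussion) that times std::promise/std::future round trips against the TSC; a proper measurement would serialise rdtsc and control the allocator, so treat the number as indicative only:

    #include <cstdio>
    #include <future>
    #include <x86intrin.h>

    int main() {
      constexpr int N = 100000;
      unsigned long long t0 = __rdtsc();
      for (int i = 0; i < N; ++i) {
        std::promise<int> p;              // allocates the shared state
        std::future<int> f = p.get_future();
        p.set_value(i);
        (void)f.get();
      }
      unsigned long long t1 = __rdtsc();
      std::printf("~%llu cycles per promise/future round trip\n",
                  (t1 - t0) / N);
    }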
Anyway, the plan is to add support for a custom allocator. I do not think you can realistically have a non-allocating future *in the general case* (you might optimise some cases, of course).
We disagree. They are not just feasible, but straightforward, though if you try doing a composed wait on them then yes, they will need to be converted to shared state. Tony van Eerd did a presentation a few C++Nows ago on non-allocating futures. I did not steal his idea subconsciously one little bit! :)
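To make the claim concrete, here is a minimal sketch of the idea (not Tony's or anyone's actual design): the "shared" state lives inline in the future, the promise only holds a pointer to it, and nothing on this path touches the heap. It assumes neither object is moved after being linked, uses a crude spin wait, and omits error handling and destruction of the stored value:

    #include <atomic>
    #include <cassert>
    #include <new>
    #include <thread>
    #include <utility>

    template <class T> class nonalloc_promise;

    // The "shared" state is simply embedded in the future object.
    template <class T>
    class nonalloc_future {
      friend class nonalloc_promise<T>;
      std::atomic<bool> ready_{false};
      alignas(T) unsigned char storage_[sizeof(T)];
    public:
      T get() {                              // crude spin wait for the sketch
        while (!ready_.load(std::memory_order_acquire))
          std::this_thread::yield();
        return *reinterpret_cast<T *>(storage_);
      }
    };

    template <class T>
    class nonalloc_promise {
      nonalloc_future<T> *f_ = nullptr;      // just a pointer, no heap state
    public:
      void link(nonalloc_future<T> &f) { f_ = &f; }
      void set_value(T v) {
        assert(f_);
        ::new (f_->storage_) T(std::move(v));  // construct in the future's storage
        f_->ready_.store(true, std::memory_order_release);
      }
    };

    int main() {
      nonalloc_future<int> f;
      nonalloc_promise<int> p;
      p.link(f);                             // no allocation anywhere on this path
      std::thread t([&] { p.set_value(42); });
      int v = f.get();
      t.join();
      assert(v == 42);
    }

A composed wait on several such futures is where this breaks down, because the waiter then needs somewhere stable to park the shared signalling state, which is the conversion mentioned above.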
I understand what you are aiming at, but I think that elidability is orthogonal. Right now I'm focusing on making the actual synchronisation fast and composable in the scenario where the program has committed to making a computation async.
This is fine until your compiler supports resumable functions.
Exactly as my C11 permit object is. Except mine allows C code and C++ code to interoperate and compose waits together.
Not at all. I admit not having studied permit in detail (the doc size is pretty daunting) but as far as I can tell the waiting thread will block in the kernel.
It can spin or sleep or toggle a file descriptor or HANDLE.
It provides a variety of ways to block, but the user can't add more.
It provides a hook API with filter C functions which can, to a limited extent, provide some custom functionality. Indeed, the file descriptor/HANDLE toggling is implemented that way. There is only so much genericity that can be achieved with C.
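Purely as an illustration of the filter-hook idea (these names are invented, not the real permit API, and it assumes POSIX pipes): granting the object calls a user-supplied C function pointer, which is enough to, say, make a pipe readable so that a select() loop wakes up:

    #include <atomic>
    #include <unistd.h>

    typedef void (*grant_hook_t)(void *user_data);  // C-compatible extension point

    struct toy_permit {
      std::atomic<bool> granted{false};
      grant_hook_t hook = nullptr;
      void *hook_data = nullptr;
    };

    void toy_permit_grant(toy_permit *p) {
      p->granted.store(true, std::memory_order_release);
      if (p->hook) p->hook(p->hook_data);   // e.g. toggle a file descriptor
    }

    // Example hook: make a pipe readable so a select()/poll() loop wakes up.
    void wake_fd_hook(void *user_data) {
      int fd = *static_cast<int *>(user_data);
      char c = 1;
      (void)write(fd, &c, 1);
    }

    int main() {
      int fds[2];
      if (pipe(fds) != 0) return 1;
      toy_permit perm;
      perm.hook = wake_fd_hook;
      perm.hook_data = &fds[1];
      toy_permit_grant(&perm);              // wakes anything select()ing on fds[0]
      char c;
      return read(fds[0], &c, 1) == 1 ? 0 : 1;
    }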
My current best understanding of Vicente's plans is that each thread has a thread-local condvar. The only thing a thread ever sleeps on, apart from i/o, is that thread-local condvar. The runtime therefore keeps a registry of all the thread-local condvars, and can deduce the correct condvars to wake when implementing a unified wait system that is also compatible with Fibers/resumable functions.
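A minimal sketch of how I read that design (all names invented, unregistration on thread exit omitted for brevity): each thread lazily registers a thread-local mutex/condvar pair in a global registry, only ever sleeps on its own entry, and a unified wake looks the entry up and signals it:

    #include <condition_variable>
    #include <mutex>
    #include <thread>
    #include <unordered_map>

    struct thread_waiter {
      std::mutex m;
      std::condition_variable cv;
      bool signalled = false;
    };

    std::mutex registry_lock;
    std::unordered_map<std::thread::id, thread_waiter *> registry;

    // Lazily create and register this thread's sole sleeping point.
    thread_waiter &this_thread_waiter() {
      static thread_local thread_waiter w;
      static thread_local bool registered = [] {
        std::lock_guard<std::mutex> g(registry_lock);
        registry[std::this_thread::get_id()] = &w;
        return true;
      }();
      (void)registered;
      return w;
    }

    void sleep_this_thread() {              // the only thing a thread blocks on
      thread_waiter &w = this_thread_waiter();
      std::unique_lock<std::mutex> l(w.m);
      w.cv.wait(l, [&] { return w.signalled; });
      w.signalled = false;
    }

    void wake_thread(std::thread::id id) {  // the runtime wakes a specific sleeper
      thread_waiter *w = nullptr;
      {
        std::lock_guard<std::mutex> g(registry_lock);
        auto it = registry.find(id);
        if (it != registry.end()) w = it->second;
      }
      if (!w) return;
      std::lock_guard<std::mutex> g(w->m);
      w->signalled = true;
      w->cv.notify_one();
    }

    int main() {
      std::thread t([] { sleep_this_thread(); });
      for (;;) {                            // wait until the sleeper has registered
        {
          std::lock_guard<std::mutex> g(registry_lock);
          if (registry.count(t.get_id())) break;
        }
        std::this_thread::yield();
      }
      wake_thread(t.get_id());
      t.join();
    }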
That doesn't work if a program wants to block in, for example, select, or spin on a memory location or a hardware register, or wait for a signal, or interoperate with a different userspace thread library or some other event queue (asio, qt or whatever), and still also wait on a future. Well, you can use future::then, but it has overhead.
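For concreteness, the future::then workaround looks roughly like this, assuming Boost.Thread's continuations (BOOST_THREAD_VERSION 4) and POSIX pipes; the extra continuation thread, pipe write and kernel round trip are exactly the overhead being referred to:

    #define BOOST_THREAD_VERSION 4           // enables boost::future::then()
    #include <boost/thread/future.hpp>
    #include <sys/select.h>
    #include <thread>
    #include <unistd.h>

    int main() {
      int fds[2];
      if (pipe(fds) != 0) return 1;
      int wake_fd = fds[1];

      boost::promise<int> p;
      boost::future<int> f = p.get_future();
      // The continuation's only job is to make the pipe readable.
      boost::future<void> chained =
          f.then(boost::launch::async, [wake_fd](boost::future<int> r) {
            (void)r.get();
            char c = 1;
            (void)write(wake_fd, &c, 1);
          });

      std::thread producer([&p] { p.set_value(42); });

      fd_set rd;
      FD_ZERO(&rd);
      FD_SET(fds[0], &rd);
      // The event loop blocks in select() as usual, and also wakes when the
      // future becomes ready.
      select(fds[0] + 1, &rd, nullptr, nullptr, nullptr);
      char c;
      (void)read(fds[0], &c, 1);

      producer.join();
      chained.get();
      return 0;
    }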
I think where we are vaguely heading is that anything Boost.Coroutine capable will convert blocking into coroutine scheduling. That way, if Thread v5 programs block doing ASIO or AFIO i/o under the bonnet, other fibre work will be scheduled where possible, and any condvar or mutex blocks could turn into ASIO/AFIO work. A bit like a "userspace WinRT", I suppose. But sure, C++ is not WinRT: if the programmer writes an infinite for loop, he gets blocking behaviour.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/