On Fri, Sep 4, 2015 at 7:54 PM, Oliver Kowalke wrote:
2015-09-04 20:10 GMT+02:00 Giovanni Piero Deretta:
- Boost.Fiber is yet another library that comes with its own future type. For the sake of interoperability, the author should really contribute changes to boost.thread so that its futures can be re-used.
boost::fibers::future<> has to use boost::fibers::mutex internally instead of std::mutex/boost::mutex (built, for instance, on pthread_mutex) as boost.thread does. boost::fibers::mutex is based on atomics - it does not block the thread; instead the running fiber is suspended and another fiber is resumed. A future implementation usable for both boost.thread and boost.fiber must therefore allow customizing the mutex type. The futures of boost.thread as well as boost.fiber are allocating futures, i.e. the shared state is allocated on the free store. I had planned to provide a non-allocating future as suggested by Tony Van Eerd. Fortunately Niall has already implemented one (boost.spinlock/boost.monad) - no mutex is required. If boost.monad is accepted into Boost I'll try to integrate it into boost.fiber.
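Roughly, I have something like this in mind for the shared state - only a sketch with made-up names (basic_shared_state etc.), not the real boost.thread code:

#include <mutex>
#include <utility>

// shared state whose blocking primitives are template parameters, so the
// same code could serve thread futures and fiber futures
template< typename Mutex, typename CondVar, typename T >
class basic_shared_state {
    Mutex    mtx_;
    CondVar  waiters_;
    bool     ready_{ false };
    T        value_{};

public:
    void set_value( T v ) {
        std::unique_lock< Mutex > lk( mtx_ );
        value_ = std::move( v );
        ready_ = true;
        lk.unlock();
        waiters_.notify_all();   // wakes blocked threads or suspended fibers
    }

    T get() {
        std::unique_lock< Mutex > lk( mtx_ );
        // with a fiber-level condition_variable only the fiber is suspended;
        // with a thread-level condition_variable the whole thread blocks
        waiters_.wait( lk, [this]() { return ready_; } );
        return std::move( value_ );
    }
};

// thread flavour:  basic_shared_state< std::mutex, std::condition_variable, int >
// fiber flavour:   basic_shared_state< boost::fibers::mutex, boost::fibers::condition_variable, int >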
So, I do not want a non-allocating future, as I think it is actually counter-productive. I only want a way to combine boost::thread::future and boost::fiber::future (i.e. in a call to wait_all/wait_any). There are two ways to do that: 1) a simple protocol that allows efficient future interoperation between distinct future types (note that efficient is key here, otherwise 'then' could also work), or 2) boost::fiber::future is simply a tiny wrapper over boost::thread::future that overrides the wait policy. In the second case boost.thread must of course allow specifying a wait policy and must not use mutexes internally (it should either have a lock-free implementation or use spin locks). [...]
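To illustrate option 2), roughly what I have in mind - a toy sketch with hypothetical names; a real implementation would park on something smarter than a spin/yield loop, the point is only that the wait strategy becomes the single customization point:

#include <atomic>
#include <thread>
#include <utility>
#include <boost/fiber/all.hpp>

// wait policy for plain threads: yield at the thread level
struct thread_wait {
    template< typename Pred >
    void operator()( Pred ready ) const {
        while ( ! ready() ) std::this_thread::yield();
    }
};

// wait policy a fiber-level future would install: suspend the calling
// fiber so another ready fiber can run on the same thread
struct fiber_wait {
    template< typename Pred >
    void operator()( Pred ready ) const {
        while ( ! ready() ) boost::this_fiber::yield();
    }
};

// toy future whose shared state is lock-free; only the wait strategy differs
template< typename T, typename WaitPolicy >
class basic_future {
    std::atomic< bool > ready_{ false };
    T                   value_{};

public:
    void set_value( T v ) {
        value_ = std::move( v );
        ready_.store( true, std::memory_order_release );
    }

    T get() {
        WaitPolicy{}( [this]() { return ready_.load( std::memory_order_acquire ); } );
        return std::move( value_ );
    }
};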
- The performance section lists a yield at about 4000 clock cycles. That seems excessive, considering that the context switch itself should be much less than 100 clock cycles. Where is the overhead coming from?
Yes, the context switch itself takes < 100 cycles. The selection of the next ready fiber (look-up) probably takes some additional time, and in the performance tests the stack allocation is measured too.
Hum, right, the test is not just measuring the performance of yield. Do you have, or can you write, a benchmark that simply measures the yield between N fibers over a few thousand iterations? Anyway, even if we subtract the create + join cost from the benchmark, the cost is still in the 2 µs range. Shouldn't the next-fiber selection be just a simple list pop-front when there are runnable fibers (i.e. no work stealing is required)?
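Something along these lines is what I mean - a rough sketch against the public API (boost::fibers::fiber, boost::this_fiber::yield); fiber creation stays outside the timed region, so the measurement is dominated by the yields plus the joins:

#include <boost/fiber/all.hpp>
#include <chrono>
#include <cstdio>
#include <vector>

static constexpr int nfibers    = 10;
static constexpr int iterations = 10000;

int main() {
    std::vector< boost::fibers::fiber > fibers;
    for ( int i = 0; i < nfibers; ++i ) {
        fibers.emplace_back( []() {
            for ( int j = 0; j < iterations; ++j ) {
                boost::this_fiber::yield();   // hand control to the next ready fiber
            }
        });
    }

    auto start = std::chrono::steady_clock::now();
    for ( auto & f : fibers ) {
        f.join();   // joining drives the scheduler until all fibers are done
    }
    auto elapsed = std::chrono::steady_clock::now() - start;

    double total_yields = double( nfibers ) * iterations;
    std::printf( "%.1f ns per yield\n",
        std::chrono::duration< double, std::nano >( elapsed ).count() / total_yields );
    return 0;
}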
What's the overhead of an OS thread yield?
32 µs
So boost.fiber is about an order of magnitude faster. That is good, but I was hoping for more.
The last issue is particularly important because I can see a lot of spinlocks in the implementation.
The spinlocks are required because the library enables synchronization of fibers running in different threads.
With a very fast yield implementation, yielding to the next ready fiber could lead to a more efficient use of resources.
If a fiber A gets suspended (waiting/yielding), the fiber_manager, and thus the scheduling algorithm, is executed in the context of fiber A. The fiber_manager picks the next fiber B to be resumed and initiates the context switch.
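Schematically the flow is like this (illustrative pseudo-code only, with placeholder names - not the actual fiber_manager sources):

#include <deque>

struct fiber_context { /* placeholder for a suspendable execution context */ };

// placeholder for the raw context switch (boost.context in the real library)
inline void switch_context( fiber_context * /*from*/, fiber_context * /*to*/ ) {}

struct fiber_manager_sketch {
    std::deque< fiber_context * > ready_queue;

    // runs in the context of the fiber that is yielding/waiting (fiber A);
    // assumes ready_queue is not empty - the real scheduler would otherwise
    // fall back to its dispatching context
    void suspend_current( fiber_context * current, bool still_ready ) {
        if ( still_ready ) {
            ready_queue.push_back( current );       // plain yield: A stays runnable
        }
        fiber_context * next = ready_queue.front(); // simple pop-front, no work stealing
        ready_queue.pop_front();
        switch_context( current, next );            // resume fiber B
    }
};

Do you have specific suggestions?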
Please ignore my last two comments. I only meant to say that spinning was wasteful and that you should yield to the next fiber. But that's actually the case in the current spinlock implementation; I should have looked more carefully. Btw, why is spinlock in detail? It could be useful to expose it.
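For reference, roughly the behaviour I had in mind - and which, as you say, the current implementation already has - written as a standalone sketch (not the actual Boost.Fiber spinlock):

#include <atomic>
#include <boost/fiber/all.hpp>

class yielding_spinlock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;

public:
    void lock() {
        while ( flag_.test_and_set( std::memory_order_acquire ) ) {
            // instead of burning CPU, give the next ready fiber a chance to run
            boost::this_fiber::yield();
        }
    }

    void unlock() {
        flag_.clear( std::memory_order_release );
    }
};

-- gpd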