Proposal for a thread synchronisation primitive: future_queue
Dear boost developers,

I want to propose a thread synchronisation primitive which works like a mixture of (lockfree) queues and futures, hence the name "future_queue". It is basically a lockfree queue which can be waited on for new values (without busy waiting), but with some nice extra features such as when_any() and continuations.

The github project can be found here: https://github.com/mhier/future_queue

It already contains a README file with a short description, an example, and results of a first performance measurement. Further documentation so far exists only inside the code: https://github.com/mhier/future_queue/blob/master/include/future_queue.hpp

My first question is whether this is really something new (I might have overlooked something in the many boost libraries) and whether it makes sense to integrate it into the boost libraries.

Another question is whether this should really be its own library, since it is basically a single class (with a few helper classes) and a single non-member function. Maybe it would be better to make it part of an existing library?

I am happy to receive some first feedback!

Cheers, Martin

-- Martin Hierholzer, DESY, Notkestrasse 85, 22607 Hamburg, Office 55a / 123, Tel: +49 40 8998 1897, Mob: +49 176 992 46 476, Skype: mhier48

This e-mail may contain confidential and/or privileged information and is intended only for the use of the individual or entity named above. If you received it in error, please advise the sender immediately by reply e-mail and delete the original. Any further use - especially any form of publication - of this e-mail without explicit authorisation is strictly prohibited.
I like the idea. By the way, why isn't has_data() named empty(), to conform with STL containers? It would also be nice to have the member functions documented in the README (I might be missing something though). /Viktor

On Tue, Apr 24, 2018 at 9:13 AM Martin Hierholzer via Boost <boost@lists.boost.org> wrote:
Dear Viktor, thank you very much for your feedback!
I like the idea. By the way, why isn't has_data() named empty(), to conform with STL containers?
Good point. I just changed this.
It would also be nice to have the member functions documented in the README (I might be missing something though).
I have now added a second .md file containing the reference documentation of all member and non-member functions. It is linked in the README.md file. Cheers, Martin
On 25/04/2018 04:01, Martin Hierholzer wrote:
I want to propose a thread synchronisation primitive, which works like a mixture of (lockfree) queues and futures. Therefore I gave it the name "future_queue". It is basically a lockfree queue which can be waited on for new values (without busy waiting), but it has some nice extra features like when_any() and continuations.
The github project can be found here: https://github.com/mhier/future_queue
Looks like a nice concept at first glance (I haven't looked at the implementation), although some of those known issues (particularly the limited lifespan) would probably need to be solved before it could be accepted.
My first question is, whether this is really something new (I might have overlooked something in the many boost libraries) and if it makes sense to integrate it into the boost libraries.
I don't believe there is currently any equivalent, and it is a hole that would be nice to fill. It's not a completely new concept, of course; one of the original future proposals included a future_stream that was essentially a recursive future -- whenever it completed it produced both a value and another future for the next value. This didn't make it into Boost (or the Standard) in the end, though I assume many people have reimplemented something similar on their own.

Having said that, the usual thinking on implementing blockable lockfree queues doesn't involve futures per se, but rather focuses on using an "eventcount" to provide the blocking behaviour as a wrapper around the core data structure itself, so it can be used with multiple different kinds of queues -- lockfree is usually a balancing act between tradeoffs, so different use cases benefit most from different structures.
Dear Gavin, thank you, too, for your nice feedback!
Looks like a nice concept at first glance (I haven't looked at the implementation), although some of those known issues (particularly the limited lifespan) would probably need to be solved before it could be accepted.
It is clearly not yet ready, I know. I just want to get feedback as early as possible so I can steer in the right direction from the beginning.
It's not a completely new concept, of course; one of the original future proposals included a future_stream that was essentially a recursive future -- whenever it completed it produced both a value and another future for the next value. This didn't make it into Boost (or the Standard) in the end, though I assume many people have reimplemented something similar on their own.
This sounds like a very similar concept indeed, but I cannot find anything about it on the web. Maybe the future_queue is a bit different in the sense that one can have multiple ready values in the queue, just like in any normal FIFO queue.
Having said that, the usual thinking on implementing blockable lockfree queues doesn't involve futures per se, but rather focuses on using an "eventcount" to provide the blocking behaviour as a wrapper around the core data structure itself, so it can be used with multiple different kinds of queues -- lockfree is usually a balancing act between tradeoffs, so different use cases benefit most from different structures.
I know that, but I think with the future_queue approach one can do much more (and it is also simpler to use, since you don't have to bother with wrapping the queue and dealing with counters). I think the important features are continuations, when_any and (added today) when_all. With these one can build something like "processing pipelines" without even having to explicitly care about the creation of threads. I will prepare a more advanced example showing better what I mean and will let you know when it's ready. Cheers, Martin
On 27/04/2018 05:12, Martin Hierholzer wrote:
This sounds like a very similar concept indeed, but I cannot find anything about this on the web. Maybe the future_queue is a bit different in the sense that one can have multiple ready values in the queue, just like in any normal fifo queue.
The recursive futures support that too, in the sense that when it returns a "next" future it might already have a completed value which can be extracted synchronously without blocking. There's an example in this post https://lists.boost.org/Archives/boost/2007/03/118742.php. Of course, it relies on futures, shared_ptrs, and memory allocations, which might be too heavyweight for some usages. But it's also trivial to implement around those existing types, which is probably why it didn't end up making it into the standard as a separate type.
On second thought, maybe I could even change the implementation of the future_queue to internally use e.g. the well-known boost::lockfree queues together with a semaphore to implement the blocking behaviour. I am not 100% sure yet, but I think all current features could be implemented that way. Do you think this would improve the implementation? It might be a bit simpler and maybe even more efficient in some cases.
I would love to see a portable eventcount (http://www.1024cores.net/home/lock-free-algorithms/eventcounts) implementation in Boost.Lockfree.
I'm not sure Boost.Lockfree is a good place for it (is it even possible to implement eventcounts without relying on system calls, which may acquire locks in kernel space?)
On 9/05/2018 17:30, Tim Blechmann wrote:
On a second thought, maybe I could even change the implementation of the future_queue to internally use e.g. the well-known boost::lockfree queues together with some semaphore to implement the blocking behaviour. I am not 100% sure yet, but I think all current features could be implemented by that. Do you think this would improve the implementation? It might be a bit simpler and maybe even more efficient in some cases.
I would love to see a portable eventcount (http://www.1024cores.net/home/lock-free-algorithms/eventcounts) implementation in Boost.Lockfree.
I'm not sure Boost.Lockfree is a good place for it (is it even possible to implement eventcounts without relying on system calls, which may acquire locks in kernel space?)
Even the Boost.LockFree queue may not always be lock-free -- e.g. if you push more nodes than the queue's capacity (and it's not fixed_sized) it will allocate more, which will often take a Big System Lock (though modern allocators try to use smaller, less-contended locks to reduce the impact).

And these things are usually implemented with a fast path that does not perform any kernel operations, and a slow path that does (along with the hope that the slow path is not triggered too often). As soon as you start talking about wanting to block, you have to involve the kernel (or a userspace scheduler). This is a choice to gain some convenience while sacrificing some "lock-free-ness", just like the choice to use a non-fixed queue.

The basic "lock-free but blocking" expected behaviour is:

1. If pushing but the queue is full, then block until it isn't (blocking path).
2. If a popper is waiting, push and then wake them up (slow path).
3. Otherwise just push without kernel operations (fast path).

And similarly on the consuming end for an empty queue. (A non-fixed-size queue might omit step #1 entirely, though it could still end up blocking a little if it hits an allocator.)

The main trick is in the wakeups (and reliably knowing whether someone is waiting); you can never really avoid making kernel calls to wake up another thread, but you want to do so only when actually needed (erring on the side of doing an unnecessary wakeup rather than never doing a wakeup), and you need to use constructs that (while perhaps not completely lock-free) are at least wait-free. Semaphores and Win32 events are usually good candidates for this, because the scheduler generally treats them as atomic operations as far as user code is concerned. Traditional mutexes (and condition variables, since they rely on those) are bad candidates because the thread might be preempted while a lock is held, which prevents forward progress of another thread.
But you also usually have to be careful to use the "true" synchronisation primitives of the platform, not emulations defined in libraries, because those might use locks or other constructs that are problematic. When you're implementing a platform-specific abstraction, it's one of the rare cases that it's not a good idea to depend on other abstractions.
----- Original Message -----
From: "boost"
To: "boost"
Cc: "Gavin Lambert"
Sent: Wednesday, 9 May, 2018 08:43:34
Subject: Re: [boost] Proposal for a thread synchronisation primitive: future_queue
Even the Boost.LockFree queue may not always be lock-free
Even std::atomic is not guaranteed to be lock-free; see std::atomic::is_lock_free(). I think it is mainly important to specify precisely when to expect lock-free behaviour and when locks might internally occur.
But it's also trivial to implement around those existing types, which is probably why it didn't end up making it into the standard as a separate type.
Sure, it is possible to combine certain primitives into more complex primitives, and users can always do this themselves. Still, my proposal goes beyond just a combination of an eventcount/semaphore and a lockfree queue. Just like futures, it knows continuations, when_all/when_any and similar things. If you say it is too trivial to put into a fundamental library, one could say the same about a future: a future is also just a combination of a value and a semaphore. Yet it is a widely used primitive everyone is happy to have.

My actual use case for the future_queue is a class of applications which can be described as a very complex, multi-threaded soft PLC (programmable logic controller) application. For now we have only soft realtime requirements, which are even met by the current spsc_queue<future> construction. We have many threads running (like 100 or 200), basically one thread per task or action. This simplifies the program structure a lot, since we have independent modules with one thread each. This may sound crazy, but actually the performance is much better than I originally anticipated (sleeping threads don't cost performance, only context switches matter).

The future_queue would make it much easier to implement constructions like this. Tasks can be implemented as continuations. Since we have when_all() and when_any(), which can themselves be continued, one can even create tasks which use multiple future_queues as input. Maybe this explains a bit better what I am aiming at here?
I would love to see a portable eventcount (http://www.1024cores.net/home/lock-free-algorithms/eventcounts) implementation in Boost.Lockfree.
In any case, I would offer to work on a portable semaphore, but I am afraid I don't have much experience with platform-independent implementations. I will probably soon extend my simple semaphore class from the future_queue library to work with values other than 0 or 1 (since I need that for a better implementation of the future_queue), and I can also try to make a good implementation for Mac OS X, if this helps you. I don't really have access to development tools on Windows any more and have basically no experience on other platforms. Maybe someone else can add more implementations? Cheers, Martin
https://github.com/boostorg/sync/blob/develop/include/boost/sync/semaphore.h... Unfortunately we never got it finished/reviewed. IIRC obtaining the semaphore count doesn't really map to all OS APIs. As for macOS, IIRC the different semaphore APIs have quite different performance characteristics (unnamed semaphores were not implemented, while dispatch semaphores had the best throughput).
Hi again, I have updated the README with a second, bigger example, also showing the use of continuations, when_all and when_any.
Having said that, the usual thinking on implementing blockable lockfree queues doesn't involve futures per se, but rather focuses on using an "eventcount" to provide the blocking behaviour as a wrapper around the core data structure itself, so it can be used with multiple different kinds of queues -- lockfree is usually a balancing act between tradeoffs, so different use cases benefit most from different structures.
On second thought, maybe I could even change the implementation of the future_queue to internally use e.g. the well-known boost::lockfree queues together with a semaphore to implement the blocking behaviour. I am not 100% sure yet, but I think all current features could be implemented that way. Do you think this would improve the implementation? It might be a bit simpler and maybe even more efficient in some cases. Cheers, Martin
One point to consider: even if the queue is lockfree, the semaphore operations may internally acquire locks (maybe not in user space, but in the OS). I double-checked the kernel-space implementations on Linux and Darwin some time ago: posting/notifying a semaphore is not lockfree on these platforms, as some code paths involve (spin)locks. So if lockfree behaviour is required to produce values from a real-time context, one may have to poll a lockfree queue. Cheers, Tim
participants (4)
- Gavin Lambert
- Martin Hierholzer
- Tim Blechmann
- Viktor Sehr