On 9/05/2018 17:30, Tim Blechmann wrote:
On second thought, maybe I could even change the implementation of the future_queue to internally use e.g. the well-known boost::lockfree queues together with some semaphore to implement the blocking behaviour. I am not 100% sure yet, but I think all current features could be implemented that way. Do you think this would improve the implementation? It might be a bit simpler and maybe even more efficient in some cases.
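For what it's worth, a minimal sketch of that combination, assuming C++20's std::counting_semaphore (blocking_queue is purely an illustrative name, not an existing API, and T must satisfy boost::lockfree::queue's requirements, i.e. be trivially assignable and destructible):

    #include <boost/lockfree/queue.hpp>
    #include <semaphore>
    #include <cstddef>

    // Bounded blocking facade over boost::lockfree::queue: one semaphore
    // counts free slots, the other counts filled slots, so push blocks
    // while the queue is full and pop blocks while it is empty.
    template <typename T, std::ptrdiff_t Capacity>
    class blocking_queue {
        boost::lockfree::queue<T, boost::lockfree::fixed_sized<true>>
            queue_{static_cast<std::size_t>(Capacity)};
        std::counting_semaphore<Capacity> free_slots_{Capacity};
        std::counting_semaphore<Capacity> filled_slots_{0};
    public:
        void push(T value) {
            free_slots_.acquire();           // block while the queue is full
            while (!queue_.push(value)) {}   // a node was reserved above
            filled_slots_.release();         // wake one waiting popper
        }
        T pop() {
            filled_slots_.acquire();         // block while the queue is empty
            T value;
            while (!queue_.pop(value)) {}    // an element is guaranteed present
            free_slots_.release();           // wake one waiting pusher
            return value;
        }
    };

Whether this ends up more efficient depends on how the standard library's semaphore behaves on the uncontended path, which is part of what the discussion below gets at.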
I would love to see a portable eventcount (http://www.1024cores.net/home/lock-free-algorithms/eventcounts) implementation in Boost.Lockfree.
I'm not sure if Boost.Lockfree is a good place for it (is it even possible to implement one without relying on system calls, which may acquire locks in kernel space?)
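For concreteness, a minimal portable eventcount in the spirit of that 1024cores page might look like the following (illustrative only; note that the portable slow path does fall back to a mutex and condition variable, which speaks to the question above):

    #include <atomic>
    #include <condition_variable>
    #include <mutex>

    // Minimal eventcount sketch: a waiter samples a generation counter,
    // re-checks its own predicate, and sleeps only if no notification
    // arrived in between. The common path (prepare_wait, or a notify
    // racing no sleepers) is a single atomic operation; the mutex and
    // condition variable are slow-path only.
    class eventcount {
        std::atomic<unsigned> epoch_{0};
        std::mutex m_;
        std::condition_variable cv_;
    public:
        unsigned prepare_wait() {
            return epoch_.load(std::memory_order_acquire);
        }
        void wait(unsigned key) {
            std::unique_lock<std::mutex> lock(m_);
            while (epoch_.load(std::memory_order_acquire) == key)
                cv_.wait(lock);
        }
        void notify_all() {
            epoch_.fetch_add(1, std::memory_order_acq_rel);
            std::lock_guard<std::mutex> lock(m_);  // serialise with sleepers
            cv_.notify_all();
        }
    };

Typical use is `auto key = ec.prepare_wait(); if (queue_is_empty()) ec.wait(key);` -- a notify_all() that lands between those two statements bumps the epoch, so wait() returns immediately instead of sleeping through the lost wakeup.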
Even the Boost.Lockfree queue may not always be lock-free -- e.g. if you push more nodes than the queue's capacity (and it's not fixed_sized), it will allocate more, which will often take a Big System Lock (though modern allocators try to use smaller, less-contended locks to reduce the impact).

These things are usually implemented with a fast path that performs no kernel operations and a slow path that does (along with the hope that the slow path is not triggered too often). As soon as you start talking about wanting to block, you have to involve the kernel (or a userspace scheduler). That is a choice to gain some convenience while sacrificing some "lock-free-ness", just like the choice to use a non-fixed queue.

The basic "lock-free but blocking" expected behaviour is (sketched in code below):

1. If pushing but the queue is full, block until it isn't (blocking path).
2. If a popper is waiting, push and then wake them up (slow path).
3. Otherwise just push without kernel operations (fast path).

And similarly on the consuming end for an empty queue. (A non-fixed-size queue might omit step #1 entirely, though it can still end up blocking a little if it hits an allocator.)

The main trick is in the wakeups, and in reliably knowing whether someone is waiting. You can never really avoid making kernel calls to wake up another thread, but you want to do so only when actually needed, erring on the side of an occasional unnecessary wakeup rather than ever missing one. You also need to use constructs that, while perhaps not completely lock-free, are at least wait-free. Semaphores and Win32 events are usually good candidates, because the scheduler generally treats them as atomic operations as far as user code is concerned. Traditional mutexes (and condition variables, since they rely on them) are bad candidates, because a thread can be preempted while holding the lock, which prevents forward progress of other threads.

But you also usually have to be careful to use the "true" synchronisation primitives of the platform, not emulations defined in libraries, because the emulations might use locks or other constructs that are problematic. When you're implementing a platform-specific abstraction, it's one of the rare cases where it's not a good idea to depend on other abstractions.
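To make those three paths concrete, here is one illustrative shape for the wakeup bookkeeping (hypothetical names; std::counting_semaphore stands in for whatever "true" primitive the platform offers, and a real implementation needs more care around shutdown):

    #include <atomic>
    #include <semaphore>

    // Illustrative wakeup bookkeeping for steps 1-3 above: the producer
    // makes a kernel call only when a consumer has announced itself as
    // waiting. Extra wakeups are deliberately tolerated (better than a
    // missed one), so consumers must loop and re-check the queue.
    class wakeup_gate {
        std::atomic<int> waiters_{0};
        std::counting_semaphore<> sem_{0};
    public:
        void notify() {                      // producer, after a push
            if (waiters_.load() > 0)         // fast path: nobody waiting
                sem_.release();              // slow path: one kernel wakeup
        }
        template <typename StillEmpty>
        void wait(StillEmpty still_empty) {  // consumer, after a failed pop
            waiters_.fetch_add(1);           // announce *before* re-checking;
            if (still_empty())               // the default seq_cst ordering
                sem_.acquire();              // makes this a Dekker-style
            waiters_.fetch_sub(1);           // handshake with notify()
        }
    };

    // Consumer loop -- a stale semaphore token just causes a re-check:
    //   T item;
    //   while (!q.pop(item))
    //       gate.wait([&]{ return q.empty(); });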
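And on the "true primitives" point: on Linux, for example, that means going through futex(2) itself rather than a library mutex. A minimal (Linux-only, error handling omitted) sketch:

    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <atomic>
    #include <climits>

    // FUTEX_WAIT sleeps only if the word still holds `expected`, closing
    // the lost-wakeup window without a mutex; FUTEX_WAKE is a single
    // kernel call on the wake side. The cast assumes std::atomic<int> is
    // layout-compatible with int, which mainstream ABIs guarantee.
    void futex_wait(std::atomic<int>& word, int expected) {
        syscall(SYS_futex, reinterpret_cast<int*>(&word),
                FUTEX_WAIT_PRIVATE, expected, nullptr, nullptr, 0);
    }
    void futex_wake_all(std::atomic<int>& word) {
        syscall(SYS_futex, reinterpret_cast<int*>(&word),
                FUTEX_WAKE_PRIVATE, INT_MAX, nullptr, nullptr, 0);
    }

Win32 events and semaphores play the same role on Windows; it's the portable wrappers layered on top of these that tend to reintroduce locks.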