Stonewall Ballard wrote:
I think I found the cause of this problem. It seems that the caller of interrupt_all should be holding the mutex associated with the condition on which the threads are waiting.
This looks like a classic CV pitfall when using "atomic" predicates that are not protected by the mutex used for the wait. The basic outline is that thread A does a CV wait on an atomic boolean variable, and thread B sets this variable and does a notify. There exists a race in which A sees false and is preempted; B stores true and does a notify, waking up nobody; and then A continues with the wait and blocks, having missed the wakeup.

The cure is to insert an empty lock+unlock of A's mutex in B between the store and the notify; it doesn't need to encompass the store, and it doesn't need to encompass the notify call, either. It works because A holds the mutex from its check of the variable until the wait releases it, so B's lock can only succeed either before A's check (in which case A sees true) or after A is already waiting (in which case the notify reaches it).

I can't read the boost.thread code well enough to be able to diagnose the problem, but from a cursory look, it seems possible to me that condition_variable::wait may perform its interruption check and be preempted; the interrupt can then proceed with setting interrupt_requested and doing a broadcast; and then condition_variable::wait goes on to block in its pthread_cond_wait. Your workaround does prevent it, but I think that boost.thread should take care of that internally, by storing a mutex pointer in the thread data, along with the cv pointer.
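To make the outline concrete, here is a minimal sketch of the pattern and the cure using the standard C++ primitives rather than boost.thread internals. All the names (Signal, wait_for, set_and_notify, run_handoff) are made up for illustration, and this is only my reading of the race, not a description of what boost.thread actually does.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical illustration only; none of these names come from boost.thread.
struct Signal {
    std::mutex m;
    std::condition_variable cv;
    std::atomic<bool> ready{false};  // predicate is atomic, NOT protected by m
};

// Thread A: checks the atomic predicate while holding m, then waits.
void wait_for(Signal& s) {
    std::unique_lock<std::mutex> lk(s.m);
    while (!s.ready.load()) {
        // If B did only store + notify, both could happen right here,
        // after the check and before the wait, and the wakeup would be lost.
        s.cv.wait(lk);  // releases m atomically with starting to wait
    }
}

// Thread B: store, then an empty lock+unlock of A's mutex, then notify.
void set_and_notify(Signal& s) {
    s.ready.store(true);
    {
        // Empty critical section. If A is between its check and its wait,
        // this blocks until A's wait releases m, so the notify below
        // cannot slip into that window.
        std::lock_guard<std::mutex> lk(s.m);
    }
    s.cv.notify_all();
}

// Runs one waiter and one notifier; returns once both have finished.
bool run_handoff() {
    Signal s;
    std::thread a([&s] { wait_for(s); });
    std::thread b([&s] { set_and_notify(s); });
    a.join();
    b.join();
    assert(s.ready.load());
    return s.ready.load();
}
```

Calling run_handoff() from a main() should always return rather than hang. If the empty lock_guard block is removed, the same program can occasionally block forever in wait_for, although the window is narrow and it may take many runs to hit it.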