On 2020-05-14 20:43, Andrey Semashev wrote:
On 2020-05-14 13:41, Phil Endecott via Boost wrote:
Dear Experts,
Can we improve how interprocess mutexes and condition variables behave on process termination?
Currently, if a process terminates (e.g. it crashes, or you press ctrl-C), the Boost.Interprocess docs say nothing, as far as I can see, about what happens to locked mutexes and awaited conditions. In practice it seems that mutexes that were locked remain locked, and other processes will deadlock. (I'm using Linux.) A few thoughts:
* If a process were only reading the shared state, then it would be appropriate for the mutex to be unlocked on termination.
* If a process were modifying the shared state, then it would be wrong to unconditionally unlock the mutex. So it would be useful to distinguish between reader and writer locks, even if we're not implementing a single-writer/multiple-reader mutex.
* The system could be made more robust by blocking signals while a mutex is locked. This doesn't help with crashes, e.g. segfaults, but it would help with ctrl-C.
Catching signals is a good idea regardless of IPC and locking mutexes. As long as there is a moment when your application holds some valuable data or some state (e.g. a network connection) that needs to be properly saved or cleaned up on exit, you have to implement proper signal handling and graceful program termination.
To be clear, I don't mean that Boost.Interprocess should be dealing with signals. The user's application should.
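To make the two ideas above concrete (deferring ctrl-C while a lock is held, and handling termination in the application), here is a minimal sketch assuming plain POSIX signals on Linux. The names g_stop_requested, install_terminate_handler and scoped_signal_block are hypothetical and not part of Boost.Interprocess; only the sigaction/pthread_sigmask calls are real APIs.

```cpp
#include <cassert>
#include <csignal>
#include <pthread.h>
#include <signal.h>

// Set by the handler; the application's main loop is expected to poll it
// and shut down cleanly (hypothetical name).
volatile std::sig_atomic_t g_stop_requested = 0;

extern "C" void on_terminate_signal(int) { g_stop_requested = 1; }

// Installs the handler for SIGINT and SIGTERM. Returns true on success.
bool install_terminate_handler() {
    struct sigaction sa = {};
    sa.sa_handler = on_terminate_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, nullptr) == 0 &&
           sigaction(SIGTERM, &sa, nullptr) == 0;
}

// Hypothetical RAII guard: blocks SIGINT/SIGTERM for the calling thread
// while in scope and restores the previous mask on destruction. A signal
// that arrives meanwhile stays pending and is delivered after the restore.
class scoped_signal_block {
public:
    scoped_signal_block() {
        sigset_t to_block;
        sigemptyset(&to_block);
        sigaddset(&to_block, SIGINT);
        sigaddset(&to_block, SIGTERM);
        int rc = pthread_sigmask(SIG_BLOCK, &to_block, &old_mask_);
        assert(rc == 0);
        (void)rc;
    }
    ~scoped_signal_block() {
        pthread_sigmask(SIG_SETMASK, &old_mask_, nullptr);
    }
    scoped_signal_block(const scoped_signal_block&) = delete;
    scoped_signal_block& operator=(const scoped_signal_block&) = delete;

private:
    sigset_t old_mask_;
};
```

The intended use would be to construct a scoped_signal_block just before taking the interprocess lock, so that it is destroyed after the lock is released. Note that the mask is per thread: in a multi-threaded process the signal can still be delivered to another thread that has not blocked it. As said above, none of this helps with segfaults.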
* It may be useful to cause all processes to terminate if one of them terminates with a mutex held for writing, either immediately or as soon as they try to lock the same mutex. Perhaps also to delete the presumed-corrupted shared memory segment.
* PTHREAD_MUTEX_ROBUST might be part of the solution. That seems to require the non-crashed process to do the cleanup, i.e. we would need to record whether the crashed process was reading or writing and react appropriately.
You can't do that reliably because the crashed process could have crashed between locking the mutex and indicating its intentions. For another process to be able to restart or roll back a failed operation, that operation has to be implemented in a lock-free fashion, so that each step is atomic. At this point mutexes become redundant.
In my experience, the only sensible reaction to an abandoned operation (regardless of how you detect the abandoned state) is to scrap it and abort or start over in a new shared memory segment.
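For illustration, one way the "detect and scrap" policy could look with a robust pthread mutex is sketched below. This is only a sketch under assumptions: lock_result, init_robust_mutex and lock_or_detect_abandonment are hypothetical names, and in real use the pthread_mutex_t would live in the shared memory segment. The point is that on EOWNERDEAD the code deliberately does not call pthread_mutex_consistent(), so the mutex becomes permanently unusable and every later locker gets ENOTRECOVERABLE.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

enum class lock_result { acquired, abandoned, failed };

// Initialises a process-shared, robust mutex. Returns true on success.
bool init_robust_mutex(pthread_mutex_t* m) {
    assert(m != nullptr);
    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0) return false;
    bool ok =
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) == 0 &&
        pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST) == 0 &&
        pthread_mutex_init(m, &attr) == 0;
    pthread_mutexattr_destroy(&attr);
    return ok;
}

// Locks the mutex, or reports that a previous owner died while holding it.
// On `acquired` the caller owns the mutex and must unlock it.
// On `abandoned` the caller does NOT own the mutex and should treat the
// protected data as corrupt (abort, or start over in a new segment).
lock_result lock_or_detect_abandonment(pthread_mutex_t* m) {
    assert(m != nullptr);
    int rc = pthread_mutex_lock(m);
    if (rc == 0) return lock_result::acquired;
    if (rc == EOWNERDEAD) {
        // We hold the lock now, but the data may be half-written.
        // Unlocking without pthread_mutex_consistent() marks the mutex
        // unrecoverable, so other processes find out too.
        pthread_mutex_unlock(m);
        return lock_result::abandoned;
    }
    if (rc == ENOTRECOVERABLE) return lock_result::abandoned;
    return lock_result::failed;
}
```

A caller would check the result on every lock attempt and, on abandoned, stop using the segment rather than try to repair it, which matches the reasoning above that the intent of the dead process cannot be known reliably.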
I'm less clear about what happens to condition variables, but it does seem that perhaps terminating a process while it is waiting on a condition will cause other processes to deadlock. Perhaps the wait conceptually returns and the mutex is re-locked during termination.
AFAIR, pthread_cond_t uses a non-robust mutex internally, which means that condition variables are basically useless when you need robust semantics.
If you need robust, condition-variable-like behavior, I think your best bet is to use futexes directly.
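A very rough, Linux-only sketch of what "using futexes directly" could mean is an event counter that waiters sleep on. This is not a full condition variable and not anything Boost.Interprocess provides; futex_event, futex_call, wait_for_change and notify_all are hypothetical names, and the sketch assumes std::atomic<std::uint32_t> has the same layout as a plain 32-bit word. The counter is assumed to live in shared memory. Since no lock is held while waiting, a waiter that dies leaves nothing locked behind.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdint>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

static_assert(sizeof(std::atomic<std::uint32_t>) == sizeof(std::uint32_t),
              "futex word must be a plain 32-bit value");

// Sequence counter that is bumped on every notification.
struct futex_event {
    std::atomic<std::uint32_t> seq{0};
};

// Thin wrapper over the raw futex system call (glibc has no wrapper).
// FUTEX_WAIT/FUTEX_WAKE are used without the PRIVATE flag so that they
// work across processes.
static long futex_call(std::atomic<std::uint32_t>* addr, int op,
                       std::uint32_t val) {
    return syscall(SYS_futex, reinterpret_cast<std::uint32_t*>(addr), op, val,
                   nullptr, nullptr, 0);
}

// Blocks until the counter differs from `seen`, the value the caller
// observed before deciding to wait.
void wait_for_change(futex_event& ev, std::uint32_t seen) {
    while (ev.seq.load(std::memory_order_acquire) == seen) {
        // Returns on wake-up, on EAGAIN (value already changed) or on
        // EINTR; the loop re-checks the counter in every case.
        futex_call(&ev.seq, FUTEX_WAIT, seen);
    }
}

// Bumps the counter and wakes every waiter.
void notify_all(futex_event& ev) {
    ev.seq.fetch_add(1, std::memory_order_release);
    long rc = futex_call(&ev.seq, FUTEX_WAKE, INT_MAX);
    assert(rc >= 0);
    (void)rc;
}
```

A waiter would read seq, check its predicate, and call wait_for_change with the value it read if the predicate is not yet satisfied. How the predicate itself is protected (atomics or a robust mutex) is a separate question that this sketch does not answer.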