Re: [boost] [interprocess] Mutex and condition at process termination

14 May 2020


On 2020-05-14 20:43, Andrey Semashev wrote:
...
On 2020-05-14 13:41, Phil Endecott via Boost wrote:
...
Dear Experts,
Can we improve how interprocess mutexes and condition variables
behave on process termination?
Currently, if a process terminates (e.g. it crashes, or you press
ctrl-C), the Interprocess docs say nothing, as far as I can see,
about what happens to locked mutexes and awaited conditions.  In
practice it seems that mutexes that were locked remain locked,
and other processes will deadlock.  (I'm using Linux.)  A few
thoughts:
* If a process were only reading the shared state, then it would
be appropriate for the mutex to be unlocked on termination.
* If a process were modifying the shared state, then it would be
wrong to unconditionally unlock the mutex.  So it would be useful
to distinguish between reader and writer locks, even if we're not
implementing a single-writer/multiple-reader mutex.
* The system could be made more robust by blocking signals while
a mutex is locked.  This doesn't help with crashes, e.g. segfaults,
but it would help with ctrl-C.

Catching signals is a good idea regardless of IPC and locking mutexes.
Whenever there is a moment when your application holds some valuable
data or some state (e.g. a network connection) that needs to be properly
saved or cleaned up on exit, you have to implement proper signal
handling and graceful program termination.
To be clear, I don't mean that Boost.Interprocess should be dealing with
signals. The user's application should.
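A minimal sketch of the signal handling discussed above, assuming a POSIX system. The SIGINT/SIGTERM handler only sets a flag so the program can shut down gracefully, and the same two signals can also be blocked with pthread_sigmask() while a mutex is held, so a Ctrl-C is deferred until the critical section ends. The function names (install_stop_handlers, block_stop_signals, restore_signals, stop_requested) are hypothetical, and none of this helps with SIGSEGV or SIGKILL.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Set by the handler; the handler does nothing but this async-signal-safe store. */
static volatile sig_atomic_t g_stop_requested = 0;

static void on_stop_signal(int sig) {
    (void)sig;
    g_stop_requested = 1;
}

/* Make SIGINT (Ctrl-C) and SIGTERM request a graceful shutdown
   instead of killing the process outright. */
int install_stop_handlers(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_stop_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, NULL) != 0) return -1;
    if (sigaction(SIGTERM, &sa, NULL) != 0) return -1;
    return 0;
}

/* Defer SIGINT/SIGTERM for the calling thread, e.g. while a mutex is held.
   The previous mask is stored in *old so it can be restored afterwards. */
int block_stop_signals(sigset_t *old) {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGINT);
    sigaddset(&set, SIGTERM);
    return pthread_sigmask(SIG_BLOCK, &set, old);
}

/* Restore the previous mask; a signal that arrived meanwhile is delivered now. */
int restore_signals(const sigset_t *old) {
    return pthread_sigmask(SIG_SETMASK, old, NULL);
}

bool stop_requested(void) { return g_stop_requested != 0; }
```

A caller would block the signals, lock, modify the shared state, unlock, restore the mask, and then check stop_requested() to decide whether to clean up and exit.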
...
...
* It may be useful to cause all processes to terminate if one of
them terminates with a mutex held for writing, either immediately
or as soon as they try to lock the same mutex.  Perhaps also to
delete the presumed-corrupted shared memory segment.
* PTHREAD_MUTEX_ROBUST might be part of the solution.  That seems
to require the non-crashed process to do the cleanup, i.e. we would
need to record whether the crashed process was reading or writing
and react appropriately.

You can't do that reliably because the crashed process could have
crashed between locking the mutex and indicating its intentions. For
another process to be able to restart or roll back a failed operation,
that operation has to be implemented in a lock-free fashion, so that
each step is atomic. At that point, mutexes become redundant.
In my experience, the only sensible reaction to an abandoned operation 
(regardless of the way you use to detect the abandoned state) is to 
scrap it and abort or start over in a new shared memory segment.
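For the PTHREAD_MUTEX_ROBUST route, here is a minimal sketch of the "scrap it" policy, assuming Linux/glibc. When pthread_mutex_lock() returns EOWNERDEAD, the code deliberately does not call pthread_mutex_consistent(), so unlocking leaves the mutex permanently unrecoverable and every later locker gets ENOTRECOVERABLE, which it can treat as the cue to abort or move to a fresh segment. The anonymous mmap() stands in for a real shared memory segment, and the helper names (create_shared_robust_mutex, lock_or_abandon, simulate_crash_while_locked) are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

enum lock_result { LOCK_OK, LOCK_ABANDONED, LOCK_ERROR };

/* Place a robust, process-shared mutex in an anonymous shared mapping
   (inherited across fork). A real program would use a named segment. */
pthread_mutex_t *create_shared_robust_mutex(void) {
    void *mem = mmap(NULL, sizeof(pthread_mutex_t), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return NULL;
    pthread_mutex_t *m = mem;
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    int rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) {
        munmap(mem, sizeof *m);
        return NULL;
    }
    return m;
}

/* Lock the mutex. If its previous owner died, do NOT mark it consistent:
   unlocking in that state makes every future lock fail with
   ENOTRECOVERABLE, so all processes learn the shared state is abandoned. */
enum lock_result lock_or_abandon(pthread_mutex_t *m) {
    int rc = pthread_mutex_lock(m);
    if (rc == 0) return LOCK_OK;
    if (rc == EOWNERDEAD) {
        pthread_mutex_unlock(m); /* poison the mutex for everyone */
        return LOCK_ABANDONED;
    }
    if (rc == ENOTRECOVERABLE) return LOCK_ABANDONED;
    return LOCK_ERROR;
}

/* Demo helper: a child process takes the lock and exits without unlocking,
   as a crashed writer would. */
void simulate_crash_while_locked(pthread_mutex_t *m) {
    pid_t pid = fork();
    if (pid == 0) {
        pthread_mutex_lock(m);
        _exit(1);
    }
    if (pid > 0) waitpid(pid, NULL, 0);
}
```

Calling pthread_mutex_consistent() instead would only be sound if the survivor could repair the shared data, which is exactly what the reply above argues cannot be done reliably.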
...
I'm less clear about what happens to condition variables, but it
does seem that perhaps terminating a process while it is waiting
on a condition will cause other processes to deadlock.  Perhaps
the wait conceptually returns and the mutex is re-locked during
termination.

AFAIR, pthread_cond_t uses a non-robust mutex internally, which means
that condition variables are basically useless when you need robust
semantics.
If you need condition variable-like behavior that is robust, I think
your best bet is to use futexes directly.
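As a rough idea of what "use futexes directly" might look like, here is a Linux-only sketch under the assumption that waiters only need to learn that "something changed". A 32-bit counter in shared memory is incremented by the notifier, and waiters sleep in FUTEX_WAIT until it moves. No lock is held while sleeping, so a waiter that dies leaves nothing locked behind. The shared data that the waiter's condition reads still needs its own protection (for example a robust mutex or atomics), and the type and function names (shared_event, event_prepare, event_wait, event_notify_all) are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Notification counter; for cross-process use it must live in a MAP_SHARED
   mapping. (A full 2^32 wrap between snapshot and wait would be missed.) */
typedef struct {
    _Atomic uint32_t seq;
} shared_event;

/* glibc has no futex() wrapper, so go through syscall(). The non-PRIVATE
   ops are used because waiters may be in other processes. */
static long futex_call(_Atomic uint32_t *addr, int op, uint32_t val,
                       const struct timespec *timeout) {
    return syscall(SYS_futex, (void *)addr, op, val, timeout, NULL, 0);
}

/* Snapshot the counter BEFORE checking the condition, so a notification
   that races with the check is not lost. */
uint32_t event_prepare(shared_event *ev) {
    return atomic_load(&ev->seq);
}

/* Sleep until the counter differs from `seen`. Returns 0 once it has changed,
   -1 on timeout or error. `timeout` is relative and restarts after EINTR or a
   spurious wakeup, which a real implementation might want to avoid. */
int event_wait(shared_event *ev, uint32_t seen, const struct timespec *timeout) {
    while (atomic_load(&ev->seq) == seen) {
        if (futex_call(&ev->seq, FUTEX_WAIT, seen, timeout) == -1 &&
            errno != EAGAIN && errno != EINTR)
            return -1; /* ETIMEDOUT or a real error */
    }
    return 0;
}

/* Bump the counter, then wake every waiter. No lock is taken, so a dead
   waiter cannot block the notifier. */
void event_notify_all(shared_event *ev) {
    atomic_fetch_add(&ev->seq, 1);
    futex_call(&ev->seq, FUTEX_WAKE, INT_MAX, NULL);
}
```

A waiter would call event_prepare(), re-check its condition on the shared data, and only then call event_wait(); the notifier updates the shared data first and then calls event_notify_all().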