Phil Endecott wrote:
Can we improve how interprocess mutexes and condition variables behave on process termination?
Having given this some more thought:
I think it would be useful if Boost.Interprocess added
a robust mutex, as a straightforward wrapper around the
POSIX robust mutex and equivalents on other platforms if
they exist. I note that there is a patch that does this
on the Interprocess issue tracker but it unconditionally
cleans up the mutex when it finds that the other process
died, which is wrong. I believe that the lock() method
should fail in that case, and it should provide a
make_consistent method that the user can invoke if
appropriate before retrying. Then read and write locks,
with appropriate clean-up behaviour, can be implemented
on top of that.
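
For illustration, here is a minimal sketch of what such a wrapper might look like on a POSIX system. The names robust_mutex, lock_result, make_consistent and simulate_owner_death are hypothetical and are not part of Boost.Interprocess. One caveat: POSIX does not make the lock call fail outright; pthread_mutex_lock returns EOWNERDEAD with the mutex acquired but marked inconsistent, so in this sketch lock() reports that state and leaves the decision to call make_consistent() to the user.

```cpp
#include <cassert>
#include <cerrno>
#include <new>
#include <stdexcept>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical names throughout; this is not the Boost.Interprocess API.
enum class lock_result { acquired, owner_died };

class robust_mutex {
public:
    robust_mutex() {
        pthread_mutexattr_t attr;
        if (pthread_mutexattr_init(&attr) != 0)
            throw std::runtime_error("pthread_mutexattr_init failed");
        int rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        if (rc == 0) rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
        if (rc == 0) rc = pthread_mutex_init(&m_, &attr);
        pthread_mutexattr_destroy(&attr);
        if (rc != 0) throw std::runtime_error("robust mutex initialisation failed");
    }
    ~robust_mutex() { pthread_mutex_destroy(&m_); }
    robust_mutex(const robust_mutex&) = delete;
    robust_mutex& operator=(const robust_mutex&) = delete;

    // Reports owner death instead of silently repairing the mutex.
    // On owner_died the caller holds the lock, but the mutex stays
    // inconsistent until make_consistent() is called.
    lock_result lock() {
        int rc = pthread_mutex_lock(&m_);
        if (rc == 0) return lock_result::acquired;
        if (rc == EOWNERDEAD) return lock_result::owner_died;
        throw std::runtime_error("pthread_mutex_lock failed");
    }
    // To be called only after the user has checked or repaired the shared data.
    void make_consistent() {
        if (pthread_mutex_consistent(&m_) != 0)
            throw std::runtime_error("pthread_mutex_consistent failed");
    }
    void unlock() { pthread_mutex_unlock(&m_); }

private:
    pthread_mutex_t m_;
};

struct owner_death_report { lock_result first_lock; lock_result second_lock; };

// Demonstration helper: a child process locks the mutex and exits without
// unlocking; the parent then observes what lock() reports.
owner_death_report simulate_owner_death() {
    void* mem = mmap(nullptr, sizeof(robust_mutex), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    assert(mem != MAP_FAILED);
    robust_mutex* m = new (mem) robust_mutex;

    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        m->lock();
        _exit(0);  // "crash" while holding the lock
    }
    int status = 0;
    waitpid(pid, &status, 0);

    owner_death_report r;
    r.first_lock = m->lock();
    if (r.first_lock == lock_result::owner_died)
        m->make_consistent();  // the user decided the data is usable
    m->unlock();
    r.second_lock = m->lock();  // behaves like a normal mutex again
    m->unlock();

    m->~robust_mutex();
    munmap(mem, sizeof(robust_mutex));
    return r;
}
```

If the caller unlocks without calling make_consistent(), POSIX makes the mutex permanently unusable (later lock attempts return ENOTRECOVERABLE), which is roughly the "fail unless the user says otherwise" behaviour argued for above.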
Vinicius dos Santos Oliveira
After some more thought, here is another idea: PTHREAD_MUTEX_ROBUST is no longer a property of the mutex, but a property of the lock.
I don't see how that can be implemented on top of the
POSIX API, where robustness is a property of the mutex.
Andrey Semashev
* PTHREAD_MUTEX_ROBUST might be part of the solution. That seems to require the non-crashed process to do the clean-up, i.e. we would need to record whether the crashed process was reading or writing and react appropriately.
You can't do that reliably because the crashed process could have crashed between locking the mutex and indicating its intentions.
I don't follow. Say I have a bool in the mutex called being_written. It's initially false, the read lock doesn't touch it, and the write lock does:

    lock()   { m.lock(); being_written = true; memory_barrier(); }
    unlock() { memory_barrier(); being_written = false; m.unlock(); }

If the process crashes between locking and setting being_written, then the process doing the cleanup will see being_written = false, and that's OK because the crasher hadn't actually written anything.

Regarding blocking signals, I agree this is not really something that should be part of the interprocess synchronisation primitives, but I do think that a modern wrapper around the ancient C signals API would be good to have.
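
The being_written scheme above could be sketched in C++ roughly as follows. This is only an illustration under assumptions: toy_mutex is a hypothetical stand-in for a robust interprocess mutex, flagged_write_lock and needs_repair are made-up names, and std::atomic_thread_fence is used in place of the unspecified memory_barrier(). For real use across processes the flag would have to live in shared memory and std::atomic<bool> would have to be lock-free.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a robust interprocess mutex.
struct toy_mutex {
    bool locked = false;
    void lock() { locked = true; }
    void unlock() { locked = false; }
};

template <class Mutex>
class flagged_write_lock {
public:
    flagged_write_lock(Mutex& m, std::atomic<bool>& being_written)
        : m_(m), being_written_(being_written) {}

    void lock() {
        m_.lock();
        being_written_.store(true, std::memory_order_relaxed);
        // Plays the role of memory_barrier(): the flag is set before
        // any write to the protected data is issued.
        std::atomic_thread_fence(std::memory_order_seq_cst);
    }

    void unlock() {
        // Unlocking without a matching lock() is a usage error.
        assert(being_written_.load(std::memory_order_relaxed));
        // All writes to the protected data are issued before the flag clears.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        being_written_.store(false, std::memory_order_relaxed);
        m_.unlock();
    }

private:
    Mutex& m_;
    std::atomic<bool>& being_written_;
};

// Called by the surviving process once the mutex reports a dead owner.
// Returns true if the protected data may be half-written.
inline bool needs_repair(const std::atomic<bool>& being_written) {
    return being_written.load(std::memory_order_acquire);
}
```

A crash between m.lock() and the store leaves the flag false, so needs_repair() reports that nothing was written; a crash anywhere between the store and unlock() leaves it true. A read lock would simply lock and unlock the mutex without touching the flag.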
I'm less clear about what happens to condition variables, but it does seem that perhaps terminating a process while it is waiting on a condition will cause other processes to deadlock. Perhaps the wait conceptually returns and the mutex is re-locked during termination.
AFAIR, pthread_cond_t uses a non-robust mutex internally, which means that condition variables are basically useless when you need robust semantics.
Yes.
If you need a condition variable-like behavior, in a robust way, I think your best bet is to use futexes directly.
Yes, that is the conclusion that I've also come to - but it is probably a very difficult problem. Note that robust mutexes use futexes rather differently from regular mutexes, and there is kernel involvement at process termination (see man get_robust_list). A robust condition variable would have to do something similar. I find this all rather surprising, as terminating a process while it is waiting on a condition variable is probably much more common than terminating one while it holds a locked mutex.

Regards, Phil.
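
As a small, Linux-only illustration of the kernel involvement mentioned above, the sketch below asks the kernel for the calling thread's robust futex list via the get_robust_list system call. The names robust_list_info and query_own_robust_list are hypothetical; there is no glibc wrapper for this call, so it goes through syscall() directly. It only inspects the list head; it does not implement a robust condition variable.

```cpp
#include <cassert>
#include <cstddef>
#include <linux/futex.h>   // struct robust_list_head
#include <sys/syscall.h>
#include <unistd.h>

struct robust_list_info {
    bool ok;                  // did the system call succeed?
    robust_list_head* head;   // list the kernel walks when the thread exits
    std::size_t len;          // size of the head structure as seen by the kernel
};

// Queries the robust list registered for the calling thread (pid 0).
robust_list_info query_own_robust_list() {
    robust_list_info info{false, nullptr, 0};
    long rc = syscall(SYS_get_robust_list, 0, &info.head, &info.len);
    info.ok = (rc == 0);
    // On success the kernel reports the size of its own robust_list_head.
    assert(!info.ok || info.len == sizeof(robust_list_head));
    return info;
}
```

The head pointer may be null if the C library has not registered a list for the thread. When it is registered, locked robust mutexes are linked into this list, and the kernel marks their futex words with an owner-died bit when the thread terminates.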