On 02/15/17 20:42, Phil Endecott via Boost wrote:
> Dear Experts,
> I've just been surprised by the behaviour of the interprocess mutex and condition variable on abnormal process termination, i.e. they are not automatically released.
> Google tells me that I'm not the first to be surprised by this; there have been previous posts here, Stack Overflow questions, etc.
> One often-valid observation is that if a process crashes (or otherwise terminates without executing its destructors) while it holds a lock on a shared data structure, then the data is probably now corrupt, so unlocking the mutex that protects it is not very useful. I think there is an important case where that does not apply: when the process that crashes is only reading the shared data. In my case, I had written a "monitor" utility that loops forever, waiting on a shared condition, taking the corresponding mutex, and then dumping the shared data to stdout. I had been running this and stopping it by pressing Ctrl-C, and it had not occurred to me that this might not work as I expected. My attempt at debugging using this utility was making my problems worse, not better! Modifying this code to run destructors on Ctrl-C is non-trivial.
> I am aware that SysV semaphores can undo operations on process termination (see SEM_UNDO in the semop man page), and I had assumed that Boost.Interprocess was using this or something like it. I now see that it is using pthreads, which I didn't even realise could work between processes, and I don't think this API has any way to specify process termination behaviour.
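The SEM_UNDO behaviour mentioned above can be illustrated with a minimal C++ sketch, assuming Linux (the `union semun` guard is the usual X/Open-style snippet; on glibc and musl the caller must define the union). A SysV semaphore initialised to 1 serves as a binary lock; a forked child takes it with `SEM_UNDO` and exits without releasing it, and the kernel's recorded undo adjustment then lets the parent take it again. The helper names (`make_binary_semaphore`, `sysv_lock`, `sysv_unlock`, `lock_released_after_child_exit`, `remove_semaphore`) are hypothetical and not part of Boost.Interprocess.

```cpp
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__GNU_LIBRARY__) && !defined(_SEM_SEMUN_UNDEFINED)
// union semun is already defined by <sys/sem.h>
#else
// X/Open leaves it to the caller to define the semctl() argument union.
union semun {
    int val;
    struct semid_ds* buf;
    unsigned short* array;
};
#endif

// Creates a private set holding one semaphore with value 1 ("unlocked").
int make_binary_semaphore() {
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (id == -1) return -1;
    semun arg;
    arg.val = 1;
    if (semctl(id, 0, SETVAL, arg) == -1) {
        semctl(id, 0, IPC_RMID);
        return -1;
    }
    return id;
}

void remove_semaphore(int id) { semctl(id, 0, IPC_RMID); }

// Decrements the semaphore (takes the lock). SEM_UNDO makes the kernel record
// an adjustment that it applies automatically if this process exits before
// performing the matching increment.
bool sysv_lock(int id, bool nowait) {
    sembuf op{};
    op.sem_num = 0;
    op.sem_op = -1;
    op.sem_flg = static_cast<short>(SEM_UNDO | (nowait ? IPC_NOWAIT : 0));
    return semop(id, &op, 1) == 0;
}

// Increments the semaphore (releases the lock); SEM_UNDO cancels out the
// adjustment recorded by sysv_lock().
bool sysv_unlock(int id) {
    sembuf op{};
    op.sem_num = 0;
    op.sem_op = 1;
    op.sem_flg = SEM_UNDO;
    return semop(id, &op, 1) == 0;
}

// A forked child takes the lock and exits without releasing it, much like a
// process killed by Ctrl-C. Returns true if the parent can then take the lock
// without blocking.
bool lock_released_after_child_exit(int id) {
    pid_t pid = fork();
    if (pid == -1) return false;
    if (pid == 0) _exit(sysv_lock(id, false) ? 0 : 1);
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return false;
    bool ok = sysv_lock(id, true);  // fails with EAGAIN if still held
    if (ok) sysv_unlock(id);
    return ok;
}
```

This only provides a lock, not a condition variable, and the undo adjustment is applied blindly: if the dead process had been modifying shared data, that data is still left in whatever state it reached.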
There is a way to handle this case, but the API is not universally supported: http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_...
If that API is not supported on your platform, you may want to avoid locking the mutex without a timeout (i.e. failing to acquire the mutex within a given time should be treated as an indication that it has been abandoned in the locked state).
In general, synchronization primitives that reside in shared memory (such as pthread mutexes or Boost.Interprocess mutexes) should be considered vulnerable to (a) corruption and (b) becoming unusable (e.g. indefinitely locked) because of a misbehaving user process. That is rather obvious considering that such primitives typically do not own any other resources, such as handles to kernel objects or file descriptors, and as such "don't exist" for the kernel (consequently, the kernel cannot release them on process termination). The robust mutexes I referenced above are an exception to that general rule.
Named primitives, such as SysV semaphores, are typically better protected because there is at least a file descriptor or a kernel object that corresponds to the name, and there is usually a limited API to interact with the primitive (i.e. you usually don't have direct access to the primitive's data). There are a number of named synchronization primitives in Boost.Interprocess, although I don't think they provide an "auto unlock on process termination" feature.
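A minimal sketch of the robust mutex approach, assuming a platform that implements the POSIX.1-2008 `pthread_mutexattr_setrobust()` and `pthread_mutex_consistent()` (e.g. Linux with glibc). A child process locks a process-shared robust mutex and exits without unlocking it; the parent's next `pthread_mutex_lock()` then returns `EOWNERDEAD` instead of blocking forever. The function names are hypothetical.

```cpp
#include <pthread.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cerrno>

// Allocates a robust, process-shared mutex in anonymous shared memory so that
// a child created with fork() operates on the same mutex object.
pthread_mutex_t* make_shared_robust_mutex() {
    void* mem = mmap(nullptr, sizeof(pthread_mutex_t), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return nullptr;
    auto* m = static_cast<pthread_mutex_t*>(mem);
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    int rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    if (rc != 0) {
        munmap(mem, sizeof(pthread_mutex_t));
        return nullptr;
    }
    return m;
}

// Returns 0 with the mutex held, or an error code with it not held.
// owner_died reports that the previous owner terminated while holding it.
int lock_robust(pthread_mutex_t* m, bool& owner_died) {
    int rc = pthread_mutex_lock(m);
    owner_died = (rc == EOWNERDEAD);
    if (owner_died) {
        // We hold the lock now, but the protected data may be inconsistent.
        // Real code would validate or repair it here. Marking the mutex
        // consistent keeps it usable; unlocking without this call would make
        // later lock attempts fail with ENOTRECOVERABLE.
        rc = pthread_mutex_consistent(m);
        if (rc != 0) pthread_mutex_unlock(m);
    }
    return rc;
}

// A child process locks the mutex and exits without unlocking it (as if it
// had been killed); the parent then locks it and reports whether the
// owner-died condition was signalled.
bool parent_sees_owner_died(pthread_mutex_t* m) {
    pid_t pid = fork();
    if (pid == -1) return false;
    if (pid == 0) _exit(pthread_mutex_lock(m) == 0 ? 0 : 1);
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return false;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return false;
    bool owner_died = false;
    if (lock_robust(m, owner_died) != 0) return false;
    pthread_mutex_unlock(m);
    return owner_died;
}
```

The `bool& owner_died` out-parameter is one way to surface a state that a `std::mutex`-style `lock()` has no room for. For a read-only monitor like the one described above, the caller can simply make the mutex consistent and carry on, since a dead reader left nothing to repair.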
> Anyway, I'd like to suggest that the interprocess docs should mention the behaviour of the synchronisation primitives on process termination, e.g. somewhere near the beginning of http://www.boost.org/doc/libs/1_63_0/doc/html/interprocess/synchronization_m...
> I may now try to implement some primitives that use semop() and unlock automatically. I haven't yet looked at what's involved in implementing a condition variable on top of a semaphore, so I may not get very far! Has anyone else ever tried this?
If you want (more or less) reliable interprocess synchronization, you will currently have to implement it yourself, and there are a number of compromises to make along the way. For instance, the pthread robust mutex API does not quite fit into the traditional C++ mutex API, so one has to improvise. In the absence of robust mutexes, the timeout workaround is not universally applicable, and the timeout itself is, obviously, case-specific. Also, most of these APIs are not fully portable (certainly not between Windows and POSIX-compatible systems), so you end up with OS-specific branches.
I did implement this in a few of my projects. One example is Boost.Log, where I opportunistically use robust mutexes:
https://github.com/boostorg/log/blob/develop/src/posix/ipc_sync_wrappers.hpp
https://github.com/boostorg/log/blob/develop/src/posix/ipc_reliable_message_...
You can see that the Windows implementation is quite different:
https://github.com/boostorg/log/blob/develop/src/windows/ipc_sync_wrappers.h...
https://github.com/boostorg/log/blob/develop/src/windows/ipc_sync_wrappers.c...
https://github.com/boostorg/log/blob/develop/src/windows/ipc_reliable_messag...
The best solution to these problems, however, is to avoid locks altogether and use lock-free algorithms in such a way that any data in the shared memory is valid and can be handled.
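As an illustration of the lock-free direction, here is a sketch of one possible scheme (an example of the general idea, not taken from Boost.Log). A single writer alternates between two slots, each guarded by a seqlock-style sequence counter, and publishes the index of the last complete slot. Readers never wait on a lock, and a writer that dies mid-update only damages the unpublished slot. It assumes C++17, that `SharedState` is placed in shared memory (e.g. via `mmap()` or Boost.Interprocess `managed_shared_memory`), and that the atomics are lock-free, since the C++ standard only recommends that lock-free atomics be address-free, i.e. usable across processes. `SharedState`, `publish()` and `read_snapshot()` are hypothetical names.

```cpp
#include <array>
#include <atomic>
#include <cstdint>

constexpr int kWords = 4;
using Snapshot = std::array<std::uint64_t, kWords>;

static_assert(std::atomic<std::uint32_t>::is_always_lock_free, "lock-free 32-bit atomics required");
static_assert(std::atomic<std::uint64_t>::is_always_lock_free, "lock-free 64-bit atomics required");

struct Slot {
    std::atomic<std::uint32_t> seq{0};  // odd while a write is in progress
    std::atomic<std::uint64_t> data[kWords]{};
};

// Intended to live in shared memory; the all-zero state is a valid initial state.
struct SharedState {
    std::atomic<std::uint32_t> published{0};  // slot holding the latest complete snapshot
    Slot slots[2];
};

// Single writer. It only modifies the slot that readers are NOT directed to,
// so a crash part-way through leaves the published slot intact.
void publish(SharedState& s, const Snapshot& v) {
    std::uint32_t next = (s.published.load(std::memory_order_relaxed) & 1u) ^ 1u;
    Slot& slot = s.slots[next];
    // Make the counter odd; "| 1" also copes with a slot left odd by a
    // writer that crashed mid-update.
    std::uint32_t q = slot.seq.load(std::memory_order_relaxed) | 1u;
    slot.seq.store(q, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    for (int i = 0; i < kWords; ++i)
        slot.data[i].store(v[i], std::memory_order_relaxed);
    slot.seq.store(q + 1, std::memory_order_release);  // even again: complete
    s.published.store(next, std::memory_order_release);
}

// Readers retry only while the slot they picked is being rewritten under them.
Snapshot read_snapshot(const SharedState& s) {
    for (;;) {
        // "& 1u" keeps the index in range even if the shared word is garbage.
        std::uint32_t idx = s.published.load(std::memory_order_acquire) & 1u;
        const Slot& slot = s.slots[idx];
        std::uint32_t q1 = slot.seq.load(std::memory_order_acquire);
        if (q1 & 1u) continue;  // write in progress here; re-read the index
        Snapshot out{};
        for (int i = 0; i < kWords; ++i)
            out[i] = slot.data[i].load(std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_acquire);
        if (slot.seq.load(std::memory_order_relaxed) == q1) return out;  // no overlapping write
    }
}
```

A dead writer can never block readers indefinitely here. Deciding who may become the new writer after a crash is a separate problem that this sketch does not address.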