On 09/01/2015 04:35 AM, Andrey Semashev wrote:
Hi,
I was wondering is there would be interest in a generalized read-modify-write operation in Boost.Atomic. The idea is to have a atomic<T>::read_modify_write method that would take a function that would perform the modify on the atomic value. The interface could be roughly this:
template< typename T > class atomic { public: // Returns the original value T read_modify_write(F modify, memory_order order = memory_order_seq_cst); };
Usage example:
// Saturating increment atomic< int > a; a.read_modify_write([](int n) { return n < std::numeric_limits< int >::max() ? n + 1 : n; });
The implementation could synthesize a CAS loop around the modify function or (ideally) wrap it inside the LL/SC instructions, also in a loop.
I believe, in order for this to work the modify function should meet certain requirements:
1. It must not use atomic operations (on this or other variables) or thread synchronization primitives. 2. It must not call functions that might involve context switching (e.g. syscalls, i/o). 3. It should be as small as possible and have the least possible working set (that is the number of accessed variables). 4. It must be prepared to be called multiple times. 5. It must be prepared to receive a value that has never been stored in the atomic variable.
The first restriction is essential to avoid deadlocking in case of the emulated (lock-based) atomic<>. Thread synchronization primitives may possibly be using atomic ops internally, so are prohibited as well.
The 2 and 3, besides just being something rational one would strive for anyway, are mostly targeted to allow to inject the function call inside the LL/SC pair. It is known that some architectures are very sensitive to what is happening between the LL and SC instructions, so accessing some memory from the modify function or switching the thread context will result in SC always failing and an infinite loop.
The 5 would allow to make the first load non-atomic, which can be faster. The CAS or SC instruction will fail if the loaded value was inconsistent.
I know there's probably no way to ensure that the function meets these requirements, and it is possible to get an infinite loop in runtime, so maybe always using a CAS loop would be a safer approach, even on an architecture with LL/SC instructions. My hope is that compilers will someday provide intrinsics for this and will be able to generate optimal code, when possible.
Does this look interesting to anyone? Comments?
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
It's interesting although ensuring that all the requirements are met for the function is impossible. How about also a stand-alone version so that it can be used with std::atomic<T>?