On 2/09/2015 12:11, Andrey Semashev wrote:
(And the above is also supposed to use LL/SC on architectures where this is cheaper than CAS, although I'm not sure if this is the case.)
I'm not sure there are architectures that implement both CAS and LL/SC instructions; at least, I'm not aware of any. On the architectures that support LL/SC, those instructions will be used to implement compare_exchange_weak. The modify function in this CAS loop will not be executed within the LL/SC region, which is why the additional load before the loop is required. There is also a probability of CAS failure.
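To illustrate the shape of that loop (fetch_modify and its functor-based interface are just a sketch here, not actual Boost.Atomic or standard API):

#include <atomic>

// Sketch only. The modify step runs outside any LL/SC region, so the CAS
// can fail, in which case the loop retries with the freshly observed value.
template< typename T, typename Modify >
T fetch_modify(std::atomic< T >& storage, Modify modify)
{
    // The additional load before the loop mentioned above
    T old_val = storage.load(std::memory_order_relaxed);
    T new_val = modify(old_val);
    while (!storage.compare_exchange_weak(old_val, new_val,
            std::memory_order_acq_rel, std::memory_order_relaxed))
    {
        // CAS failed: old_val now holds the current value, recompute
        new_val = modify(old_val);
    }
    return old_val;
}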
Right. I meant that on LL/SC architectures, compare_exchange_weak is "more native", while compare_exchange_strong is "more native" on CAS architectures.
There is another point to consider. A compare_exchange_weak/strong operation on an LL/SC architecture is more complex than what is required to implement a simple RMW operation. For example, let's look at the Boost.Atomic code for ARM. Here is fetch_add, which can be used as a prototype of what could be done with a generic RMW operation:
"1:\n" "ldrex %[original], %[storage]\n" "add %[result], %[original], %[value]\n" // modify "strex %[tmp], %[result], %[storage]\n" "teq %[tmp], #0\n" "bne 1b\n"
Frankly, I'd like to be able to generate code like this for operations other than those defined by the standard atomic<> interface.
That's true, and so would I. The danger is that what you can do while holding the exclusive reservation (the X bit) is quite limited, sometimes extremely so. If the modify function call is *not* inlined, then even that might be enough to make the store-exclusive always fail. So you'd need a fallback to a basic compare_exchange_weak loop (which should eventually succeed unless the machine is really hammered) in such cases. Of course, that's basically what you said in the OP. :)
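To make that structure concrete (try_native_rmw is purely hypothetical, standing in for a platform-specific ldrex/strex path; here it is stubbed to always fail):

#include <atomic>

// Hypothetical hook standing in for a platform path that runs the modify step
// between ldrex and strex. A real implementation would report failure when the
// exclusive reservation keeps being lost (e.g. because the call was not inlined).
template< typename T, typename Modify >
bool try_native_rmw(std::atomic< T >&, Modify, T&)
{
    return false; // portable stub: no native LL/SC path here
}

template< typename T, typename Modify >
T fetch_modify_with_fallback(std::atomic< T >& storage, Modify modify)
{
    T original;
    if (try_native_rmw(storage, modify, original))
        return original;

    // Fallback: plain compare_exchange_weak loop, which should eventually
    // succeed even when the native path keeps failing.
    original = storage.load(std::memory_order_relaxed);
    T desired = modify(original);
    while (!storage.compare_exchange_weak(original, desired,
            std::memory_order_acq_rel, std::memory_order_relaxed))
    {
        desired = modify(original);
    }
    return original;
}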