On 01/26/18 19:32, Peter Dimov via Boost wrote:
The interface strikes me as heavily influenced by what g++ can (`__builtin_constant_p`) and cannot (figure out that the result is not needed) do. The order of adding the functions probably also plays a part; were `op_and_test` added first, `opaque_op` probably wouldn't have been.
No, all extra ops were added at the same time. I was also planning to add a generalized `read_modify_write` operation but didn't, because I don't have hardware with TSX. It's true, though, that my main testing compiler is gcc. Clang doesn't seem to support `__builtin_constant_p`, which makes it fail to convert "add" to "inc" in `add_and_test` and `opaque_add`. OTOH, the Intel compiler does support it and generates better code for `add_and_test` and `opaque_add` than for `fetch_add`. Maybe I should switch clang to the generic emulation backend. If the code can be improved for other compilers, I welcome suggestions and patches.
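To make the "add" to "inc" conversion concrete, here is a minimal sketch of the kind of dispatch I mean. It is not the actual Boost.Atomic code: `opaque_add_sketch` is a hypothetical name, the storage is fixed to 32 bits, the ordering is always sequentially consistent, and a GCC-compatible compiler is assumed.

```cpp
#include <cstdint>

// Hypothetical illustration only; not the library's implementation.
// Adds v to storage atomically and returns nothing, so no xadd is needed.
inline void opaque_add_sketch(std::uint32_t& storage, std::uint32_t v)
{
#if defined(__i386__) || defined(__x86_64__)
    // If the compiler can prove at this point that v is the constant 1,
    // the shorter "lock inc" form is selected.
    if (__builtin_constant_p(v) && v == 1u)
    {
        __asm__ __volatile__("lock; incl %0" : "+m"(storage) : : "cc", "memory");
    }
    else
    {
        // Generic path: taken for runtime values, and also whenever
        // __builtin_constant_p does not fold to true.
        __asm__ __volatile__("lock; addl %1, %0" : "+m"(storage) : "ir"(v) : "cc", "memory");
    }
#else
    // Non-x86 targets: fall back to the compiler intrinsic, result discarded.
    __atomic_fetch_add(&storage, v, __ATOMIC_SEQ_CST);
#endif
}
```

Calling `opaque_add_sketch(x, 1u)` and then `opaque_add_sketch(x, 5u)` on a zero-initialized `x` leaves it at 6 either way; only the instruction selected differs. On a compiler where `__builtin_constant_p(v)` does not evaluate to true after inlining, every call goes through the `lock add` branch, which would match what I see with clang.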
It's interesting to play and see what gets generated when. For instance, clang++ 3.6 figures out by itself that in `x1.fetch_and_add( 1 );` the result is not used, and generates `lock inc`.
How it manages to do that if you're using assembly `xadd`, I don't know.
`fetch_add` is implemented in terms of intrinsics (I assume clang supports the `__atomic*` intrinsics, so those should be used). No difference from `std::atomic` is expected for the standard operations on recent compilers.
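For comparison, a small `std::atomic` sketch of the two cases being discussed. The function names are made up for illustration, and the codegen notes in the comments are what one would typically expect on x86, not a guarantee for any particular compiler version.

```cpp
#include <atomic>

std::atomic<int> counter{0};

// Result discarded: the operation is an intrinsic the optimizer can see
// through, so it may emit "lock inc" or "lock add" instead of "lock xadd".
inline void bump_discard()
{
    counter.fetch_add(1, std::memory_order_seq_cst);
}

// Result used: the previous value has to be returned, so "lock xadd"
// (or an equivalent sequence) is typically needed.
inline int bump_use()
{
    return counter.fetch_add(1, std::memory_order_seq_cst);
}
```

With hand-written inline assembly the compiler cannot make that substitution, since it treats the asm statement as a black box. That is why the intrinsic-based `fetch_add` and the assembly-based extra operations can behave differently here.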