[interprocess] atomic_write32 on GCC/x86/x68_64 lacks memory barrier
Hi! In a piece of legacy code I've had a few uses of the atomic operations inside Boost.Interprocess (yes, I know they're internal, but I had no other choice at the time, so sue me :-). I've been debugging a thread synchronization issue which ended up in me actually looking for a problem inside interprocess, and I believe that I found the error. The atomic_write32 function, when running gcc on intel platforms does not have a memory barrier, which causes my threads to not see changes that they should see. A little bit of googling gave me the following link http://boost.2283326.n4.nabble.com/interprocess-atomic-write32-td4173289.htm..., which discusses exactly this issue. I've now fixed my code to use Boost.Atomic instead of relying on boost interprocess internals, so I'm good, but I worry that the issue might affect boost.interprocess as well. So I propose that atomic_write32 should be implemented using atomic_cas32. Or indeed that Boost.Atomic should be used instead, which is more work, obviously. Thoughts? /Lars
Lars Hagström wrote:
Hi!
In a piece of legacy code I've had a few uses of the atomic operations inside Boost.Interprocess (yes, I know they're internal, but I had no other choice at the time, so sue me :-). I've been debugging a thread synchronization issue which ended up in me actually looking for a problem inside interprocess, and I believe that I found the error. The atomic_write32 function, when running gcc on intel platforms does not have a memory barrier, which causes my threads to not see changes that they should see.
A little bit of googling gave me the following link http://boost.2283326.n4.nabble.com/interprocess-atomic-write32-td4173289.htm..., which discusses exactly this issue.
I've now fixed my code to use Boost.Atomic instead of relying on boost interprocess internals, so I'm good, but I worry that the issue might affect boost.interprocess as well.
So I propose that atomic_write32 should be implemented using atomic_cas32.
There is no reason to implement atomic writes with CAS on x86. If atomic_write32 is supposed to be sequentially consistent, it should be implemented with XCHG. If it's only needed to have release semantics, an ordinary write should be used. All writes have release semantics on x86. The problem with the current implementation is not the lack of a hardware memory barrier, but the lack of a compiler memory barrier. On GCC, it's __asm__ __volatile__ ( "" ::: "memory" ). For release semantics, it ought to go before the store. Nowadays on GCC it might be best to use the built-in atomic intrinsics: https://gcc.gnu.org/onlinedocs/gcc-4.9.1/gcc/_005f_005fatomic-Builtins.html std::atomic would be even better but I don't think it works in C++03 mode.
El 08/08/2014 14:58, Peter Dimov escribió:
The problem with the current implementation is not the lack of a hardware memory barrier, but the lack of a compiler memory barrier. On GCC, it's __asm__ __volatile__ ( "" ::: "memory" ). For release semantics, it ought to go before the store.
Thanks. That might be the case in many other platforms provided by users (AIX, OSF on Alpha) where the atomic write is implemented without any barrier.
Nowadays on GCC it might be best to use the built-in atomic intrinsics:
https://gcc.gnu.org/onlinedocs/gcc-4.9.1/gcc/_005f_005fatomic-Builtins.html
std::atomic would be even better but I don't think it works in C++03 mode.
Interprocess supports quite old GCCs (from 4.0, I think) that might not have atomic builtins. I expect Boost.Atomic to provide assembler code for those platforms. Best, Ion
Ion Gaztañaga wrote:
El 08/08/2014 14:58, Peter Dimov escribió:
The problem with the current implementation is not the lack of a hardware memory barrier, but the lack of a compiler memory barrier. On GCC, it's __asm__ __volatile__ ( "" ::: "memory" ). For release semantics, it ought to go before the store.
Thanks. That might be the case in many other platforms provided by users (AIX, OSF on Alpha) where the atomic write is implemented without any barrier.
On POWER/PPC and Alpha, you need an actual hardware memory barrier there for acquire/release semantics. You do have __MB() on Alpha. The PPC code has none, so unless the compiler inserts those automatically on volatile reads/writes, the code doesn't seem correct. (The x86 code is also missing compiler barriers on load.) But you don't document the memory ordering that your read/write primitives guarantee, so it's hard to say whether they are correct or not. As written, they are "relaxed".
El 08/08/2014 17:42, Peter Dimov escribió:
But you don't document the memory ordering that your read/write primitives guarantee, so it's hard to say whether they are correct or not. As written, they are "relaxed".
Right. It's time to fix this mess ;-) Best, Ion
El 08/08/2014 14:28, Lars Hagström escribió:
So I propose that atomic_write32 should be implemented using atomic_cas32. Or indeed that Boost.Atomic should be used instead, which is more work, obviously.
[CC-ing Andrey] I'd like to use Boost.Atomic (assuming Boost.Atomic can be used in C++03 mode). At least in all platforms where Boost.Atomic is available. The Boost.Interprocess atomics are broken and I don't have the skills to write correct code. I'd like to use a header-only solution but I see that's not currently possible with Boost.Atomic. I'd only use 32 bit atomic operations and only in platforms where those are lock-free, as it's the case now wirh Interprocess (we can't fallback to std::mutex-es and that's perfectly acceptable). Maybe it's a good task for Boost 1.57. Andrey, do you think porting to Boost.Atomic is a good idea? I don't know if Boost.Atomic supports all platforms Interprocess supports now (see www.boost.org/doc/libs/release/boost/interprocess/detail/atomic.hpp for details). Best, Ion
On Fri, Aug 8, 2014 at 7:11 PM, Ion Gaztañaga
El 08/08/2014 14:28, Lars Hagström escribió:
So I propose that atomic_write32 should be implemented using atomic_cas32. Or indeed that Boost.Atomic should be used instead, which is more work, obviously.
[CC-ing Andrey]
I'd like to use Boost.Atomic (assuming Boost.Atomic can be used in C++03 mode). At least in all platforms where Boost.Atomic is available. The Boost.Interprocess atomics are broken and I don't have the skills to write correct code.
I'd like to use a header-only solution but I see that's not currently possible with Boost.Atomic. I'd only use 32 bit atomic operations and only in platforms where those are lock-free, as it's the case now wirh Interprocess (we can't fallback to std::mutex-es and that's perfectly acceptable).
Maybe it's a good task for Boost 1.57. Andrey, do you think porting to Boost.Atomic is a good idea? I don't know if Boost.Atomic supports all platforms Interprocess supports now (see www.boost.org/doc/libs/release/boost/interprocess/detail/atomic.hpp for details).
I think Boost.Atomic should provide the common implementation of atomic ops for Boost libraries, so I agree that this would be a step in the right direction. However there are a few things to consider: - Looking at your current implementation, I think some platforms are not directly supported by Boost.Atomic. That is, I don't test for e.g. __IBMCPP__ or availability of Solaris functions in Boost.Atomic, so unless these platforms are supported by other backends (like gcc asm based, for example) these platforms will fall back to the lock-based backend. The list of the supported lock-free platforms can be inferred from the boost/atomic/detail/platform.hpp file (or the list of caps_*.hpp files there). Needless to say, I'm all for improving portability of Boost.Atomic, and I'll appreciate any help in supporting these platforms. - The lock-based backend requires linking to a compiled library and is not capable of interprocess synchronization. You can use Boost.Atomic 1.56 and later in header-only manner, if you test for the library capabilities before including and using atomic<> (include boost/atomic/capabilities.hpp for that). But if there is no lock-free backend for a given platform there needs to be some alternative implementation in Boost.Interprocess. Or Boost.Interprocess should fail to compile on that platform. - The underlying storage type of atomic<> is not officially fixed for a given element type and may differ from one platform to another (including from one compiler to another on the same target hardware). For example, some versions of MSVC don't support 8 and 16-bit interlocked intrinsics and Boost.Atomic uses 32-bit atomic ops instead. Not sure how critical this is for Boost.Interprocess.
El 08/08/2014 18:15, Andrey Semashev wrote:
I think Boost.Atomic should provide the common implementation of atomic ops for Boost libraries, so I agree that this would be a step in the right direction. However there are a few things to consider:
Ok. Glad we agree.
- Looking at your current implementation, I think some platforms are not directly supported by Boost.Atomic. That is, I don't test for e.g. __IBMCPP__ or availability of Solaris functions in Boost.Atomic, so unless these platforms are supported by other backends (like gcc asm based, for example) these platforms will fall back to the lock-based backend. The list of the supported lock-free platforms can be inferred from the boost/atomic/detail/platform.hpp file (or the list of caps_*.hpp files there). Needless to say, I'm all for improving portability of Boost.Atomic, and I'll appreciate any help in supporting these platforms.
I think maintainers of those compilers and platforms should help us supporting atomic operations. Interprocess can only require BOOST_ATOMIC_INT32_LOCK_FREE to be 1 after including boost/atomic/capabilities.hpp to test if needed atomics are available. In this case, and if c++11 atomics are defined Boost.Interprocess could try to use std atomics in the compiler supports it, but let compiler experts write the assembler for those platforms.
- The lock-based backend requires linking to a compiled library and is not capable of interprocess synchronization. You can use Boost.Atomic 1.56 and later in header-only manner, if you test for the library capabilities before including and using atomic<> (include boost/atomic/capabilities.hpp for that). But if there is no lock-free backend for a given platform there needs to be some alternative implementation in Boost.Interprocess. Or Boost.Interprocess should fail to compile on that platform.
I think it should fail to compile as it does now. Let's hope other compiler maintainers provide atomic types. It would be nice if Boost.Atomic used C++11 atomics in case they were available, that would make Boost libraries easier to maintain, but for now Boost.Interprocess can use C++11 if available.
- The underlying storage type of atomic<> is not officially fixed for a given element type and may differ from one platform to another (including from one compiler to another on the same target hardware). For example, some versions of MSVC don't support 8 and 16-bit interlocked intrinsics and Boost.Atomic uses 32-bit atomic ops instead. Not sure how critical this is for Boost.Interprocess.
Not critical, all atomics are now only 32 bit. Thanks for your quick reply. Best, Ion
On Fri, Aug 8, 2014 at 9:21 PM, Ion Gaztañaga
El 08/08/2014 18:15, Andrey Semashev wrote:
- Looking at your current implementation, I think some platforms are not directly supported by Boost.Atomic. That is, I don't test for e.g. __IBMCPP__ or availability of Solaris functions in Boost.Atomic, so unless these platforms are supported by other backends (like gcc asm based, for example) these platforms will fall back to the lock-based backend. The list of the supported lock-free platforms can be inferred from the boost/atomic/detail/platform.hpp file (or the list of caps_*.hpp files there). Needless to say, I'm all for improving portability of Boost.Atomic, and I'll appreciate any help in supporting these platforms.
I think maintainers of those compilers and platforms should help us supporting atomic operations. Interprocess can only require BOOST_ATOMIC_INT32_LOCK_FREE to be 1 after including boost/atomic/capabilities.hpp to test if needed atomics are available.
Correction, the macro value should be 2, not 1.
It would be nice if Boost.Atomic used C++11 atomics in case they were available, that would make Boost libraries easier to maintain, but for now Boost.Interprocess can use C++11 if available.
Boost.Atomic is planned to have a more extended feature set than std::atomic provides, so I can't use it.
- The underlying storage type of atomic<> is not officially fixed for a given element type and may differ from one platform to another (including from one compiler to another on the same target hardware). For example, some versions of MSVC don't support 8 and 16-bit interlocked intrinsics and Boost.Atomic uses 32-bit atomic ops instead. Not sure how critical this is for Boost.Interprocess.
Not critical, all atomics are now only 32 bit. Thanks for your quick reply.
I used MSVC as an example. In theory, it is possible that some platform only provides 64-bit atomics and Boost.Atomic has to emulate 32-bit ops. I don't know if such a platform really exists though.
I'd be happy to help out with testing any work on moving boost interprocess
to boost atomics. It would be good to get this cleaned up!
Anyway, for the Boost versions that are already out there I do need a
working patch that I can provide with my software. I've tried to have a
look at what Interprocess uses the atomic_write32 function for, to see if
release semantics is enough, but my atomic-fu is not strong enough for that.
So unless Ion tells me otherwise I am going to assume that sequential
consistency is needed and use the following implementation of
atomic_write32 (for gcc and x86/x86_64):
inline void atomic_write32(volatile boost::uint32_t *mem, boost::uint32_t
val)
{
__asm__ __volatile__ ( "" ::: "memory" );
__asm__ (
"xchgl %0, %1"
: "+r" (val), "+m" (*mem)
);
__asm__ __volatile__ ( "" ::: "memory" );
}
As is probably obvious, I am no assembly programmer, I just copied and
pasted relevant bits from Boost.Atomic, and this *appears* to work.
Thanks
Lars
On Fri, Aug 8, 2014 at 7:48 PM, Andrey Semashev
On Fri, Aug 8, 2014 at 9:21 PM, Ion Gaztañaga
wrote: El 08/08/2014 18:15, Andrey Semashev wrote:
- Looking at your current implementation, I think some platforms are not directly supported by Boost.Atomic. That is, I don't test for e.g. __IBMCPP__ or availability of Solaris functions in Boost.Atomic, so unless these platforms are supported by other backends (like gcc asm based, for example) these platforms will fall back to the lock-based backend. The list of the supported lock-free platforms can be inferred from the boost/atomic/detail/platform.hpp file (or the list of caps_*.hpp files there). Needless to say, I'm all for improving portability of Boost.Atomic, and I'll appreciate any help in supporting these platforms.
I think maintainers of those compilers and platforms should help us supporting atomic operations. Interprocess can only require BOOST_ATOMIC_INT32_LOCK_FREE to be 1 after including boost/atomic/capabilities.hpp to test if needed atomics are available.
Correction, the macro value should be 2, not 1.
It would be nice if Boost.Atomic used C++11 atomics in case they were available, that would make Boost libraries easier to maintain, but for now Boost.Interprocess can use C++11 if available.
Boost.Atomic is planned to have a more extended feature set than std::atomic provides, so I can't use it.
- The underlying storage type of atomic<> is not officially fixed for a given element type and may differ from one platform to another (including from one compiler to another on the same target hardware). For example, some versions of MSVC don't support 8 and 16-bit interlocked intrinsics and Boost.Atomic uses 32-bit atomic ops instead. Not sure how critical this is for Boost.Interprocess.
Not critical, all atomics are now only 32 bit. Thanks for your quick reply.
I used MSVC as an example. In theory, it is possible that some platform only provides 64-bit atomics and Boost.Atomic has to emulate 32-bit ops. I don't know if such a platform really exists though.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Lars Hagström wrote:
inline void atomic_write32(volatile boost::uint32_t *mem, boost::uint32_t val) { __asm__ __volatile__ ( "" ::: "memory" ); __asm__ ( "xchgl %0, %1" : "+r" (val), "+m" (*mem) ); __asm__ __volatile__ ( "" ::: "memory" ); }
This will work, but it's more verbose than it needs to be. You can put the __volatile__ and "memory" on the xchg asm statement and take out the two others. You'll also need to fix atomic_read32: { uint32 r = *mem; __asm__ __volatile__ ( "" ::: "memory" ); return r; }
Thanks!
As I said, I'm not an assembly programmer. I can copy/paste though, which
lead to the verbosity. :-)
On Mon, Aug 11, 2014 at 2:34 PM, Peter Dimov
Lars Hagström wrote:
inline void atomic_write32(volatile boost::uint32_t *mem, boost::uint32_t
val) { __asm__ __volatile__ ( "" ::: "memory" ); __asm__ ( "xchgl %0, %1" : "+r" (val), "+m" (*mem) ); __asm__ __volatile__ ( "" ::: "memory" ); }
This will work, but it's more verbose than it needs to be. You can put the __volatile__ and "memory" on the xchg asm statement and take out the two others.
You'll also need to fix atomic_read32:
{ uint32 r = *mem;
__asm__ __volatile__ ( "" ::: "memory" ); return r;
}
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/ mailman/listinfo.cgi/boost
Ok, I've managed, through trial and error and a bit of googling to make it
less verbose. Is this correct?
inline boost::uint32_t atomic_read32(volatile boost::uint32_t *mem)
{
const boost::uint32_t val = *mem;
asm volatile ( "" ::: "memory" );
return val;
}
inline void atomic_write32(volatile boost::uint32_t *mem, boost::uint32_t
val)
{
asm volatile
(
"xchgl %0, %1"
: "+r" (val), "+m" (*mem)
:: "memory"
);
}
Does that "memory" statement at the end add a fence both before and after
the assembly statement? Or have I misunderstood how this works?
/Lars
On Mon, Aug 11, 2014 at 3:21 PM, Lars Hagström
Thanks!
As I said, I'm not an assembly programmer. I can copy/paste though, which lead to the verbosity. :-)
On Mon, Aug 11, 2014 at 2:34 PM, Peter Dimov
wrote: Lars Hagström wrote:
inline void atomic_write32(volatile boost::uint32_t *mem,
boost::uint32_t val) { __asm__ __volatile__ ( "" ::: "memory" ); __asm__ ( "xchgl %0, %1" : "+r" (val), "+m" (*mem) ); __asm__ __volatile__ ( "" ::: "memory" ); }
This will work, but it's more verbose than it needs to be. You can put the __volatile__ and "memory" on the xchg asm statement and take out the two others.
You'll also need to fix atomic_read32:
{ uint32 r = *mem;
__asm__ __volatile__ ( "" ::: "memory" ); return r;
}
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/ mailman/listinfo.cgi/boost
Hmm, should I add a fence to the Windows atomic_read32 variant too?
Reading the Boost.Atomics code it looks like it puts a _ReadWriteBarrier in
there...
/Lars
On Mon, Aug 11, 2014 at 3:38 PM, Lars Hagström
Ok, I've managed, through trial and error and a bit of googling to make it less verbose. Is this correct?
inline boost::uint32_t atomic_read32(volatile boost::uint32_t *mem) { const boost::uint32_t val = *mem; asm volatile ( "" ::: "memory" ); return val;
}
inline void atomic_write32(volatile boost::uint32_t *mem, boost::uint32_t val) { asm volatile
( "xchgl %0, %1" : "+r" (val), "+m" (*mem) :: "memory" ); }
Does that "memory" statement at the end add a fence both before and after the assembly statement? Or have I misunderstood how this works?
/Lars
On Mon, Aug 11, 2014 at 3:21 PM, Lars Hagström
wrote: Thanks!
As I said, I'm not an assembly programmer. I can copy/paste though, which lead to the verbosity. :-)
On Mon, Aug 11, 2014 at 2:34 PM, Peter Dimov
wrote: Lars Hagström wrote:
inline void atomic_write32(volatile boost::uint32_t *mem,
boost::uint32_t val) { __asm__ __volatile__ ( "" ::: "memory" ); __asm__ ( "xchgl %0, %1" : "+r" (val), "+m" (*mem) ); __asm__ __volatile__ ( "" ::: "memory" ); }
This will work, but it's more verbose than it needs to be. You can put the __volatile__ and "memory" on the xchg asm statement and take out the two others.
You'll also need to fix atomic_read32:
{ uint32 r = *mem;
__asm__ __volatile__ ( "" ::: "memory" ); return r;
}
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/ mailman/listinfo.cgi/boost
Lars Hagström wrote:
Hmm, should I add a fence to the Windows atomic_read32 variant too? Reading the Boost.Atomics code it looks like it puts a _ReadWriteBarrier in there...
That would be a good idea as well. Some MSVC versions give volatile reads acquire semantics, and some do not, so it's safer to have the barrier.
Lars Hagström wrote:
Does that "memory" statement at the end add a fence both before and after the assembly statement?
Effectively. To be more precise, it turns the assembly statement into a compiler fence. "memory" tells the compiler to assume that the assembly code accesses and modifies memory in unpredictable ways, and the effect of that is basically a compiler fence. You should probably use the __asm__ and __volatile__ spellings as a matter of style though.
Thanks very much for your help! I've learnt a lot today :-)
On Mon, Aug 11, 2014 at 4:17 PM, Peter Dimov
Lars Hagström wrote:
Does that "memory" statement at the end add a fence both before and after the assembly statement?
Effectively. To be more precise, it turns the assembly statement into a compiler fence. "memory" tells the compiler to assume that the assembly code accesses and modifies memory in unpredictable ways, and the effect of that is basically a compiler fence.
You should probably use the __asm__ and __volatile__ spellings as a matter of style though.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/ mailman/listinfo.cgi/boost
El 11/08/2014 16:53, Lars Hagström escribió:
Thanks very much for your help! I've learnt a lot today :-)
Thanks for your time Lars, could you provide a patch with the modifications you've done? Just in case Boost.Atomic port is not in time for Boost 1.57... Best, Ion
On 12/08/2014 02:17, Peter Dimov wrote:
You should probably use the __asm__ and __volatile__ spellings as a matter of style though.
It's not just style. When in strict standards mode, "asm volatile" will fail to compile, but "__asm__ __volatile__" will still work.
Thanks for your time Lars, could you provide a patch with the modifications you've done? Just in case Boost.Atomic port is not in time for Boost 1.57...
Here is my patch. I also sent you a pull request on github, in case that
makes life easier...
I also updated the versions of atomic_read32 and atomic_write32 that use
gcc compiler intrinsics. However, I have not made any changes to the
plaforms that I cannot test.
Cheers
/Lars
diff --git a/include/boost/interprocess/detail/atomic.hpp
b/include/boost/interprocess/detail/atomic.hpp
index 11992d1..60993ae 100644
--- a/include/boost/interprocess/detail/atomic.hpp
+++ b/include/boost/interprocess/detail/atomic.hpp
@@ -53,6 +53,9 @@ inline boost::uint32_t atomic_cas32
#include
participants (5)
-
Andrey Semashev
-
Gavin Lambert
-
Ion Gaztañaga
-
Lars Hagström
-
Peter Dimov