On Wed, Dec 3, 2014 at 10:48 PM, Benedek Thaler wrote:
Hi All, Niall,
1) I was reading this Intel paper [0], and this section grabbed my attention:
"One common mistake made by developers developing their own spin-wait loops is attempting to spin on an atomic instruction instead of spinning on a volatile read. Spinning on a dirty read instead of attempting to acquire a lock consumes less time and resources. This allows an application to only attempt to acquire a lock only when it is free."
As far as I can tell from the source code, spinlock spins on an atomic consume load. I wonder whether a volatile read would give better performance characteristics?
Generally speaking, things are more complicated than that. First, you would probably want to spin with a relaxed read rather than consume; consume is promoted to acquire on most, if not all, platforms. Acquire memory ordering is not required while spinning, and on architectures that distinguish it from relaxed it can be much more expensive. Second, even a relaxed atomic read is formally not equivalent to a volatile read: the latter is not guaranteed to be atomic. Lastly, on x86 all this is mostly moot, because compilers typically compile a small volatile read to a single load instruction, which on this architecture is equivalent to an acquire or relaxed atomic read, as long as the object is correctly aligned.
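To make the relaxed-read point concrete, here is a minimal sketch of a test-and-test-and-set lock in C++11. It is only an illustration, not the code of Boost.Spinlock or of the Intel paper; the names ttas_spinlock and run_counter_demo are made up for this example. The waiting loop uses a relaxed load, and the acquire read-modify-write is attempted only once the lock looks free.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical example type; not part of any Boost library.
class ttas_spinlock {
public:
    void lock() {
        for (;;) {
            // Try to take the lock. Acquire ordering is needed only here,
            // on the operation that actually obtains ownership.
            if (!locked_.exchange(true, std::memory_order_acquire))
                return;
            // Lock is held: wait with a plain relaxed load so that the
            // waiting threads do not issue read-modify-write operations.
            // A production lock would likely use a CPU pause hint here;
            // yield() is used only to keep the sketch portable.
            while (locked_.load(std::memory_order_relaxed))
                std::this_thread::yield();
        }
    }

    void unlock() {
        // Release pairs with the acquire exchange in lock().
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical helper: several threads increment a plain counter
// under the lock and the final value is returned.
long run_counter_demo(int threads, int iters) {
    ttas_spinlock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lock, &counter, iters] {
            for (int i = 0; i < iters; ++i) {
                lock.lock();
                ++counter;  // protected by the spinlock
                lock.unlock();
            }
        });
    }
    for (auto& th : pool)
        th.join();
    return counter;
}
```

If the lock is correct, run_counter_demo(4, 20000) returns 80000. Whether this beats spinning directly on the exchange depends on the platform and the level of contention, so it needs to be measured.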
2) AFAIK spinlocking is not necessarily fair on a NUMA architecture. Is there something already implemented or planned in Boost.Spinlock to ensure fairness? I'm thinking of something like this: [1]
I can't speak for Boost.Spinlock (do we have that library?), but IMHO when you need fairness, spinlocks are not the best choice.