Volatile and threading/barriers are two different things that only appear related: volatile is for dealing with the compiler, while barriers are for dealing with the CPU(s).
The reason that data_ready is NOT 'left' in a register is that a function with unknown side effects (cond.wait) is being called. You could just as easily have:
while (!data_ready) some_library_function();
and the compiler will ensure that data_ready is re-read (i.e. not left in a register) because it is a global and the compiler cannot tell whether some_library_function() modifies it. In theory the compiler could look into some_library_function() and figure that out, but in practice it generally doesn't.
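To make that concrete, here is a minimal single-threaded sketch. The names polls and wait_for_data are hypothetical, and some_library_function() is defined in the same file only so the example is self-contained; in real code it would live in a separately compiled library. The point is that the callee can legally write the global, so the compiler has to reload data_ready on every pass through the loop.

```cpp
#include <cassert>

bool data_ready = false;  // plain global, deliberately not volatile
int polls = 0;            // hypothetical counter, only for illustration

// Stand-in for a function whose body the compiler normally cannot see.
// It is allowed to modify any global, including data_ready.
void some_library_function() {
    if (++polls == 3) {
        data_ready = true;
    }
}

// Returns the number of calls that were needed before the flag was seen set.
int wait_for_data() {
    while (!data_ready) {
        some_library_function();  // the call forces a fresh read of data_ready
    }
    assert(data_ready);  // the loop can only exit after a read observed true
    return polls;
}
```

This only illustrates the compiler side of the argument. It says nothing about visibility between threads, which is what the barriers are for.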
As for memory barriers, cond.wait() locks and unlocks the mutex, which puts in the necessary memory barriers. Note that the order of lock/unlock/relock is such that data_ready is only read while holding the lock. Of course, nothing here says whether the other thread *wrote* data_ready inside a lock or with the necessary release-barrier, but let's assume that it did.
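As a rough sketch of that pattern, assuming the standard library's std::mutex and std::condition_variable rather than whatever library the original cond came from (payload, producer, consumer and run_handoff are hypothetical names): the writer sets data_ready while holding the lock, and the reader only tests it while holding the lock, because wait() releases the mutex while blocked and re-acquires it before returning.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

std::mutex m;
std::condition_variable cond;
bool data_ready = false;  // plain bool, no volatile needed
int payload = 0;          // hypothetical data guarded by the same mutex

void producer() {
    {
        std::lock_guard<std::mutex> lock(m);  // write inside the lock
        payload = 42;
        data_ready = true;
    }                                         // unlock acts as a release
    cond.notify_one();
}

int consumer() {
    std::unique_lock<std::mutex> lock(m);     // lock acts as an acquire
    while (!data_ready) {
        cond.wait(lock);  // unlocks while blocked, relocks before returning
    }
    return payload;       // read while still holding the lock
}

int run_handoff() {
    std::thread t(producer);
    int value = consumer();
    t.join();
    assert(value == 42);
    return value;
}
```

The while loop around wait() also covers spurious wakeups and the case where the producer finishes before the consumer ever starts waiting.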
So recap:
"volatile is not needed in that example because access is protected by a mutex."
No, I'd say volatile is not needed because functions with unknown side effects are being called (and/or, if the mutex code were somehow magically inlined, we can assume the compiler recognizes the memory-barrier intrinsics and forces memory re-reads because of that).
And
"It is possible that the cond.wait may introduce a memory barrier that forces the cache among multiple CPU's to sync up"
- cond.wait DOES introduce a memory barrier. Probably 2: a release on unlock(mutex) and an acquire on lock(mutex). It is not really 'cache syncing' that is the problem (most CPUs have cache-coherency guarantees); it is the relative ordering of reads and writes (and RE-ordering by the CPU / memory bus) that causes the 'visibility' problems.
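The release/acquire pairing can be shown without a mutex at all. This is only a sketch using C++11 std::atomic, which is my assumption and not something the original example used; payload, ready, writer, reader and run_release_acquire are hypothetical names. The release store keeps the earlier write to payload from being reordered after the flag, and the acquire load keeps the later read of payload from being reordered before it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                  // ordinary, non-atomic data
std::atomic<bool> ready{false};   // flag that carries the ordering

void writer() {
    payload = 42;                                  // 1. write the data
    ready.store(true, std::memory_order_release);  // 2. publish: 1 cannot move below this
}

int reader() {
    // Spin until the flag is seen; the acquire load pairs with the release store.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;  // guaranteed to observe the write made before the release
}

int run_release_acquire() {
    std::thread t(writer);
    int value = reader();
    t.join();
    assert(value == 42);
    return value;
}
```

A mutex gives the same pairing implicitly: unlock plays the role of the release store and lock plays the role of the acquire load.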
Thank you for the very clear and concise explanation! -Gabe