Paul wrote:
> Good lord, assembler? Doesn't the 'volatile' keyword fix the problem you describe?
I'm afraid it doesn't. You might want to check the archives for the double-checked locking discussions, which highlight the problem, or read the paper http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf
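To make the failure concrete, here is a rough sketch of the classic double-checked locking pattern (the Singleton/pInstance names are just illustrative); note the reordering described in the comment, which 'volatile' does nothing to prevent:

    // Illustrative only -- the classic DCLP that 'volatile' does not rescue.
    #include <boost/thread/mutex.hpp>

    class Singleton {
    public:
        static Singleton* instance() {
            if (!pInstance) {                        // first check, no lock
                boost::mutex::scoped_lock lock(m);
                if (!pInstance) {                    // second check, locked
                    // The compiler/CPU is free to reorder this into:
                    //   1. allocate raw memory
                    //   2. store the pointer into pInstance  <-- published too early
                    //   3. run the Singleton constructor
                    // 'volatile' constrains neither the hardware nor the
                    // reordering relative to the non-volatile constructor
                    // code, so another thread can see a non-null pInstance
                    // that points at an unconstructed object.
                    pInstance = new Singleton;
                }
            }
            return pInstance;
        }
    private:
        static Singleton* volatile pInstance;
        static boost::mutex m;
    };

    Singleton* volatile Singleton::pInstance = 0;
    boost::mutex Singleton::m;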
> If the flag is going to be flipping back and forth, then sure, even with memory barriers you will have threads which think the value is different.
> E.g., you have two long-running threads that check the value and do some calculation that takes 5 seconds or so. Let's assume they don't sync when reading the flag; then thread A could check the flag and read false, while thread B checked the flag 4 seconds ago and currently thinks it's true.
> So you have inconsistencies there anyway, right?
It depends; there are lock-free ways to do things. The important thing is that after the memory barrier, everyone is guaranteed to be on the same page w.r.t. the value when they next check it.
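To pin down what I mean by "on the same page", here is a minimal sketch of the kill-flag in std::atomic notation (which post-dates this thread; take it as illustrative semantics, not code we can write against Boost today):

    #include <atomic>
    #include <thread>

    std::atomic<bool> die(false);

    void worker() {
        // The read may lag the write briefly, but once the release-store has
        // propagated, every subsequent acquire-load agrees on the value.
        while (!die.load(std::memory_order_acquire)) {
            // ... do a slice of work ...
        }
    }

    int main() {
        std::thread t(worker);
        die.store(true, std::memory_order_release);   // publish the flag
        t.join();
        return 0;
    }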
> Otherwise, the flag is set and it's only a matter of time before the CPU sees the correct value ('die' in the previous email's case) and the thread dies.
> If the flag is protecting resources, then of course you need volatile/guards/etc.; otherwise it can be just a 'lazy cancel': when the thread finally reads the right value, it quits.
> How many cycles of lag would that be anyway? Is this only a problem on exotic platforms?
I have found it to be a big issue for my work on a simple Win32 dual-processor box. It can be a surprisingly long time before a value propagates, and I'm not entirely certain the value is guaranteed to ever propagate, which is an issue. As another example, checking whether a message queue is empty can be significantly faster (2 to 4 times on IA32) with the appropriate use of memory barriers rather than a mutex / critical section.
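Something along these lines, sketched with std::atomic for brevity rather than raw fences, and with an illustrative MessageQueue rather than my actual code:

    #include <atomic>
    #include <cstddef>
    #include <deque>
    #include <mutex>

    template <typename T>
    class MessageQueue {
    public:
        // Fast path: a single acquire-load instead of a lock/unlock round
        // trip. The answer is only a hint, of course -- the queue can change
        // the moment after you look -- but for polling that is fine.
        bool empty() const {
            return count.load(std::memory_order_acquire) == 0;
        }

        void push(T msg) {
            std::lock_guard<std::mutex> lock(m);
            q.push_back(msg);
            count.store(q.size(), std::memory_order_release);  // publish new size
        }

        bool try_pop(T& out) {
            std::lock_guard<std::mutex> lock(m);
            if (q.empty()) return false;
            out = q.front();
            q.pop_front();
            count.store(q.size(), std::memory_order_release);
            return true;
        }

    private:
        std::mutex m;
        std::deque<T> q;
        std::atomic<std::size_t> count{0};
    };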
> Is this CPU-cache problem actually a problem in this 'kill-flag' case? Do you really think you need memory barriers on a flag like this?
On a single IA32 processor, no. On a dual processor, maybe; it depends. Also, I'm not sure about the guarantee of the value propagating eventually: I have encountered cases where it doesn't seem to propagate at all, probably due to compiler optimizations.

I've been meaning to submit an interface for memory fences based on the work of others I've seen while poking around. It seems that to cope with most architectures you need load_load, load_store, store_store, and store_load barriers and the like, to support memory architectures with more relaxed semantics; some of these are no-ops on IA32. This should be a fundamental building block on which Boost can build. It would also mean having a different category of builds for such a library, as a build would depend not only on the OS, compiler, and STL, but on the hardware architecture as well.

For example, before lfence, sfence, and mfence existed on IA32, you had to use a cpuid as a memory fence, though a locked bus instruction might have been enough, I think. So an implementation even just on plain IA32 would need different architecture #defines for different generations of IA32; the later fences came with SSE2, for example.

Check out http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1680.pdf for some more insight from people cleverer than I.

Regards,

Matt.
matthurd@acm.org
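P.S. A rough sketch of the flavour of interface I have in mind, written for GCC-style inline asm on 32-bit IA32 only (the macro names are illustrative, not a proposal):

    #if defined(__SSE2__)
        // SSE2-era IA32: dedicated fence instructions exist
        // (sfence actually arrived with SSE, lfence/mfence with SSE2).
        #define FENCE_STORE_LOAD()  __asm__ __volatile__ ("mfence" ::: "memory")
        #define FENCE_STORE_STORE() __asm__ __volatile__ ("sfence" ::: "memory")
        #define FENCE_LOAD_LOAD()   __asm__ __volatile__ ("lfence" ::: "memory")
    #else
        // Pre-SSE2 IA32: fall back to a locked bus instruction (or cpuid)
        // as a full serializing fence.
        #define FENCE_STORE_LOAD()  __asm__ __volatile__ ("lock; addl $0,(%%esp)" ::: "memory")
        // Ordinary stores and ordinary loads are already kept in order by
        // the IA32 model, so these only need to stop compiler reordering.
        #define FENCE_STORE_STORE() __asm__ __volatile__ ("" ::: "memory")
        #define FENCE_LOAD_LOAD()   __asm__ __volatile__ ("" ::: "memory")
    #endif

    // load_store needs only a compiler barrier on IA32.
    #define FENCE_LOAD_STORE()      __asm__ __volatile__ ("" ::: "memory")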