Is there any interest in an alternate implementation of a lockfree queue that doesn't use compare_exchange (assuming it is correct, or can be made to be correct)? I have a multi-producer / multi-consumer implementation that generally doesn't suffer nearly as much from contention. On my hardware (an older high-end 64 bit x86 workstation) I get roughly comparable throughput with existing boost lockfree queue with 1 producer and 1 consumer. With 4 producers and 4 consumers I get about 3x throughput gain on existing boost lockfree queue. However, this implementation has a weakness, it suffers in over subscription scenarios (threads in middle of push/pop that lose their time-slice); with twice as many threads as cores its performances is comparable to boost lockfree queue (using try_push/try_pop, otherwise throughput is down the toilet), and with 4x thread/core ratio the throughput is somewhat less than boost lockfree queue. This was tested on Windows 7 x64 with Inel Xeon X5355 x 2 (2 sockets w/ 4 cores a piece) @ 2.66 Ghz and 32 GB RAM @ 667 MHz. The implementation can be found here: https://github.com/benmccart/GuarunteedMpmcQueue/blob/master/queue/queue.hpp Is the implementation correct? Can anyone spot any race conditions, or conditions under which the implementation would fail to work as expected (other than queue is full/empty or over-subscription)? ~Ben