On 28/12/2013 23:50, Quoth Leon Mlakar:
One of the libraries that got my attention is lockfree, the lockfree::queue in particular, since it almost completely meets my needs. Except that I've been wondering why it requires a trivial destructor for the items it stores? I mean, that really reduces its usability. Is this something inherent to its design? I guess that this was discussed during the review, but a quick dig through the archives didn't turn up anything on it. The reason for asking is that I have forced the queue to accept std::shared_ptr and this seems to work almost fine, except that there seems to be an extra reference left to the last item popped, but that's something I can live with.
It's definitely unsafe to violate those requirements. I asked the maintainer the same question, and once he explained why they were there I confirmed the issue by reading the code. Basically there are cases where the value can get copied multiple times, or copied after the destructor has been called, and cases where the destructor is never called at all. This breaks pretty much anything that isn't a simple POD type. If you want to store a shared_ptr in there, probably the safest thing to do is to store a shared_ptr* instead. Note that this does mean your producer will probably have to make a new copy of the const shared_ptr& that you give it, and the consumer (and the queue's destructor, for items still queued) will have to delete these extra copies. (If you don't *really* need the memory to be shared ownership -- e.g. you know that the producer will create new items and they'll only be potentially shared after the consumer pops them -- then you'll get better performance by queuing a T or T* and only constructing the shared_ptr<T> in the consumer.)
Another thing was its performance - I ran a sort of quick benchmark (a queue of shared_ptr's to an item containing an atomic<int> counter; items were popped from the queue and immediately pushed back after decrementing the counter) and its performance was stellar on OSX (with multiple consumers and producers, something like forty times faster than std::queue with a mutex) but very poor on Linux. While I can understand that locking runs fast on Linux due to its spinlock/futex implementation of mutexes, I have no explanation for the poor performance of lockfree::queue (with two producer/consumer threads, std::queue with a mutex was about 15 times faster, and lockfree::queue ran about 20 times slower than on OSX on similar hardware). Did anybody else observe the same poor performance? I used Boost 1.54 and tested with both gcc 4.8 and clang 3.5, with similar results from both.
Did you check sizeof(shared_ptr<T>) and .is_lock_free()? Depending on memory alignment and the size of your queue payload objects, it might be falling back on a lock-based implementation. Also, try Boost 1.55; there were quite a lot of changes to Boost.Atomic between 1.54 and 1.55. And note that lock-free tends to be slower on average than lock-based (particularly for the multi-producer/multi-consumer-safe implementations) unless there's a fair bit of contention, so it's not always the best choice, depending on what you're doing with it.