Re: [Boost-users] the performance of boost::lock_free is slow in centos 6 and 7

8 Jul 2016

I suspect the main issue with your test is false sharing, whose effects are
magnified when the number of cores is >= the number of threads.
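
To make the false-sharing point concrete: in the test, producer_count and consumer_count are adjacent globals, so they very likely sit on the same cache line. Here is a rough sketch of padding each counter onto its own line. It assumes C++11 std::atomic instead of boost::atomic and a 64-byte cache line (typical for x86-64, but an assumption); PaddedCounter and run_counters are names made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Assumption: 64-byte cache lines (common on x86-64, not guaranteed).
constexpr std::size_t kCacheLine = 64;

// alignas pads every counter out to a full cache line, so a write to one
// counter does not invalidate the line that holds the other.
struct alignas(kCacheLine) PaddedCounter {
    std::atomic<int> value{0};
};

static_assert(sizeof(PaddedCounter) == kCacheLine, "one counter per cache line");

PaddedCounter producer_count;
PaddedCounter consumer_count;

// Hypothetical helper: two threads each hammer their own counter.
void run_counters(int iterations) {
    assert(iterations >= 0);
    std::thread a([=] { for (int i = 0; i < iterations; ++i) ++producer_count.value; });
    std::thread b([=] { for (int i = 0; i < iterations; ++i) ++consumer_count.value; });
    a.join();
    b.join();
}
```

Timing run_counters with and without the alignas would be one way to check whether false sharing matters on a given machine; I have not measured it here.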

I'd also suggest using the _mm_pause intrinsic when busy spinning if your
CPU supports it. (It's a real shame there's no official spinlock class in
C++.)
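
For what it's worth, a minimal sketch of what a spin loop with _mm_pause might look like. It assumes C++11 and an x86-64 compiler that provides <immintrin.h>; on other targets it falls back to std::this_thread::yield(), which is my own substitution. cpu_relax, SpinLock and count_with_spinlock are hypothetical names, not Boost or standard library classes.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>  // declares _mm_pause
inline void cpu_relax() { _mm_pause(); }
#else
// Assumption: on other CPUs, fall back to yielding the time slice.
inline void cpu_relax() { std::this_thread::yield(); }
#endif

// Hypothetical minimal spinlock built on std::atomic_flag.
class SpinLock {
public:
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire))
            cpu_relax();  // tell the core this is a spin-wait loop
    }
    void unlock() { flag_.clear(std::memory_order_release); }
private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical demo: several threads bump a plain int under the spinlock.
int count_with_spinlock(int threads, int iterations) {
    assert(threads > 0 && iterations >= 0);
    SpinLock lock;
    int total = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++total;
                lock.unlock();
            }
        });
    for (auto& th : pool) th.join();
    return total;
}
```

In the producer from the test, the same idea would mean calling cpu_relax() in the body of the while (!queue.push(value)) loop instead of leaving it empty.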

Also, use something like a countdown latch to make sure your threads all
start the actual work at the same time.
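
A countdown latch can be sketched in a few lines with a mutex and a condition variable. This assumes C++11 (C++20 also offers std::latch); CountdownLatch and run_workers are hypothetical names used only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical single-use latch: each thread arrives once, and nobody
// proceeds until the expected number of threads has arrived.
class CountdownLatch {
public:
    explicit CountdownLatch(int count) : count_(count) { assert(count >= 0); }

    // Decrement the counter and block until it reaches zero.
    void arrive_and_wait() {
        std::unique_lock<std::mutex> lk(m_);
        if (--count_ == 0)
            cv_.notify_all();
        else
            cv_.wait(lk, [this] { return count_ == 0; });
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    int count_;
};

// Hypothetical demo: returns how many workers got past the latch before
// every worker had arrived (should be zero).
int run_workers(int workers) {
    CountdownLatch start(workers);
    std::atomic<int> arrived{0};
    std::atomic<int> early{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < workers; ++t)
        pool.emplace_back([&] {
            ++arrived;
            start.arrive_and_wait();
            // Past the latch, every worker must already have arrived.
            if (arrived.load() != workers) ++early;
        });
    for (auto& th : pool) th.join();
    return early.load();
}
```

In the benchmark, each producer and consumer would call arrive_and_wait() as its first statement, with the latch constructed for producer_thread_count + consumer_thread_count threads, so thread start-up cost is kept out of the measured work.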

On Fri, Jul 8, 2016 at 1:50 PM, Michael <mwpowellhtx@gmail.com> wrote:
...
On July 8, 2016 1:39:13 AM EDT, gao1738@sina.com wrote:
...
Hi all,
I tried boost::lockfree::queue and found a performance issue.
I used the following test programs:
lock_free_test.cc
#include <boost/thread/thread.hpp>
#include <boost/lockfree/queue.hpp>
#include <iostream>
#include<cstdio>
#include <boost/atomic.hpp>
boost::atomic_int producer_count(0);
boost::atomic_int consumer_count(0);
boost::lockfree::queue<int> queue(128);
const int iterations = 1000000;
const int producer_thread_count = 4;
const int consumer_thread_count = 4;
void producer(void)
{
 for (int i = 0; i != iterations; ++i) {
   int value = ++producer_count;
   while (!queue.push(value))
     ;
 }
}
boost::atomic<bool> done (false);
void consumer(void)
{
 int value;
 while (!done) {
   while (queue.pop(value))
     ++consumer_count;
 }
 // drain whatever is left once the producers have finished
 while (queue.pop(value))
   ++consumer_count;
}
int main(int argc, char* argv[])
{
 using namespace std;
 cout << "boost::lockfree::queue is ";
 if (!queue.is_lock_free())
   cout << "not ";
 cout << "lockfree" << endl;
 boost::thread_group producer_threads, consumer_threads; // thread groups
 for (int i = 0; i != producer_thread_count; ++i)
   producer_threads.create_thread(producer);
 for (int i = 0; i != consumer_thread_count; ++i)
   consumer_threads.create_thread(consumer);
 producer_threads.join_all();
 done = true;
 consumer_threads.join_all();
 cout << "produced " << producer_count << " objects." << endl;
 cout << "consumed " << consumer_count << " objects." << endl;
}
lock_test.cc
#include <boost/thread/thread.hpp>
#include <boost/lockfree/queue.hpp>
#include <iostream>
#include<cstdio>
#include <queue>
#include <boost/atomic.hpp>
using namespace std;
boost::mutex producer_count_mu;
boost::mutex consumer_count_mu;
int producer_count = 0;
int consumer_count = 0;
std::queue<int> message_queue;
boost::mutex queue_mutex;
const int iterations = 1000000;
const int producer_thread_count = 4;
const int consumer_thread_count = 4;
void producer(void)
{
 for (int i = 0; i != iterations; ++i) {
   queue_mutex.lock();
   int value = ++producer_count;
   message_queue.push(value);
   queue_mutex.unlock();
 }
}
I haven't used lockfree per se, but my understanding is that it does what
its name says.
My guess is that most of the time is spent contending for the mutex.
Incidentally, why not use one of the proper lock classes? You are already
using Boost, so they are available. That will save you from having to lock
and unlock by hand, at least.
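
A rough sketch of the lock-class suggestion, applied to the producer loop above. It uses std::lock_guard and std::mutex from C++11 so that it builds without Boost; boost::lock_guard<boost::mutex> is the Boost.Thread counterpart. The function name produce and its iterations parameter are my own, for illustration only.

```cpp
#include <cassert>
#include <mutex>
#include <queue>

std::mutex queue_mutex;
std::queue<int> message_queue;
int producer_count = 0;

// Same work as the original producer loop, but a scoped guard replaces
// the manual lock()/unlock() pair.
void produce(int iterations) {
    assert(iterations >= 0);
    for (int i = 0; i != iterations; ++i) {
        std::lock_guard<std::mutex> guard(queue_mutex);
        int value = ++producer_count;
        message_queue.push(value);
        // guard's destructor releases queue_mutex at the end of each
        // iteration, even if push() throws
    }
}
```

This does not change how much the threads contend for the mutex; it only removes the risk of a missed unlock.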
I haven't explored lockfree much and could be wrong, but I thought the
whole point of running lock-free was to avoid expensive locks, not to
absolve you of handling the exhausted condition when your queue is
empty.
Also, what is a test like this really asserting? Lock-free is not
expense-free. There are no free lunches, now less than ever.
Anyhow, HTH
Regards,
Michael Powell
...
bool done (false);
void consumer(void)
{
 int value;
 while (!done) {
   queue_mutex.lock();
   while (!message_queue.empty()) {
     message_queue.pop();
     ++consumer_count;
   }
   queue_mutex.unlock();
 }
 queue_mutex.lock();
 while (!message_queue.empty()) {
   message_queue.pop();
   ++consumer_count;
 }
 queue_mutex.unlock();
}
int main(int argc, char* argv[])
{
 using namespace std;
 cout << "boost::lockfree::queue is ";
//  if (!queue.is_lock_free())
   cout << "not ";
 cout << "lockfree" << endl;
 boost::thread_group producer_threads, consumer_threads; // thread groups
 for (int i = 0; i != producer_thread_count; ++i)
   producer_threads.create_thread(producer);
 for (int i = 0; i != consumer_thread_count; ++i)
   consumer_threads.create_thread(consumer);
 producer_threads.join_all();
 done = true;
 consumer_threads.join_all();
 cout << "produced " << producer_count << " objects." << endl;
 cout << "consumed " << consumer_count << " objects." << endl;
}
The compile commands are:
g++ -I/usr/local/include -L/usr/local/lib lock_free_test.cc -lboost_thread -lboost_system -o lock_free_test
g++ -I/usr/local/include -L/usr/local/lib lock_test.cc -lboost_thread -lboost_system -o lock_test
1. I first tested it on my work computer, which runs Ubuntu 14.04 on a
2-core i5, with
boost version: 1.54
gcc version: 4.8.4
g++ version: 4.8.4
The test result is:
time ./lock_test
boost::lockfree::queue is not lockfree
produced 4000000 objects.
consumed 4000000 objects.
real    0m3.844s
user    0m1.800s
sys    0m12.308s
time ./lock_free_test
boost::lockfree::queue is lockfree
produced 4000000 objects.
consumed 4000000 objects.
real    0m1.745s
user    0m6.886s
sys    0m0.000s
We can see that the lock-free solution performs better here: it needs
about 55% less wall-clock time (1.745 s vs. 3.844 s).
2. Then I tested it on a PC server with CentOS 6.4 and 8 cores (CPU
Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz)
boost version: 1.54
gcc version: 4.4.7
g++ version: 4.4.7
The test result is:
time ./lock_test
boost::lockfree::queue is not lockfree
produced 4000000 objects.
consumed 4000000 objects.
real    0m3.900s
user    0m2.593s
sys    0m27.282s
time ./lock_free_test
boost::lockfree::queue is lockfree
produced 4000000 objects.
consumed 4000000 objects.
real    0m5.470s
user    0m43.105s
sys    0m0.000s
Here the mutex-based solution is faster than the lock-free one (3.900 s vs. 5.470 s).
3. I tested it on a better PC server with CentOS 7.1 and a 32-core CPU
(Intel(R) Xeon(R) CPU E7-4820 v2 @ 2.00GHz)
boost version: 1.53
gcc version: 4.8.3
g++ version: 4.8.3
time ./lock_test
boost::lockfree::queue is not lockfree
produced 4000000 objects.
consumed 4000000 objects.
real    0m3.023s
user    0m1.929s
sys    0m20.706s
time ./lock_free_test
boost::lockfree::queue is lockfree
produced 4000000 objects.
consumed 4000000 objects.
real    0m9.804s
user    1m14.900s
sys    0m0.100s
The lock-free solution is about 3 times slower than the mutex-based
solution (9.804 s vs. 3.023 s)!
My questions are:
1. Why does the lock-free solution perform better on Ubuntu but much
worse on CentOS 6 and 7?
   Is it an issue with the kernel, the gcc version, or the Boost version?
   Does lock-free performance get worse the more CPUs the machine has?
2. In which cases should we use the Boost lock-free solution to get
better performance?
Best Regards!
dennis
------------------------------------------------------------------------
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users

james