On 02.04.20 20:59, Mathias Gaunard wrote:
On Thu, 2 Apr 2020 at 17:43, Jan Hafer via Boost wrote:
Yes, I do use 1 buffer/queue per thread.
So you're saying that circular_buffer is slower on a given thread when other threads are accessing their own circular_buffer in parallel? That sounds unlikely to be circular buffer's fault.
Yes, and I don't quite know the reason for it. My threads know their ID and use it to access a file-global data structure containing their queue/circular buffer. They start one after another in a thread-safe way and exit once their queue/circular buffer is empty.
The first thing to check is that the memory placement of your circular buffers does not result in false sharing with other threads. More generally, you could be stalling on any interconnect, for example when crossing NUMA domains or talking to hardware. Otherwise, you could be saturating the execution ports of your core and thus not getting the expected speed-up from hyper-threading, but in any case you'd be getting the best throughput overall.
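To make the false-sharing check concrete, here is a minimal sketch of one common way to rule it out: give each thread's slot in the file-global structure its own cache line. Everything named here is hypothetical and not taken from the code under discussion. QueueState merely stands in for the bookkeeping fields a queue updates on every push/pop, and the 64-byte line size is an assumption that holds on typical x86-64 parts but is not guaranteed.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines (typical on x86-64, not guaranteed).
constexpr std::size_t kCacheLine = 64;
constexpr std::size_t kThreads = 4;

// Hypothetical stand-in for the fields a queue writes on every push/pop.
struct QueueState {
    std::uint64_t head = 0;
    std::uint64_t tail = 0;
};

// alignas rounds sizeof up to a multiple of the alignment, so two
// neighbouring array elements can never share a cache line.
struct alignas(kCacheLine) PaddedQueueState {
    QueueState state;
};

static_assert(sizeof(PaddedQueueState) % kCacheLine == 0,
              "each slot must occupy whole cache lines");

// File-global structure indexed by thread id, one slot per thread.
std::array<PaddedQueueState, kThreads> g_slots;

// Each worker only ever touches its own slot.
void worker(std::size_t id, std::uint64_t iterations) {
    QueueState& q = g_slots[id].state;
    for (std::uint64_t i = 0; i < iterations; ++i) {
        ++q.tail;  // simulated push
        ++q.head;  // simulated pop
    }
}

// Runs all workers and returns the total number of simulated pops.
std::uint64_t run_workers(std::uint64_t iterations) {
    for (auto& s : g_slots) s.state = QueueState{};  // reset between runs
    std::vector<std::thread> threads;
    for (std::size_t id = 0; id < kThreads; ++id)
        threads.emplace_back(worker, id, iterations);
    for (auto& t : threads) t.join();
    std::uint64_t total = 0;
    for (const auto& s : g_slots) total += s.state.head;
    return total;
}
```

Calling run_workers(n) once with PaddedQueueState and once with a plain array of QueueState, and timing both, is a cheap way to see whether false sharing matters on a given machine. Note that padding the container object only separates its bookkeeping fields; the element storage is allocated separately and would need its own check.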
I know how to obtain the source code of Boost, but the compiler vendors sadly provide no direct, internet-searchable links to the source code of their standard library implementations. Thanks for your advice. I may dig further with perf to obtain cache-miss statistics. My goal was just to inform you of this characteristic, since it is not documented.