circular buffer in a multithreaded program
Hello!

I had some trouble using boost::circular_buffer in a multithreaded application, and I'm not really sure whether I actually ran into a bug or whether I simply expected more than the thread-safety section promises. Anyhow, I wanted to share my story and a theory of what possibly went wrong. Maybe this posting even contains some ideas for how circular_buffer could be made a little more thread-safe without adding any mutex- or atomics-based synchronization.

What I tried to do: I tried to grant one thread write access to a sub-range of a circular_buffer while another thread had read access to a non-overlapping sub-range. At no time was any capacity change needed. Occasionally, in a reader thread I would "consume" some of the values that were already "committed" by a writer and make more room for the writer to store new data by:

- locking a mutex
- calling cb.erase_first(n);
- calling cb.resize(cb.capacity());
- notifying the writer that there is new free space to write stuff into
- unlocking the mutex

As for iterator invalidation, erase_first just invalidates iterators to the elements being removed, and resize in this case just increases the size back to the capacity, which should not invalidate any iterators, if I understand correctly.

But the program did not work. The first thing that came to my attention was that the checked iterators of circular_buffer don't support this kind of multithreading, as they register and unregister themselves in a linked list without any synchronization. But even after disabling the checked iterators via BOOST_CB_DISABLE_DEBUG, the program still did not work. I could not figure out why (with enough certainty) and eventually replaced circular_buffer with a custom vector-based data structure. My guess is that the iterators of a circular_buffer somehow depend on some part of the buffer's state that is modified by erase_first. Then I would have a data race. But such a dependency does not really have to be there.
In my own implementation, an iterator is "self-sufficient" in the sense that it does not need to access any of the buffer's state (indices, counters and whatnot) to access the right elements. This way, erasing some unneeded elements from the buffer while another thread still works on other values in the buffer via iterators does not cause a data race. In my opinion, it's reasonable to expect something like this to work with a circular_buffer. Maybe someone knows more about how circular_buffer iterators are implemented and/or wants to improve the circular_buffer implementation w.r.t. thread-safety to support this kind of use.

Since I switched to a custom data structure that is able to deal with this situation, I did not investigate this issue with circular_buffer any further. I already lost about a day because of this. I hope this is of interest to somebody who wants to avoid excessive copying in the standard "producer-consumer / bounded-buffer" pattern.

Cheers! Sebastian
2014-04-22 15:47 GMT+04:00 Sebastian Gesemann
- locking a mutex
- calling cb.erase_first(n);
- calling cb.resize(cb.capacity());
- notifying the writer that there is new free space to write stuff into
- unlocking the mutex
This is what was happening in the reader thread. What was happening in the writer thread?
But the program did not work. The first thing that came to my attention was that the checked iterators of circular_buffer don't support this kind of multithreading as they register and unregister themselves into a linked list without any synchronization.
I had exactly the same thoughts... -- Best regards, Antony Polukhin
On 22.04.2014 20:02, Antony Polukhin wrote:
2014-04-22 15:47 GMT+04:00 Sebastian Gesemann
<...>
- locking a mutex
- calling cb.erase_first(n);
- calling cb.resize(cb.capacity());
- notifying the writer that there is new free space to write stuff into
- unlocking the mutex
This is what was happening in the reader thread. What was happening in the writer thread?
Let me back up a little... I wrapped the circular_buffer in another object that offers this interface:

range allocate(int n);    // for writer
void commit(int n);       // for writer
const_range read(int n);  // for reader
void consume(int n);      // for reader

Apart from the container that backs the storage (which used to be a circular_buffer), it also contains:

- a mutex
- two condition variables:
  - enough_writable (which allocate blocks on)
  - enough_readable (which read blocks on)

All four actions lock the mutex. The first function of each pair returns a range object (think of it as a pair of iterators) with which a thread actually reads or overwrites the data _without_ owning a lock. The second function of each pair updates the state to indicate that either new data is available for the reader or new free space is available for the writer. The only interesting bit is the consume function, which I tried to describe in my first email. Other than that, the only things happening with the circular_buffer were invoking begin() and/or end() and doing some "iterator arithmetic" in allocate/read while holding a lock. So, essentially, begin/end/erase_first/resize were only executed while holding a lock, and the last two did not invalidate any iterators that were in use elsewhere. Dereferencing the iterators was not locked.

My guess is that "working" with an iterator while concurrently doing an erase_first/resize creates a data race -- possibly because an iterator tries to read some of circular_buffer's state which is modified by erase_first and/or resize. But that's just a guess. I did not check the circular_buffer implementation too closely.
But the program did not work. The first thing that came to my attention was that the checked iterators of circular_buffer don't support this kind of multithreading as they register and unregister themselves into a linked list without any synchronization.
I had exactly the same thoughts...
This was definitely an issue. But it's not the only one apparently. Cheers! sg
2014-04-23 1:52 GMT+04:00 Sebastian Gesemann
My guess is that "working" with an iterator while concurrently doing an erase_first/resize creates a data race -- possibly because an iterator tries to read some of circular_buffer's state which is modified via erase_first and/or resize. But that's just a guess. I did not check the circular_buffer implementation too closely.
You are right. The iterator holds a pointer to the circular_buffer and accesses some of its internals. However, it is safe to use only iterators and overwrite independent parts of the circular_buffer. None of the operations on iterators modify the internals of the circular_buffer. So if you change cb.erase_first(n)/cb.resize(cb.capacity()) to iterator operations, there should be no problems. Though I think that's what you have done in the case of your custom vector-based data structure. Thanks for the investigation! -- Best regards, Antony Polukhin
Hi
You are right. The iterator holds a pointer to the circular_buffer and accesses some of its internals. However, it is safe to use only iterators and overwrite independent parts of the circular_buffer. None of the operations on iterators modify the internals of the circular_buffer. So if you change cb.erase_first(n)/cb.resize(cb.capacity()) to iterator operations, there should be no problems. Though I think that's what you have done in the case of your custom vector-based data structure.
This might be a general C++ question and not boost-specific, but as I read the answer above the following question popped up: If two threads read/write the same area of memory and there is no synchronization, how is it ever guaranteed that they see the "current" values? I see the problem: if one thread writes to a certain place in a buffer and another thread reads from that buffer, it might read cached values (either by the compiler keeping a value in a register or by the hardware keeping copies in CPU caches while the two threads run on different CPU cores/CPUs). So some kind of synchronization (which usually involves a memory barrier) is needed...

Coming from Java, there is a memory model that clearly specifies a "happens-before" relation. Values are guaranteed to be visible if the operations involved are ordered according to that relation. Is there any C++ specification for these things? How much of it is compiler/platform specific?

Regards, Steffen
Hans Boehm seems to be busy with this: https://www.youtube.com/watch?v=mrvAqvtWYb4&gl=BE

i

On Thu, Apr 24, 2014 at 9:47 AM, Steffen Heil (Mailinglisten) <lists@steffen-heil.de> wrote:
Hi
You are right. The iterator holds a pointer to the circular_buffer and accesses some of its internals. However, it is safe to use only iterators and overwrite independent parts of the circular_buffer. None of the operations on iterators modify the internals of the circular_buffer. So if you change cb.erase_first(n)/cb.resize(cb.capacity()) to iterator operations, there should be no problems. Though I think that's what you have done in the case of your custom vector-based data structure.
This might be a general C++ question and not be boost specific, but as I read the answer above the following question popped up:
If two threads read/write the same area of memory and there is no synchronization, how is it ever guaranteed that they see the "current" values? I see the problem: if one thread writes to a certain place in a buffer and another thread reads from that buffer, it might read cached values (either by the compiler keeping a value in a register or by the hardware keeping copies in CPU caches while the two threads run on different CPU cores/CPUs). So some kind of synchronization (which usually involves a memory barrier) is needed...
Coming from java there is a memory model that clearly specifies a "happens-before" relation. Values are guaranteed to be available if the operations in place are ordered according to that relation. Is there any C++ specification for these things? How much is it compiler / platform specific?
Regards, Steffen
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
Hi Steffen,
On Thu, Apr 24, 2014 at 9:47 AM, Steffen Heil (Mailinglisten)
This might be a general C++ question and not be boost specific, but as I read the answer above the following question popped up:
If two threads read/write the same area of memory and there is no synchronization, how is it ever guaranteed that they see the "current" values? I see the problem: if one thread writes to a certain place in a buffer and another thread reads from that buffer, it might read cached values (either by the compiler keeping a value in a register or by the hardware keeping copies in CPU caches while the two threads run on different CPU cores/CPUs). So some kind of synchronization (which usually involves a memory barrier) is needed...
True. In my case, there is mutex-based synchronization. A reader has to wait for the writer to mark the new values as "ready". This involves locking/unlocking a mutex. And I would expect that locking/unlocking a mutex also involves a kind of "memory barrier" which should solve the cache issues.
Coming from java there is a memory model that clearly specifies a "happens-before" relation. Values are guaranteed to be available if the operations in place are ordered according to that relation. Is there any C++ specification for these things? How much is it compiler / platform specific?
I also programmed in Java. From what I can tell, there is little difference between Java and C++ in this regard. The terminology is just different. What you call "monitor", "volatile" and "happens before" in Java is called "mutex", "atomic" and "sequenced before" in C++. On top of that, C++11 allows you to use "weaker synchronization". But that's something I choose to ignore because it's just too easy to get wrong and not worth the performance gain, in my opinion. Cheers! Sebastian
I also programmed in Java. From what I can tell, there is little difference between Java and C++ in this regard. The terminology is just different. What you call "monitor", "volatile" and "happens before" in Java is called "mutex", "atomic" and "sequenced before" in C++. On top of that, C++11 allows you to use "weaker synchronization". But that's something I choose to ignore because it's just too easy to get wrong and not worth the performance gain, in my opinion.
Cheers! Sebastian
Volatile in C/C++ is not sequenced like atomics in Java. In many cases the observed behavior on x86 platforms will be similar, but this is by chance. C++11 and boost have std/boost::atomic, which will match the behavior of Java atomics. C++11 and boost also have a std/boost::mutex implementation. Lee
On Thu, Apr 24, 2014 at 1:59 PM, Lee Clagett
I also programmed in Java. From what I can tell, there is little difference between Java and C++ in this regard. The terminology is just different. What you call "monitor", "volatile" and "happens before" in Java is called "mutex", "atomic" and "sequenced before" in C++. On top of that, C++11 allows you to use "weaker synchronization". But that's something I choose to ignore because it's just too easy to get wrong and not worth the performance gain, in my opinion.
Volatile in C/C++ is not sequenced like atomics in Java. In many cases the observed behavior on x86 platforms will be similar, but this is by chance. C++11 and boost have std/boost::atomic, which will match the behavior of Java atomics. C++11 and boost also have a std/boost::mutex implementation.
Not sure whether this was meant as a correction or you simply wanted to add your comment. Anyhow, I did not say anything about C++'s volatile. I tried to convey that the cases in which you would use volatile in Java are typically cases in which you would use atomics in C++.
Lee
On Thu, Apr 24, 2014 at 8:24 AM, Sebastian Gesemann
On Thu, Apr 24, 2014 at 1:59 PM, Lee Clagett
wrote: I also programmed in Java. From what I can tell, there is little difference between Java and C++ in this regard. The terminology is just different. What you call "monitor", "volatile" and "happens before" in Java is called "mutex", "atomic" and "sequenced before" in C++. On top of that, C++11 allows you to use "weaker synchronization". But that's something I choose to ignore because it's just too easy to get wrong and not worth the performance gain, in my opinion.
Volatile in C/C++ is not sequenced like atomics in Java. In many cases the observed behavior on x86 platforms will be similar, but this is by chance. C++11 and boost have std/boost::atomic, which will match the behavior of Java atomics. C++11 and boost also have a std/boost::mutex implementation.
Not sure whether this was meant as correction or you simply wanted to add your comment. Anyhow, I did not say anything about C++'s volatile. I tried to convey that the cases in which you would use volatile in Java are typically cases in which you would use atomics in C++.
Lee
Sorry I inverted your original sentence when I read it. Lee
On Thu, Apr 24, 2014 at 8:44 AM, Antony Polukhin
You are right. Iterator holds a pointer to circular_buffer and accesses some of its internals.
However, it is safe to use only iterators and overwrite independent parts of the circular_buffer. None of the operations on iterators modify the internals of the circular_buffer. So if you change cb.erase_first(n)/cb.resize(cb.capacity()) to iterator operations, there should be no problems. Though I think that's what you have done in the case of your custom vector-based data structure.
And what iterator operations would those be? Before erase_first, I tried the iterator-based erase, but it actually had iterator-invalidation properties that were of no use to me, IIRC.
Thanks for the investigation!
Thanks for taking the time to read what I had to say. :) Cheers! Sebastian
participants (5)
- Antony Polukhin
- immanuel litzroth
- Lee Clagett
- Sebastian Gesemann
- Steffen Heil (Mailinglisten)