[thread] shared_spinlock_t idea or faster shared_lock
Hi,

This is just a refining of an idea that was once implemented and worked well. The idea is a shared spin lock. Here is some pseudo-code:

    class shared_spin_lock {
        atomic_max_uint_t state_;

        lock_shared() {
            do {    // CAS loop
                auto snapshot = state_.load();
                if (snapshot.has_writer_bit_set()) {
                    reader_wait();  // waiting for the writer to finish
                    continue;
                }
                auto new_state = snapshot + 1;
            } while (!CAS(state_, snapshot, new_state));
        }

        unlock_shared() {
            --state_;
        }

        lock() {
            do {    // CAS loop
                auto snapshot = state_.load();
                if (snapshot.has_writer_bit_set()) {
                    writer_wait();  // waiting for the other writer to finish
                    continue;
                }
                auto new_state = snapshot.set_writer_bit();
            } while (!CAS(state_, snapshot, new_state));

            // We have set the writer bit; now wait for readers to finish.
            while (state_.readers_count)
                ;   // busy loop, can we do better?

            writer_started();
        }

        unlock() {
            auto snapshot = state_.reset_writer_bit();
            writer_finished();
            ASSERT(snapshot.has_writer_bit_set());
        }
    };

When all the *_wait() and writer_*() functions do nothing, we get a shared spinlock.

Now, if we add a helper mutex and make all the *_wait() functions lock and then unlock that mutex, make writer_started() lock the mutex, and writer_finished() unlock it... we get a simplified shared_mutex that is (in theory) faster than the current implementation.

Is there any interest in such a shared_spinlock_t? Any comments, suggestions?

--
Best regards,
Antony Polukhin
On May 2, 2014 1:37:46 AM EDT, Antony Polukhin wrote:
This is just a refining of an idea that was once implemented and worked well. The idea is a shared spin lock. Here is some pseudo-code:

[snip]

    lock() {
        do {    // CAS loop
            auto snapshot = state_.load();
            if (snapshot.has_writer_bit_set()) {
                writer_wait();  // waiting for the other writer to finish
                continue;
            }
            auto new_state = snapshot.set_writer_bit();
        } while (!CAS(state_, snapshot, new_state));
Why wouldn't you set the writer bit once you own the mutex? As it is, you release the mutex and then use CAS to try to set it, but no other writer can set it while you hold the mutex.
    // We have set the writer bit; now wait for readers to finish.
    while (state_.readers_count)
        ;   // busy loop, can we do better?
Run the busy loop a limited number of times before adding a yield or sleep instruction, then repeat.
[snip]
Now, if we add a helper mutex and make all the *_wait() functions lock and then unlock that mutex, make writer_started() lock the mutex, and writer_finished() unlock it... we get a simplified shared_mutex that is (in theory) faster than the current implementation.
I haven't seen the code to which you're comparing this implementation, so I have no idea whether it is faster.

___
Rob
(Sent from my portable computation engine)
    // We have set the writer bit; now wait for readers to finish.
    while (state_.readers_count)
        ;   // busy loop, can we do better?
Run the busy loop a limited number of times before adding a yield or sleep instruction, then repeat.
On Intel CPUs one should use a PAUSE instruction, as in _mm_pause(), inside the busy loop.

tim
On Friday 02 May 2014 09:37:46 Antony Polukhin wrote:
[snip]
Is there any interest in such a shared_spinlock_t? Any comments, suggestions?
There is a shared_spin_mutex in Boost.Sync: boost/sync/mutexes/shared_spin_mutex.hpp
On Fri, May 2, 2014 at 6:37 AM, Antony Polukhin wrote:
Hi,
This is just a refining of an idea that was once implemented and worked well. The idea is a shared spin lock. Here is some pseudo-code:
When all the *_wait() and writer_*() functions do nothing we get a shared spinlock.
[snip]
Now, if we add a helper mutex and make all the *_wait() functions lock and then unlock that mutex, make writer_started() lock the mutex, and writer_finished() unlock it... we get a simplified shared_mutex that is (in theory) faster than the current implementation.
Is there any interest in such a shared_spinlock_t? Any comments, suggestions?
Not sure. You generally want to use a spinlock when you expect contention to be very low; if you expect contention high enough that a r/w lock could be a win, you probably do not want to spin anyway. There is also the general issue that in naive r/w locks all readers contend for an exclusive cache line.

--
gpd
2014-05-02 13:26 GMT+04:00 Giovanni Piero Deretta wrote:
[snip]
Not sure, you generally want to use a spinlock when you expect contention to be very low. If you expect contention to be high enough that a r/w lock could be a win, you probably do not want to spin anyway.
Read/write locks are not cheap (at least in Boost.Thread, because of interruption support). For cases where there is high read contention and the critical section is *very* small, shared spinlocks could show better performance.

--
Best regards,
Antony Polukhin
On 2 May 2014 at 9:37, Antony Polukhin wrote:
This is just a refining of an idea that was once implemented and worked well. The idea is a shared spin lock. Here is some pseudo-code:

[snip]
Is there any interest in such a shared_spinlock_t? Any comments, suggestions?
Two points:

1. I have a TSX-capable, policy-based, compile-time-configurable spin lock in AFIO which might be useful. It really ought to enter Boost.Detail. It should work in shared memory.

2. Later today, hopefully, I'll ask this list for review of a new Boost.Thread object, the permit. The underlying POSIX permit, written in C, works in shared memory (the C++ object not so).

Niall

--
Currently unemployed and looking for work in Ireland.
Work Portfolio: http://careers.stackoverflow.com/nialldouglas/
participants (6)
- Andrey Semashev
- Antony Polukhin
- Giovanni Piero Deretta
- Niall Douglas
- Rob Stewart
- Tim Blechmann