[signals] thread_safe_signals feature-complete
Hi all,

I've finished thread_safe_signals. To get thread safety you have to include an additional header file thread_safe_signals/multi_threaded.hpp and set the last template parameter to boost::signals::multi_threaded.

I've done a little benchmarking. With thread safety off and 10 slots connected, it appears to have about the same overhead as CVS boost.signals (which is much less than the latest release of boost.signals). With 1 slot connected it has more overhead: a little less than a factor of 2 when running a single empty slot. Turning thread safety on increases overhead by about 50% for the 1-slot case and 100% for the 10-slot case. It is able to invoke slots concurrently, though, which should be a win if the slot is non-trivial.

-- Frank
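P.S. For anyone wanting to try it, usage looks something like the sketch below. Only multi_threaded.hpp and the last template parameter are exactly as described above; the signal header path, the class name/namespace and the intermediate template parameters (spelled out here with the usual boost.signals defaults) are assumptions and may not match the sandbox code exactly.

#include <boost/thread_safe_signals/signal.hpp>          // assumed location of the signal itself
#include <boost/thread_safe_signals/multi_threaded.hpp>  // required for thread safety
#include <boost/last_value.hpp>
#include <boost/function.hpp>
#include <functional>
#include <iostream>

void hello() { std::cout << "hello\n"; }

int main()
{
    // Spell out the (presumed) boost.signals defaults so the last template
    // parameter can be set to boost::signals::multi_threaded.
    boost::signal<void (),
        boost::last_value<void>,
        int, std::less<int>,
        boost::function<void ()>,
        boost::signals::multi_threaded> sig;

    sig.connect(&hello);
    sig();
    return 0;
}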
Frank Mori Hess wrote:
Hi all,
I've finished thread_safe_signals. To get thread safety you have to include an additional header file thread_safe_signals/multi_threaded.hpp and set the last template parameter to boost::signals::multi_threaded.
Nice. I've only had a few minutes so far to look at the newest version, so I might not have a good grasp of the bigger picture, but there are a few things I have difficulty understanding.
This is from your signal operator()(...):
shared_ptr
Timmo Stange wrote:
The localConnectionBodies is a shared pointer to the actual slot list and not a full copy, so I don't understand how this is going to be thread-safe against concurrent invocations. The mutex is released after this compound statement and the invocation traverses a list that might be modified by the "nolock_cleanup_connections(false);" in a concurrent call to the same function.
Sorry, I missed the use_count() check even though I quoted it ;). So concurrent invocation was a bad example, but what about other concurrent modifications of the slot container, like connect()? Another thing: The original signal is safe against deletion from a slot invocation context. That's why it uses the pimpl idiom. Regards Timmo Stange
On Saturday 17 February 2007 02:21 pm, Timmo Stange wrote:
Timmo Stange wrote:
The localConnectionBodies is a shared pointer to the actual slot list and not a full copy, so I don't understand how this is going to be thread-safe against concurrent invocations. The mutex is released after this compound statement and the invocation traverses a list that might be modified by the "nolock_cleanup_connections(false);" in a concurrent call to the same function.
Sorry, I missed the use_count() check even though I quoted it ;). So concurrent invocation was a bad example, but what about other concurrent modifications of the slot container, like connect()?
connect also checks use_count() after locking the slot list mutex. It calls nolock_force_unique_connection_list(), which creates an (expensive, I know) new copy of the slot list if it is already in use.
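To make that concrete, here is roughly the shape of the copy-on-write check, as a simplified, hypothetical sketch with made-up type names (not the actual sandbox code):

#include <list>
#include <boost/shared_ptr.hpp>

struct connection_body {};  // stand-in for the real connection body type
typedef std::list<boost::shared_ptr<connection_body> > connection_list_type;

struct signal_impl_sketch
{
    boost::shared_ptr<connection_list_type> connection_bodies;

    // Copy-on-write: if an in-progress invocation still shares the list
    // (use_count() > 1), writers get a fresh copy instead of mutating the
    // list currently being traversed.
    void nolock_force_unique_connection_list()
    {
        if(connection_bodies.use_count() > 1)
        {
            connection_bodies.reset(new connection_list_type(*connection_bodies));
        }
    }
};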
Another thing: The original signal is safe against deletion from a slot invocation context. That's why it uses pimpl-Idiom.
Hmm, I didn't consider that case. Since I've already got the pimpl there for signal tracking, I suppose I just need to create a local copy of the shared_ptr pimpl to prevent it from destructing in mid-invocation. -- Frank
Frank Mori Hess wrote:
connect also checks use_count() after locking the slot list mutex. It calls nolock_force_unique_connection_list(), which creates a (expensive, I know) new copy of the slot list if it is already in use.
Hm, I found that function, but I don't see it getting called from connect().
Another thing: The original signal is safe against deletion from a slot invocation context. That's why it uses pimpl-Idiom.
Hmm, I didn't consider that case. Since I've already got the pimpl there for signal tracking, I suppose I just need to create a local copy of the shared_ptr pimpl to prevent it from destructing in mid-invocation.
Yes. Regards Timmo Stange
On Saturday 17 February 2007 03:34 pm, Timmo Stange wrote:
Frank Mori Hess wrote:
connect also checks use_count() after locking the slot list mutex. It calls nolock_force_unique_connection_list(), which creates a (expensive, I know) new copy of the slot list if it is already in use.
Hm, I found that function, but I don't see it getting called from connect().
I neglected to mention create_new_connection() is in the call chain. -- Frank
On Saturday 17 February 2007 03:34 pm, Timmo Stange wrote:
Another thing: The original signal is safe against deletion from a slot invocation context. That's why it uses pimpl-Idiom.
Hmm, I didn't consider that case. Since I've already got the pimpl there for signal tracking, I suppose I just need to create a local copy of the shared_ptr pimpl to prevent it from destructing in mid-invocation.
Yes.
Actually, after looking at it more closely, nothing needs to be done. Once the combiner is running, it doesn't cause any problems if the pimpl destructs. Everything the combiner needs is either a local variable or owned by a local shared_ptr in operator(). -- Frank
On Saturday 17 February 2007 01:56 pm, Timmo Stange wrote:
In the slot_call_iterator template I don't understand why you call lockNextCallable in dereference() and call it twice (instead of just once after incrementing the iterator) in increment(). That function may advance the iterator, which can do you no good in dereference() and is unnecessary there, as far as I can tell.
It's to compensate for the fact that it is not called in the constructor, which I originally tried. If it's not called before the first dereference, then you might try to execute an invalid slot. If it's not called before the first increment, then you might not increment past the first slot. I got rid of the call in the constructor because it killed concurrent invocation, due to the iterators still in scope in operator() holding locks to the first slot. However, since I optimized it to only create temporary iterators when passing them to the combiner, I could probably just put the call back into the constructor.
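For reference, the control flow I'm describing is roughly this, as a very simplified, hypothetical model that leaves out all the locking and the real connection bodies:

#include <list>

struct slot_stub { bool blocked; };
typedef std::list<slot_stub>::iterator base_iterator;

struct slot_call_iterator_sketch
{
    base_iterator pos, end;

    // advance pos until it refers to a callable (unblocked) slot, or end
    void lock_next_callable()
    {
        while(pos != end && pos->blocked) ++pos;
    }
    slot_stub& dereference()
    {
        lock_next_callable();   // never hand out a blocked/disconnected slot
        return *pos;
    }
    void increment()
    {
        lock_next_callable();   // catch up if we were never dereferenced
        ++pos;                  // step past the slot we just (maybe) ran
        lock_next_callable();   // and skip any blocked slots that follow
    }
};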
It also doesn't seem like the slot_call_iterator skips disconnected slots.
It does; blocked() returns true for disconnected slots. It passes all the test programs (at least it did the last time I tried them). I occasionally get infinite loops and segfaults in random_signal_system.cpp, but that is due to some bug in the test code (the same thing happens with boost.signals 1.32). -- Frank
Frank Mori Hess wrote:
It's to compensate for the fact that it is not called in the constructor, which I originally tried. If it's not called before the first dereference, then you might try to execute an invalid slot. If it's not called before the first increment, then you might not increment past the first slot. I got rid of the call in the constructor because it killed concurrent invocation, due to the iterators still in scope in operator() holding locks to the first slot. However, since I optimized it to only create temporary iterators when passing them to the combiner, I could probably just put the call back into the constructor.
I'd say the combiner must always check for an empty range, and the comparison of first and last will always ensure a valid position, won't it? Regards Timmo Stange
On Saturday 17 February 2007 03:26 pm, Timmo Stange wrote:
Frank Mori Hess wrote:
It's to compensate for the fact that it is not called in the constructor, which I originally tried. If it's not called before the first dereference, then you might try to execute an invalid slot. If it's not called before the first increment, then you might not increment past the first slot. I got rid of the call in the constructor because it killed concurrent invocation, due to the iterators still in scope in operator() holding locks to the first slot. However, since I optimized it to only create temporary iterators when passing them to the combiner, I could probably just put the call back into the constructor.
I'd say the combiner must always check for an empty range and the comparison of first and last will always ensure a valid position, or?
Yes, you're probably right. I must have got it into my head that the combiner couldn't check for an empty range back when I discovered last_value had undefined behaviour if there were no slots connected. -- Frank
On 2/17/07, Frank Mori Hess wrote:
Hi all,
I've finished thread_safe_signals. To get thread safety you have to include an additional header file thread_safe_signals/multi_threaded.hpp and set the last template parameter to boost::signals::multi_threaded.
-- Frank
I've been really trying to find time to take a look at all this new thread-safe-slot stuff, as I have tackled these same problems a few times myself (not with boost signals/slots, but various other similar ones). Maybe I can just do a few short comments/questions:

- do you hold any locks DURING the slot call? That's typically bad, since you have no idea what else is locked or what a slot might do, so deadlocks abound.

- do you copy the slot list before running through the list?

- can I connect/disconnect during my slot call (either disconnect myself, or connect/disconnect other calls)? If so, does it happen in this signal, or the next one?

- as for the 'lock per slot' vs 'lock per signal': The solution I've come up with in the past is basically lock per slot, but more importantly *temporary* lock per slot. ie you create (or take from a pool) locks as needed, and store them in a "slot -> lock" map, so IF another thread also needs the lock for that slot, they check the map first, and obtain the same lock for that slot. Using this, if you are careful, in most cases you don't actually end up creating the lock at all (as you have all the information necessary to see contention as it is happening, and create the locks as needed). ie you know which slot is being called, and which slot is concurrently being disconnected, and only lock if they are the same - which means don't lock on the slot call, but AFTER the call, see if another thread has come in and is waiting for you to finish. So actually it is not a lock at all, but more of an 'event' (or a 'signal', but that'd be confusing).

Not sure if that makes much sense (got a bit rambly there), but it is the seed of an idea that does work if you are careful... Hopefully I'll find some time to look at the new code in a week or two, but right now is the infamous 'crunch time'...

- Tony
Gottlob Frege wrote:
- do you hold any locks DURING the slot call? That's typically bad, since you have no idea what else is locked or what a slot might do, so deadlocks abound.
Yes, that is a problem both implementations have right now.
- do you copy the slot list before running through the list?
Frank's approach has copy-on-write semantics.
- can I connect/disconnect during my slot call (either disconnect myself, or connect/disconnect other calls? If so, does it happen in this signal, or the next one?)
You can do that in both implementations. The behaviour regarding connect()s from a slot context differs. Disconnections always work for the current call, as long as the slot hasn't already been passed.
- as for the 'lock per slot' vs 'lock per signal': The solution I've come up with in the past is basically lock per slot, but more importantly *temporary* lock per slot. ie you create (or take from a pool) locks as needed, and store them in a "slot -> lock" map, so IF another thread also needs the lock for that slot, they check the map first, and obtain the same lock for that slot. Using this, if you are careful, in most cases, you don't actually end up creating the lock at all (as you have all the information necessary to see contention as it is happening, and create the locks as needed. ie you know which slot is being called, and which slot is concurrently being disconnected, and only lock if they are the same - which means don't lock on the slot call, but AFTER the call, see if another thread has come in and is waiting for you to finish. So actually it is not a lock at all, but more of an 'event' (or a 'signal' but that'd be confusing).
I'm not sure I completely understood what you're suggesting here. Trying to avoid the use of the synchronization primitives we have (few as they are with Boost.Threads) will most likely lead to trouble. The lock map you're proposing would need to be synched with a mutex, too, so I don't see the immediate gain.

I currently only use a per-signal mutex, and it should be possible to release that for the duration of the slot call to avoid the deadlocks you mentioned further above. That doesn't give me the freedom of concurrent slot invocation, but seeing how many problems that entails, I'm not sure it is worth the trouble anymore.

Regards Timmo Stange
On Sunday 18 February 2007 12:08 am, Gottlob Frege wrote:
- do you hold any locks DURING the slot call? That's typically bad, since you have no idea what else is locked or what a slot might do, so deadlocks abound.
Well, neither implementation will deadlock, except for lock-ordering problems between external mutexes and the ones inside the signals implementation. I've got a list lock and a slot lock, and only the slot lock is held while the slot is executing. I took care that it is safe to do a blocking lock on the list lock while a slot lock is already held, and the slot lock itself is recursive so I don't see any deadlocks.
- as for the 'lock per slot' vs 'lock per signal': The solution I've come up with in the past is basically lock per slot, but more importantly *temporary* lock per slot. ie you create (or take from a pool) locks as needed, and store them in a "slot -> lock" map, so IF another thread also needs the lock for that slot, they check the map first, and obtain the same lock for that slot. Using this, if you are careful, in most cases, you don't actually end up creating the lock at all (as you have all the information necessary to see contention as it is happening, and create the locks as needed. ie you know which slot is being called, and which slot is concurrently being disconnected, and only lock if they are the same - which means don't lock on the slot call, but AFTER the call, see if another thread has come in and is waiting for you to finish. So actually it is not a lock at all, but more of an 'event' (or a 'signal' but that'd be confusing).
Not sure if that makes much sense (got a bit rambly there), but it is the seed of an idea that does work if you are careful...
That's interesting, I hadn't considered dropping the slot lock before running the slot. It would eliminate the possibility of lock-ordering problems with mutexes in the slot/client code. The slot mutex is really just there to protect the disconnect and block operations, so by the time the slot is about to be run all we really need to worry about is that the slot function doesn't get deleted while it is executing. I think I could just drop the slot mutex in the slot iterator dereference operation, after either copying the slot function or using some shared_ptr magic to keep it alive, and all would be well. -- Frank
On Sunday 18 February 2007 12:35 pm, Frank Mori Hess wrote:
That's interesting, I hadn't considered dropping the slot lock before running the slot. It would eliminate the possibility of lock-ordering problems with mutexs in the slot/client code. The slot mutex is really just to protect the disconnect and block operations, so by the time the slot is about to be run all we really need to worry about is that the slot function doesn't get deleted while it is executing. I think I could just drop the slot mutex in the slot iterator dereference operation, after either copying the slot function or using some shared_ptr magic to keep it alive, and all would be well.
It turns out shared_ptr magic was already ensuring the slot function would not be deleted, so it was just a one-line change to drop the slot lock before running the slot (in sandbox cvs now). It is now possible for the same slot to be executing simultaneously in multiple threads, but I don't think that violates our design requirements, does it? -- Frank
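P.S. In other words, the call site now does roughly this - a hypothetical, simplified sketch, not the actual code:

#include <boost/function.hpp>
#include <boost/shared_ptr.hpp>
#include <boost/thread/mutex.hpp>

struct connection_body_stub
{
    boost::mutex mutex;
    boost::shared_ptr<boost::function<void ()> > slot_function;
};

void invoke_slot(connection_body_stub &body)
{
    boost::shared_ptr<boost::function<void ()> > local_copy;
    {
        boost::mutex::scoped_lock lock(body.mutex);
        local_copy = body.slot_function;   // the shared_ptr keeps the slot alive
    }                                      // slot mutex released here
    (*local_copy)();                       // slot runs with no library lock held
}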
Frank Mori Hess wrote:
On Sunday 18 February 2007 12:08 am, Gottlob Frege wrote:
- do you hold any locks DURING the slot call? That's typically bad, since you have no idea what else is locked or what a slot might do, so deadlocks abound.
Well, neither implementation will deadlock, except for lock-ordering problems between external mutexes and the ones inside the signals implementation. I've got a list lock and a slot lock, and only the slot lock is held while the slot is executing. I took care that it is safe to do a blocking lock on the list lock while a slot lock is already held, and the slot lock itself is recursive so I don't see any deadlocks.
Consider two slots which mutually disconnect each other during a concurrent signal invocation. With each thread holding the lock of one of the two slots (because of the invocation), neither would be able to disconnect the other - a deadlock. My implementation can similarly deadlock when two signals are involved. Regards Timmo Stange
On Sunday 18 February 2007 01:08 pm, Timmo Stange wrote:
Frank Mori Hess wrote:
On Sunday 18 February 2007 12:08 am, Gottlob Frege wrote:
- do you hold any locks DURING the slot call? That's typically bad, since you have no idea what else is locked or what a slot might do, so deadlocks abound.
Well, neither implementation will deadlock, except for lock-ordering problems between external mutexes and the ones inside the signals implementation. I've got a list lock and a slot lock, and only the slot lock is held while the slot is executing. I took care that it is safe to do a blocking lock on the list lock while a slot lock is already held, and the slot lock itself is recursive so I don't see any deadlocks.
Consider two slots which will mutually disconnect the other in a situation of concurrent signal invocation. With each thread holding the lock of either slot (because of the invocation), neither would be able to disconnect the other - a deadlock.
I stand corrected, my deadlock detector must need tune-up. Dropping the slot mutex before executing has made this deadlock go away too though. -- Frank
Frank Mori Hess wrote:
I stand corrected, my deadlock detector must need tune-up. Dropping the slot mutex before executing has made this deadlock go away too though.
Hm, since you do the locking in the slot_call_iterator, isn't it the combiner which actually controls locks on the per-slot mutexes? That may be problematic with respect to iterator copies. Regards Timmo Stange
On Sunday 18 February 2007 02:16 pm, Timmo Stange wrote:
Frank Mori Hess wrote:
I stand corrected, my deadlock detector must need tune-up. Dropping the slot mutex before executing has made this deadlock go away too though.
Hm, since you do the locking in the slot_call_iterator, isn't it the combiner which actually controls locks on the per-slot mutexes? That may be problematic with respect to iterator copies.
I've just checked in some clean-ups, so the iterator doesn't keep a scoped pointer to a lock anymore. I changed lockNextCallable() so it uses the locking form of blocked() and the iterator doesn't need any explicit locking at all (correct me if I'm wrong). Also, it allowed me to dump that memory pool stuff that was trying to optimize dynamic allocation of scoped locks. -- Frank
Frank Mori Hess wrote:
I've just checked in some clean-ups, so the iterator doesn't keep a scoped pointer to a lock anymore. I changed lockNextCallable() so it uses the locking form of blocked() and the iterator doesn't need any explicit locking at all (correct me if I'm wrong). Also, it allowed me to dump that memory pool stuff that was trying to optimize dynamic allocation of scoped locks.
Yes, the code looks much cleaner that way and the iterator memory footprint is back within acceptable limits ;). What I don't like about the release of all mutexes during the call is that we lose the strong semantics of disconnect() and block(). We can no longer guarantee that a slot won't be called after disconnect() has returned. That is a form of thread safety only the library can provide. Regards Timmo Stange
On Monday 19 February 2007 01:42 am, Timmo Stange wrote:
What I don't like about the release of all mutexes during the call is that we lose the strong semantics of disconnect() and block(). We cannot guarantee anymore, that a slot won't be called after disconnect() returned. That is a form of thread safety only the library can provide.
We could have the connection body dispense some kind of reference-counted token. The slot iterator would grab a copy of the token when it has iterated onto a slot that it might execute, and destroy it when it iterates past the slot or destructs. disconnect()/block() could check the reference count to decide when it is safe to return, either spinning or waiting on a condition variable in the meantime. Since we only want the disconnect() to wait if the slot is running in a different thread, the token would have to maintain both a global and a thread-specific reference count. disconnect()/block() could return when they are equal. -- Frank
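P.S. Something along these lines, for illustration - a hypothetical, untested sketch where enter()/leave() stand in for copying and destroying the token:

#include <boost/thread/mutex.hpp>
#include <boost/thread/condition.hpp>
#include <boost/thread/tss.hpp>

// The global count tracks every in-progress call to the slot; the
// thread-specific count tracks calls made by the current thread.
// disconnect()/block() can then wait until the two are equal, i.e. until
// no *other* thread is still running the slot.
class call_token
{
public:
    call_token(): global_count_(0) {}

    void enter()   // the slot iterator is about to run the slot
    {
        boost::mutex::scoped_lock lock(mutex_);
        ++global_count_;
        ++local_count();
    }
    void leave()   // the slot call has finished
    {
        boost::mutex::scoped_lock lock(mutex_);
        --global_count_;
        --local_count();
        condition_.notify_all();
    }
    void wait_until_idle_in_other_threads()   // used by disconnect()/block()
    {
        boost::mutex::scoped_lock lock(mutex_);
        while(global_count_ != local_count())
            condition_.wait(lock);
    }
private:
    int& local_count()
    {
        if(local_count_.get() == 0) local_count_.reset(new int(0));
        return *local_count_;
    }
    boost::mutex mutex_;
    boost::condition condition_;
    int global_count_;
    boost::thread_specific_ptr<int> local_count_;
};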
Frank Mori Hess wrote:
What I don't like about the release of all mutexes during the call is that we lose the strong semantics of disconnect() and block(). We cannot guarantee anymore, that a slot won't be called after disconnect() returned. That is a form of thread safety only the library can provide.
We could have the connection body dispense some kind of reference-counted token. The slot iterator would grab a copy of the token when it has iterated onto a slot that it might execute, and destroy it when it iterates past the slot or destructs. disconnect()/block() could check the reference count to decide when it is safe to return, either spinning or waiting on a condition variable in the meantime. Since we only want the disconnect() to wait if the slot is running in a different thread, the token would have to maintain both a global and a thread-specific reference count. disconnect()/block() could return when they are equal.
If you don't let disconnect() return until all calls to the slot are through, you're back to square one with regard to potential deadlocks. Keep in mind that disconnect() may be called (in practice often is) from a slot call context. I don't think there is a clean solution to this which covers all problems. It would be possible to have different threading policies for either behavior (one being deadlock-safe, the other with strong disconnect() guarantees), but that defeats the whole abstraction idea.

Alternatively, disconnect() could return a bool indicating whether the disconnected slot is currently running, or throw an exception. That could be optional through an argument (disconnect(true)) or a different function (checked_disconnect()). A connection::wait() function could wait for all pending calls to finish, but couldn't be safely called from a signal invocation context.

Regards Timmo Stange
On Monday 19 February 2007 02:55 am, Timmo Stange wrote:
We could have the connection body dispense some kind of reference-counted token. The slot iterator would grab a copy of the token when it has iterated onto a slot that it might execute, and destroy it when it iterates past the slot or destructs. disconnect()/block() could check the reference count to decide when it is safe to return, either spinning or waiting on a condition variable in the meantime. Since we only want the disconnect() to wait if the slot is running in a different thread, the token would have to maintain both a global and a thread-specific reference count. disconnect()/block() could return when they are equal.
If you don't let disconnect() return until all calls to the slot are through, you're back to square one with regard to potential deadlocks. Keep in mind that disconnect() may be called (in practice often is) from a slot call context.
Yes, I think what I outlined essentially amounts to a recursive read-write mutex.
I don't think there is a clean solution to this which covers all problems. It would be possible to have different threading policies for either behavior (one being dead-lock-safe, the other with strong disconnect() guarantees), but that defeats the whole abstraction idea.
Yes, I think the strong disconnect() guarantee also means the deadlock you outlined earlier, with two slots attempting to disconnect each other, will always be possible.
Alternatively, disconnect() could return a bool if the disconnected slot is currently running or throw an exception. That could be optional through an argument (disconnect(true)) or a different function (checked_disconnect()). A connection::wait() function could wait for all pending calls to finish, but couldn't be safely called from a signal invocation context.
I like connection::wait() better. A return value from disconnect still wouldn't be able to tell you when the slot is really gone, only that it wasn't gone when disconnect returned. -- Frank
On 2/19/07, Frank Mori Hess wrote:
Yes, I think the strong disconnect() guarantee will also guarantee the deadlock you outlined earlier with two slots attempting to disconnect each other will always be possible.
I think the strong guarantee is much more important. If I say disconnect() I really don't expect calls after that. The deadlock only happens when:

1. signals are being processed in multiple threads and
2. slots are disconnecting each other

1 is not what I was typically thinking wrt threaded signals - I was more thinking about connect/disconnect from multiple threads, but the signal would only ever happen on the same thread. Maybe that's too narrow of a view on my part. 2. slots disconnecting each other is really rare/stupid.

So I say you make the *default* disconnect wait, and supply an alternative for the rare cases.

P.S. what about connect/disconnect 'pairing'? ie is there an assumed (or explicit) contract that every call to connect is matched by a disconnect? If I have a slot that is in a signal, and that SAME signal is triggered in, say, 3 threads, the slot can't just call disconnect() (ie the same code in each thread) - it has to somehow carefully keep track that it is calling disconnect() in 1 thread, and the other 2 threads need to skip the disconnect() call. Or does that matching guarantee not exist?

- Tony
Gottlob Frege wrote:
I think the strong guarantee is much more important. If I say disconnect() I really don't expect calls after that. The deadlock only happens when:
1. signals are being processed in multiple threads and 2. slots are disconnecting each other
1 is not what I was typically thinking wrt threaded signals - I was more thinking about connect/disconnect from multiple threads, but the signal would only ever happen on the same thread. Maybe that's too narrow of a view on my part. 2. slots disconnecting each other is really rare/stupid
So I say you make the *default* disconnect wait, and supply an alternative for the rare cases.
I think avoiding potential deadlocks is equally important to the strong disconnection guarantee. The most prominent aspect of Signals is the decoupling between caller and callee it provides. The signal knows nothing about the code it calls and - through generalized callbacks - the called code doesn't even need to know it is called by a signal. We cannot expect users to carefully steer around potential deadlock problems, as that would defeat most of the advantages of using the system in the first place. It's quite possible a user has no control at all over the slot code, for example.

On a side note: I don't know if you've followed the tracking functionality changes we've discussed. In a nutshell, we agreed that shared_ptr/weak_ptr is the most portable, useful solution in a multi-threaded environment. Since this kind of tracking ensures the lifetime of the tracked objects during a slot call, a weak disconnect would not keep users from safely releasing resources associated with a slot - as long as they're tracked.
P.S. what about connect/disconnect 'pairing'? ie is there an assumed (or explicit) contract that every call to connect is matched by a disconnect? If I have a slot that is in a signal, and that SAME signal is triggered in, say, 3 threads, the slot can't just call disconnect() (ie the same code in each thread) - it has to somehow carefully keep track that it is calling disconnect() in 1 thread, and the other 2 threads need to skip the disconnect() call.
Or does that matching guarantee not exist?
I don't think I understood this. You can only connect a slot once - if you connect it twice it will be a separate connection and the slot will be called twice. You can only disconnect it once accordingly, although trying to disconnect it several times won't do any harm. Regards Timmo Stange
On Monday 19 February 2007 04:02 pm, Timmo Stange wrote:
Gottlob Frege wrote:
I think the strong guarantee is much more important. If I say disconnect() I really don't expect calls after that. The deadlock only happens when:
1. signals are being processed in multiple threads and 2. slots are disconnecting each other
1 is not what I was typically thinking wrt threaded signals - I was more thinking about connect/disconnect from multiple threads, but the signal would only ever happen on the same thread. Maybe that's too narrow of a view on my part. 2. slots disconnecting each other is really rare/stupid
So I say you make the *default* disconnect wait, and supply an alternative for the rare cases.
I'm finding myself doubting the wisdom of implementing support for a strong disconnect guarantee. If you do a disconnect()/block() that waits until the slot is finished executing, then you can deadlock on any resource that the slot attempts to acquire which is already held by the thread calling disconnect(). It doesn't necessarily require the signal to be invoked in multiple threads, or for two slots to attempt to simultaneously disconnect each other.

If someone really cares about their slot not running after disconnect, they could always use their own mutex to lock the disconnect call, and also lock the mutex in the slot and check connected() before proceeding. Is it really so bad to only guarantee that any signal invocations started after disconnect()/block() returns will not execute the slot? -- Frank
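P.S. For illustration, the user-side workaround would look roughly like this - a hypothetical sketch, using the existing boost::signals::connection interface just for the connected()/disconnect() calls:

#include <boost/signals/connection.hpp>
#include <boost/thread/mutex.hpp>

struct client
{
    boost::mutex mutex;
    boost::signals::connection conn;

    void slot()
    {
        boost::mutex::scoped_lock lock(mutex);
        if(!conn.connected()) return;   // we were disconnected in the meantime
        // ... real slot work goes here ...
    }
    void shut_down()
    {
        boost::mutex::scoped_lock lock(mutex);
        conn.disconnect();              // after this returns, slot() bails out early
    }
};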
On 2/19/07, Timmo Stange wrote:
Gottlob Frege wrote:
On a side note: I don't know if you've followed the tracking functionality changes we've discussed. In a nutshell, we agreed that shared_ptr/weak_ptr is the most portable, useful solution in a multi-threaded environment. Since this kind of tracking ensures the lifetime of the tracked objects during a slot call, a weak disconnect would not keep users from safely releasing resources associated with a slot - as long as they're tracked.
I'm trying to keep up but it's not easy!
P.S. what about connect/disconnect 'pairing'? ie is there an assumed (or explicit) contract that every call to connect is matched by a disconnect? If I have a slot that is in a signal, and that SAME signal is triggered in, say, 3 threads, the slot can't just call disconnect() (ie the same code in each thread) - it has to somehow carefully keep track that it is calling disconnect() in 1 thread, and the other 2 threads need to skip the disconnect() call.
Or does that matching guarantee not exist?
I don't think I understood this. You can only connect a slot once - if you connect it twice it will be a separate connection and the slot will be called twice. You can only disconnect it once accordingly, although trying to disconnect it several times won't do any harm.
Ignore me. I was translating in my head between different implementations of the same broadcaster/observer patterns. In the one I was last using, the 'subscribe' and 'unsubscribe' came in matched pairs.
Regards
Timmo Stange
Gottlob Frege wrote:
On 2/19/07, Frank Mori Hess wrote:
Yes, I think the strong disconnect() guarantee will also guarantee the deadlock you outlined earlier with two slots attempting to disconnect each other will always be possible.
I think the strong guarantee is much more important. If I say disconnect() I really don't expect calls after that.
Does this imply that concurrent invocations are implicitly serialized at slot level? This guarantee makes sense for a use pattern where one has X * px = new X; and bind(&X::f, px) as a slot. Now thread 1 calls the signal and thread 2 disconnects the slot and immediately issues a delete px after that. I wouldn't have written the code in such a way, though. A shared_ptr<X> as px would work regardless of whether disconnect() guaranteed no calls after return.
Frank Mori Hess wrote:
Alternatively, disconnect() could return a bool if the disconnected slot is currently running or throw an exception. That could be optional through an argument (disconnect(true)) or a different function (checked_disconnect()). A connection::wait() function could wait for all pending calls to finish, but couldn't be safely called from a signal invocation context.
I like connection::wait() better. a return value from disconnect still wouldn't be able to tell you when the slot is really gone, only that it wasn't gone when disconnect returned.
How about an optional callback parameter for disconnect(), which usually will be called immediately after the successful disconnection or, in the rare case of the slot being run on another thread at the time of the disconnection, after it returned (naturally from the other thread's context then)? Not specifying this parameter would imply that the caller has taken the necessary precautions to avoid all possible problems (either by externally synchronizing signal calls or by not relying on strong disconnect semantics).

The reason for this idea is simplicity of use. Keeping a close watch on resources you share between threads is everyday work, so in many cases special handling for the concurrent disconnection/invocation will not be necessary. In all other cases (when the user has no access to the called code, for example), the tools we provide should be tailored to the problem: this concurrency situation is rather unlikely and the common case should result in synchronous behavior. My connection::wait() idea would often force the user to run a separate maintenance thread for resource cleanup, in which he performs those waits - a high cost (design- and performance-wise) for something that is unlikely to happen.

Regards Timmo Stange
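P.S. For what it's worth, the deferred-callback mechanics could look roughly like this - a hypothetical sketch with made-up names, not part of any existing interface:

#include <boost/function.hpp>
#include <boost/thread/mutex.hpp>

class connection_body_sketch
{
public:
    connection_body_sketch(): connected_(true), running_(false) {}

    // the signal would check connected_ and bracket the slot call with these
    void begin_call() { boost::mutex::scoped_lock lock(mutex_); running_ = true; }
    void end_call()
    {
        boost::function<void ()> pending;
        {
            boost::mutex::scoped_lock lock(mutex_);
            running_ = false;
            pending.swap(pending_callback_);
        }
        if(pending) pending();   // deferred notification, from the calling thread
    }

    void disconnect(const boost::function<void ()> &done)
    {
        bool notify_now = false;
        {
            boost::mutex::scoped_lock lock(mutex_);
            connected_ = false;
            if(running_) pending_callback_ = done;  // slot in progress: defer
            else notify_now = true;
        }
        if(notify_now) done();   // slot not running: notify immediately
    }
private:
    boost::mutex mutex_;
    bool connected_, running_;
    boost::function<void ()> pending_callback_;
};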
On 2/19/07, Timmo Stange wrote:
Frank Mori Hess wrote:
How about an optional callback parameter for disconnect(), which usually will be called immediately after the successful disconnection or, in the rare case of the slot being run on another thread at the time of the disconnection, after it returned (naturally from the other thread's context then)? Not specifying this parameter would imply that the caller has taken the necessary precautions to avoid all possible problems (either by externally synchronizing signal calls or by not relying on strong disconnect semantics).
The reason for this idea is simplicity of use. Keeping a close watch on resources you share between threads is everyday work, so in many cases special handling for the concurrent disconnection/invocation will not be necessary. In all other cases (when the user has no access to the called code, for example), the tools we provide should be tailored to the problem: this concurrency situation is rather unlikely and the common case should result in synchronous behavior. My connect::wait() idea would often force the user to run a separate maintenance thread for resource cleanup, in which he performs those waits - a high cost (design- and performace-wise) for something that is unlikely to happen.
This is starting to sound promising. I could also imagine that disconnect returns some kind of event that you can choose to wait on or not. Let me back up a bit:

- one of the original reasons (in my head at least) for blocking on disconnect() is so that you can disconnect() in your destructor (and thus avoid having a dead object being called). I guess with weak_ptr, this is no longer a problem(?).

- If I DO want to wait, then I need to:
  - create an 'event' (and store it somewhere - probably in my class? or in a bind passed to disconnect?)
  - call disconnect, passing in a callback that will set the event
  - wait on the event

    some_event_thing e(create_blocked);
    disconnect(bind(unblock_function, e));
    e.wait();

It looks like I ALWAYS need to create the event. Would it be more efficient to instead return an event (possibly an empty one), if you can tell (internally) that there is no contention? ie:

    signal::event e = disconnect();
    e.wait(); // (may be an 'empty' event, so no real wait)

ie, the signal code, instead of calling the disconnect callback, triggers the event. Not as generic as a callback, but maybe more efficient?
Gottlob Frege wrote:
- one of the original reasons (in my head at least) for blocking on disconnect() is so that you can disconnect() in your destructor (and thus avoid having a dead object being called).
But then you need to deal with the possibility that a call can already be in progress at the time your destructor is invoked. (If you can guarantee that this cannot happen, it seems to me that you can also guarantee that a call isn't started after the disconnect.) So let's assume that you lock the object mutex. There's still the nasty scenario where:

~X locks the object mutex
X::f is called by the signal and blocks on the object mutex
~X disconnects and destroys *this, including the object mutex
X::f crashes and burns

I can't think of a way to avoid the above that doesn't also solve the "call after disconnect" problem.
Peter Dimov wrote:
Gottlob Frege wrote:
- one of the original reasons (in my head at least) for blocking on disconnect() is so that you can disconnect() in your destructor (and thus avoid having a dead object being called).
But then you need to deal with the possibility that a call can already be in progress at the time your destructor is invoked. (If you can guarantee that this cannot happen, it seems to me that you can also guarantee that a call isn't started after the disconnect.) So let's assume that you lock the object mutex. There's still the nasty scenario where:
~X locks the object mutex
X::f is called by the signal and blocks on the object mutex
~X disconnects and destroys *this, including the object mutex
X::f crashes and burns
Actually, if disconnect waits for X::f to return, this will deadlock, not crash.
Peter Dimov wrote:
But then you need to deal with the possibility that a call can already be in progress at the time your destructor is invoked. (If you can guarantee that this cannot happen, it seems to me that you can also guarantee that a call isn't started after the disconnect.) So let's assume that you lock the object mutex. There's still the nasty scenario where:
~X locks the object mutex
X::f is called by the signal and blocks on the object mutex
~X disconnects and destroys *this, including the object mutex
X::f crashes and burns
I can't think of a way to avoid the above that doesn't also solve the "call after disconnect" problem.
Let me give you a short summary of how Frank and I plan to provide tracking for the thread-safe Signals implementation:

The client has to maintain ownership of objects to track through shared_ptr. It informs the signal that it wants those objects to be tracked by wrapping that shared_ptr in a signals::tracked object (with a signals::track() helper function for bind(...) et al.). The signal holds the reference to that object as a weak_ptr and will not call the associated slot if it is expired. Otherwise it will hold a local shared_ptr to that object during the slot call, ensuring its lifetime for the duration.

The scenario you're discussing is much better handled using that tracking functionality, especially as it makes manual disconnection completely unnecessary. I think it only needs proper documentation to keep the user from trying stunts like the above - provided they read it...

Regards Timmo Stange
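P.S. At its core the mechanism is just the usual weak_ptr promotion, roughly like this - a simplified, hypothetical sketch, not the actual implementation:

#include <boost/shared_ptr.hpp>
#include <boost/weak_ptr.hpp>
#include <boost/function.hpp>

struct tracked_slot_sketch
{
    boost::weak_ptr<void> tracked_object;    // what the signal actually stores
    boost::function<void ()> slot_function;

    void invoke()
    {
        // promote the weak_ptr for the duration of the call
        boost::shared_ptr<void> lifetime = tracked_object.lock();
        if(!lifetime) return;   // tracked object expired: skip (and disconnect)
        slot_function();        // the tracked object is guaranteed alive here
    }
};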
Timmo Stange wrote:
Peter Dimov wrote:
But then you need to deal with the possibility that a call can already be in progress at the time your destructor is invoked. (If you can guarantee that this cannot happen, it seems to me that you can also guarantee that a call isn't started after the disconnect.) So let's assume that you lock the object mutex. There's still the nasty scenario where:
~X locks the object mutex
X::f is called by the signal and blocks on the object mutex
~X disconnects and destroys *this, including the object mutex
X::f crashes and burns
I can't think of a way to avoid the above that doesn't also solve the "call after disconnect" problem.
Let me give you a short summary of how Frank and I plan to provide tracking for the thread-safe Signals implementation:
The client has to maintain ownership of objects to track through shared_ptr. It informs the signal that it wants those objects to be tracked by wrapping that shared_ptr in a signals::tracked object (with a signals::track() helper function for bind(...) et al.). The signal holds the reference to that object as a weak_ptr and will not call the associated slot if it is expired. Otherwise it will hold a local shared_ptr to that object during the slot call, ensuring its lifetime for the duration.
I know. The question was whether the "no calls after disconnect" guarantee is valuable, and disconnecting in a destructor was given as a motivating example. I was just explaining that the guarantee doesn't buy much since disconnecting in a destructor is still unsafe.
On 2/19/07, Peter Dimov wrote:
I know. The question was whether the "no calls after disconnect" guarantee is valuable, and disconnecting in a destructor was given as a motivating example. I was just explaining that the guarantee doesn't buy much since disconnecting in a destructor is still unsafe.
Is it? If we are using weak_ptrs? Tony
On 2/19/07, Peter Dimov wrote:
Gottlob Frege wrote:
- one of the original reasons (in my head at least) for blocking on disconnect() is so that you can disconnect() in your destructor (and thus avoid having a dead object being called).
But then you need to deal with the possibility that a call can already be in progress at the time your destructor is invoked. (If you can guarantee that this cannot happen, it seems to me that you can also guarantee that a call isn't started after the disconnect.) So let's assume that you lock the object mutex. There's still the nasty scenario where:
~X locks the object mutex
X::f is called by the signal and blocks on the object mutex
~X disconnects and destroys *this, including the object mutex
X::f crashes and burns
I can't think of a way to avoid the above that doesn't also solve the "call after disconnect" problem.
using weak_ptrs (I think you've heard of them? :-). This was one of the ideas that really started the whole ball rolling. Tony
Gottlob Frege wrote:
On 2/19/07, Peter Dimov wrote:
Gottlob Frege wrote:
- one of the original reasons (in my head at least) for blocking on disconnect() is so that you can disconnect() in your destructor (and thus avoid having a dead object being called).
But then you need to deal with the possibility that a call can already be in progress at the time your destructor is invoked. (If you can guarantee that this cannot happen, it seems to me that you can also guarantee that a call isn't started after the disconnect.) So let's assume that you lock the object mutex. There's still the nasty scenario where:
~X locks the object mutex
X::f is called by the signal and blocks on the object mutex
~X disconnects and destroys *this, including the object mutex
X::f crashes and burns
I can't think of a way to avoid the above that doesn't also solve the "call after disconnect" problem.
using weak_ptrs (I think you've heard of them? :-).
Rings a bell. But using weak_ptr's also makes the "no calls after disconnect" guarantee unnecessary; the signal will take proper care of your object's lifetime by keeping a shared_ptr to it during the call. So why provide it? Seems to me that you can't (easily) take advantage of it safely.
On 2/19/07, Peter Dimov wrote:
Gottlob Frege wrote:
On 2/19/07, Peter Dimov wrote:
Gottlob Frege wrote:
- one of the original reasons (in my head at least) for blocking on disconnect() is so that you can disconnect() in your destructor (and thus avoid having a dead object being called).
But then you need to deal with the possibility that a call can already be in progress at the time your destructor is invoked. (If you can guarantee that this cannot happen, it seems to me that you can also guarantee that a call isn't started after the disconnect.) So let's assume that you lock the object mutex. There's still the nasty scenario where:
~X locks the object mutex
X::f is called by the signal and blocks on the object mutex
~X disconnects and destroys *this, including the object mutex
X::f crashes and burns
I can't think of a way to avoid the above that doesn't also solve the "call after disconnect" problem.
using weak_ptrs (I think you've heard of them? :-).
Rings a bell. But using weak_ptr's also makes the "no calls after disconnect" guarantee unnecessary; the signal will take proper care of your object's lifetime by keeping a shared_ptr to it during the call. So why provide it? Seems to me that you can't (easily) take advantage of it safely.
That's why I brought up the 'original reason' - it is no longer necessary. But there are still other times that disconnect is called, and I was wondering how important the guarantee is for them. I guess I'm not being very clear:

1. review original reasons - do they still apply?
2. consider other existing reasons

Tony
Gottlob Frege wrote:
This is starting to sound promising. I could also imagine that disconnect returns some kind of event that you can choose to wait on or not.
[...]
ie, the signal code, instead of calling the disconnect callback, triggers the event.
Not as generic as a callback, but maybe more efficient?
Leaving aside the lack of a simple event primitive in Boost.Threads, we could easily provide a predefined callback functor in Signals that implements this, but I have the feeling we're running in circles here, because that is exactly what we don't want the client code to do: If two slots use that approach with disconnect, the outcome may be a deadlock - not on the slot mutexes this time, but on the "slot return events" ;). Regards Timmo Stange
On 2/19/07, Timmo Stange wrote:
Gottlob Frege wrote:
This is starting to sound promising. I could also imagine that disconnect returns some kind of event that you can choose to wait on or not.
[...]
ie, the signal code, instead of calling the disconnect callback, triggers the event.
Not as generic as a callback, but maybe more efficient?
Leaving aside the lack of a simple event primitive in Boost.Threads, we could easily provide a predefined callback functor in Signals that implements this, but I have the feeling we're running in circles here, because that is exactly what we don't want the client code to do: If two slots use that approach with disconnect, the outcome may be a deadlock - not on the slot mutexes this time, but on the "slot return events" ;).
I understand that, but I guess I'm still clinging to the idea that for most uses you do want to wait, and won't be in a situation where you could deadlock. We are just moving along the line of how hard/easy it is to use and how hard/easy it is to deadlock. The biggest step is just moving the locking out of the signal and over to the user code, so that the user has the option to handle things how they want. From there, it is a question of what you want to make easy/hard, encourage/discourage.

Tony

P.S. the 'event' primitive would be pretty easy to implement. But yeah, it doesn't really exist currently.
participants (4):
- Frank Mori Hess
- Gottlob Frege
- Peter Dimov
- Timmo Stange