[StateChart] event processing order in async machine
Hello,
I've got an async state machine (1.44, MSVC10), and sometimes I
observe very strange behavior: events are processed out of the
order in which they were queued. I spent some time investigating this
issue, but due to the asynchronous nature of the processing, it's
far from trivial. So, before I sink too deep, I'd like to
make sure I'm not missing something obvious.
Unfortunately, I don't have any self-contained code at the moment, but
a very simplified model of my FSM is as follows. Assume there are 2
events: Ev1, Ev2. Both are queued to the FSM from the same thread,
always in the same order: Ev1, Ev2. The FSM processes them in another
thread.
The FSM has an "entry" chain of states that defer both events:
S1 /[defer(Ev1, Ev2)] --> S2 /[defer(Ev1, Ev2)]
and finally 2 orthogonal sub-states, one of which ignores both
events, while the other processes both:
...S2 --> S3
, S3_1 /[custom_reaction(Ev1), in_state_reaction(Ev2)]
What I see is that *sometimes* the reaction for Ev2 is invoked *before* the reaction for Ev1. This "sometimes" really puzzles me: I'm absolutely sure that the order of queuing is always the same, so where can there be a "race"?
I think I found out what happened there. If Ev1 and Ev2 are queued one after another quickly enough, they are deferred together through the whole chain of states, up to S3_1. BUT if Ev1 is already on its way towards the finish when Ev2 is queued, the latter is processed (by deferral or reaction) *first*. Only then is the previously deferred Ev1 processed.
Is my guess correct? If so, is there any way to preserve the order of events enqueued to the async machine?
Thanks!
I finally figured out the reason for the above effect.
Assume Ev1 and Ev2 are both deferred in the current state. Ev1 and Ev2
(in this order) are currently in state_machine::eventQueue_, and
state_machine::process_queued_events() is invoked.
If between Ev1 and Ev2 there's an event that causes a transition out of
the current state, then Ev1 will be processed (deferred), but Ev2 will
have no chance to get processed, and thus it will remain in
state_machine::eventQueue_.
Now, when the current state is destroyed due to the
transition, release_events() is invoked, and the following line puts
Ev1 (which is deferred) *after* Ev2 (which is still in the
eventQueue_):
eventQueue_.splice( eventQueue_.end(), pFound->second );
The mechanism is clear now, but the question is whether it's a bug or
a feature :).
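For illustration only (this is not the attached program), the reordering described above can be modelled with two std::list queues and no Boost at all. The names EventQueue, release_events, reorder_demo and the event EvGo are hypothetical stand-ins; only the splice-to-end call mirrors the line quoted above.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in for the library's queues; events are just names here.
using EventQueue = std::list<std::string>;

// Models leaving a state: the state's deferred events are appended to the
// END of the general queue, like the splice() call quoted in the post.
inline void release_events(EventQueue& general, EventQueue& deferred) {
    general.splice(general.end(), deferred);
}

// Replays the scenario from the post and returns the resulting general queue.
inline std::vector<std::string> reorder_demo() {
    EventQueue general{"Ev1", "EvGo", "Ev2"};  // queued in this order
    EventQueue deferred;                        // deferral queue of the current state

    // Ev1 is dispatched first; the current state defers it.
    deferred.splice(deferred.end(), general, general.begin());

    // EvGo (hypothetical) is dispatched next and triggers the transition.
    assert(!general.empty());
    general.pop_front();

    // Exiting the state releases Ev1 behind Ev2, which never left the queue.
    release_events(general, deferred);

    return std::vector<std::string>(general.begin(), general.end());
}
```

Calling reorder_demo() yields the queue Ev2, Ev1, so the two events have swapped places relative to the order in which they were queued.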
I attach a simple reproducing program (MSVC10, boost 1.44).
Thanks.
#include <iostream>
#include
reactions; s1() { std::cout << "s1" << std::endl; } };
struct s2 : simple_state
reactions; s2() { std::cout << "s2" << std::endl; } };
struct s3 : simple_state
reactions; s3() { std::cout << "s3" << std::endl; } };
struct s4_1 : simple_state
Hi Igor

As far as I can tell, the phenomenon you're observing would manifest itself in the same way in synchronous machines as it does in your async machine. What follows therefore only considers the properties that are common to both types of machines.

The mechanisms involved here are the following:
1. Each state machine has a general event queue.
2. Logically, there is a *per-state* deferred event queue, which stores incoming events that are deferred while the machine resides in a given state. Whenever a state is left, all deferred events for that state are moved into the general event queue.
3. As the last step of state_machine::process_event, all events in the general event queue are processed.

IIUC, what you're observing can be explained with the 3 points above, right?

In your example, when s1 is left, instances of ev3to4_1 and ev3to4_2 are moved from the deferral queue to the general event queue, although the next state does nothing but defer them again. IMO, the root of the problem lies here. A better solution would be to put both s1 and s2 into an outer state (e.g. s) and have that defer ev3to4_1 and ev3to4_2. s1 would then only defer ev2to3 and transition to s2. Similarly, s2 would then only transition to s3. This does away with some duplication in your code (the deferral of ev3to4_1 and ev3to4_2 in both s1 & s2) and would also be slightly more efficient, as the deferred events are put into the general event queue exactly once (when s is left).

HTH & Regards,
-- Andreas Huber
When replying by private email, please remove the words spam and trap from the address shown in the header.
Hi Andreas,
Thanks for your response!

> As far as I can tell, the phenomenon you're observing would manifest itself in the same way in synchronous machines as it does in your async machine. What follows therefore only considers the properties that are common to both types of machines.

Yes, of course, "async machine" is stuck in the topic name and in my repro just for "historical reasons" :).

> 1. Each state machine has a general event queue.
> 2. Logically, there is a *per-state* deferred event queue, which stores incoming events that are deferred while the machine resides in a given state. Whenever a state is left, all deferred events for that state are moved into the general event queue.
> 3. As the last step of state_machine::process_event, all events in the general event queue are processed.
> IIUC, what you're observing can be explained with the 3 points above, right?

Right. To be more exact, it's because in (2) "all deferred events for that state" are *enqueued* into the general event queue, i.e. pushed in FIFO manner. Intuitively, though, I'd say that these deferred events have "higher priority" than those in the queue, because they were certainly posted earlier. So this is exactly the question: wouldn't it be more appropriate to push them to the front of the "queue" (well, not a real queue then)?

> In your example, when s1 is left, instances of ev3to4_1 and ev3to4_2 are moved from the deferral queue to the general event queue, although the next state does nothing but defer them again. IMO, the root of the problem lies here. A better solution would be to put both s1 and s2 into an outer state (e.g. s) and have that defer ev3to4_1 and ev3to4_2. s1 would then only defer ev2to3 and transition to s2. Similarly, s2 would then only transition to s3.
It's a nice workaround for this specific case (which is merely a minimal reproduction), but if the main idea behind it is to avoid multiple deferrals of the same event, then I'm afraid it would be quite complicated to apply to my real FSM, because it has a rather long "pipeline" of states, some of which are already nested. It looks like this (indentation denotes substates):
struct Disconnected;
struct Connecting;
struct Connected;
--struct Unauthenticated;
--struct Authenticating;
--struct Authenticated;
----struct Stopped;
----struct Starting;
----struct Started;
...and so on...
The main events (queued in this order):
struct EvConnect;
struct EvAuthenticate;
struct EvStart;
struct EvGetData;
EvConnect moves Disconnected-->Connecting, EvAuthenticate moves Unauthenticated-->Authenticating, and so on. These transitions trigger some actions, and later "acknowledging" events arrive: EvConnect::Ok advances Connecting-->Connected, EvConnect::Fail rolls back Connecting-->Disconnected, and so on.
So if everything goes well, EvGetData should be deferred until the Started state, where it's processed. OTOH, if some failure occurs, EvGetData would be processed as an "error" in one of the "stable" states (Disconnected, Unauthenticated, etc.); for various reasons it cannot be silently ignored. This means that EvGetData should never appear in the event queue before the events that must precede it, because then it would be processed immediately as an "error". Unfortunately, this is exactly what happens now: an "acknowledging" event arrives between EvStart and EvGetData (which is legitimate), causes the transition Authenticating-->Authenticated, and the 2 events reach the Stopped state in reversed order.
I'm really sorry to bother you with my FSM details, but maybe you can see some obvious workaround that I'm missing :).
Thank you!
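The push-to-front question raised earlier in this message can be pictured with a small model that contains no Boost code. ReleaseTo and release are hypothetical names, and the sketch covers only the single-level case with one deferring state; it says nothing about what should happen when nested states defer the same events.

```cpp
#include <list>
#include <string>
#include <vector>

// Hypothetical model: events are plain names, queues are std::list.
using EventQueue = std::list<std::string>;

enum class ReleaseTo { Back, Front };

// Moves the deferred events into the general queue, either behind the events
// that are still waiting (Back) or ahead of them (Front), and returns the
// resulting dispatch order. Both queues are taken by value on purpose so the
// caller's data is left untouched.
inline std::vector<std::string> release(EventQueue general,
                                        EventQueue deferred,
                                        ReleaseTo where) {
    general.splice(where == ReleaseTo::Front ? general.begin() : general.end(),
                   deferred);
    return std::vector<std::string>(general.begin(), general.end());
}
```

With Ev2 still waiting in the general queue and Ev1 deferred, releasing to the back gives Ev2, Ev1, whereas releasing to the front gives Ev1, Ev2, which is the order in which the events were originally queued.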
Sorry, this will have to wait until the weekend.
Regards,
-- Andreas Huber
Hi Igor

Sorry for the delay.

> Right. To be more exact, it's because in (2) "all deferred events for that state" are *enqueued* into the general event queue, i.e. pushed in FIFO manner. Intuitively, though, I'd say that these deferred events have "higher priority" than those in the queue, because they were certainly posted earlier. So this is exactly the question: wouldn't it be more appropriate to push them to the front of the "queue" (well, not a real queue then)?

I don't know whether that would be more in the spirit of the UML standard. I quote from UML Superstructure Specification, v2.3, page 569 http://www.omg.org/spec/UML/2.3/Superstructure/PDF/:

<quote>
Deferred events
A state may specify a set of event types that may be deferred in that state. An event that does not trigger any transitions in the current state, will not be dispatched if its type matches one of the types in the deferred event set of that state. Instead, it remains in the event pool while another non-deferred event is dispatched instead. This situation persists until a state is reached where either the event is no longer deferred or where the event triggers a transition.
</quote>

So the standard says that deferred events should stay in the queue until they are no longer deferred. I think it is clear that my implementation fails to satisfy the standard under the circumstances you pointed out. While your suggestion (to move the deferred events to the front of the general queue) would probably work in your case, it seems it will not satisfy the standard when both outer and inner states defer events.

The problem is that I currently don't see how the standard can be implemented in an *efficient* manner with the current interface. I'll put some more thought into this. In the meantime, I'd suggest using the workaround I outlined in my previous post.

I'll post again in this thread when I have a workable solution. Of course, if you have a suggestion for how the implementation can be fixed, I'd be very happy to hear it.

Regards,
-- Andreas Huber
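One way to read the quoted UML wording is as an event pool that is scanned for the first event the current state does not defer, while deferred events stay where they are. The sketch below models only that reading with standard containers. The names take_next, dispatch_order_demo and EvGo are hypothetical, and this is not how Boost.Statechart is implemented.

```cpp
#include <deque>
#include <set>
#include <string>
#include <vector>

// Removes and returns (via `out`) the first event in the pool whose type is
// not in the current state's deferred set. Deferred events are skipped and
// keep their position. Returns false if nothing can be dispatched.
inline bool take_next(std::deque<std::string>& pool,
                      const std::set<std::string>& deferred_types,
                      std::string& out) {
    for (auto it = pool.begin(); it != pool.end(); ++it) {
        if (deferred_types.count(*it) == 0) {
            out = *it;
            pool.erase(it);
            return true;
        }
    }
    return false;
}

// Replays a three-event scenario and records the dispatch order.
inline std::vector<std::string> dispatch_order_demo() {
    std::deque<std::string> pool{"Ev1", "EvGo", "Ev2"};
    std::set<std::string> deferred{"Ev1", "Ev2"};  // the initial state defers both
    std::vector<std::string> order;
    std::string ev;
    while (take_next(pool, deferred, ev)) {
        order.push_back(ev);
        if (ev == "EvGo") {
            deferred.clear();  // transition to a state that defers nothing
        }
    }
    return order;
}
```

dispatch_order_demo() returns EvGo, Ev1, Ev2: the deferred events keep their relative order because they never leave the pool. The linear scan on every dispatch also hints at why an efficient implementation is not obvious.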
Hi,

> I don't know whether that would be more in the spirit of the UML standard. I quote from UML Superstructure Specification, v2.3, page 569 http://www.omg.org/spec/UML/2.3/Superstructure/PDF/:
> <quote> Deferred events
> A state may specify a set of event types that may be deferred in that state. An event that does not trigger any transitions in the current state, will not be dispatched if its type matches one of the types in the deferred event set of that state. Instead, it remains in the event pool while another non-deferred event is dispatched instead. This situation persists until a state is reached where either the event is no longer deferred or where the event triggers a transition. </quote>
> So the standard says that deferred events should stay in the queue until they are no longer deferred. I think it is clear that my implementation fails to satisfy the standard under the circumstances you pointed out. While your suggestion (to move the deferred events to the front of the general queue) would probably work in your case, it seems it will not satisfy the standard when both outer and inner states defer events.
> The problem is that I currently don't see how the standard can be implemented in an *efficient* manner with the current interface. I'll put some more thought into this. In the meantime, I'd suggest using the workaround I outlined in my previous post.
> I'll post again in this thread when I have a workable solution. Of course, if you have a suggestion for how the implementation can be fixed, I'd be very happy to hear it.

Ok, I see... Thanks a lot for your assistance!
Hi Igor

The issue is reproduced with this failing test ...
http://svn.boost.org/svn/boost/trunk/libs/statechart/test/DeferralBug.cpp
... which is a simplified version of your original. Can you please verify whether it captures your expectation correctly?

I think I'll have a fix for this one by Sunday. If you have additional use cases/expectations for event processing order during deferral, please let me know.

Thanks,
-- Andreas Huber
Hi Andreas,

> The issue is reproduced with this failing test ...
> http://svn.boost.org/svn/boost/trunk/libs/statechart/test/DeferralBug.cpp
> ... which is a simplified version of your original. Can you please verify whether it captures your expectation correctly?

Yes, that's it.

> I think I'll have a fix for this one by Sunday. If you have additional use cases/expectations for event processing order during deferral, please let me know.

As far as I can see, all my use cases where the order of deferred events is "broken" can be reduced to the above one. Thank you very much for your assistance!
I've just checked in a fix into the trunk ...
https://svn.boost.org/trac/boost/changeset/66410
... which passes all tests on msvc-9.0, including the one that exposes this bug. Could you please see whether that fixes your problem? The only difference between the trunk version and 1.44 is this fix (ignoring a few doc and MSVC project updates that should not affect you).

Thanks for your report!
Regards,
-- Andreas Huber
> I've just checked in a fix into the trunk ...
> https://svn.boost.org/trac/boost/changeset/66410
> ... which passes all tests on msvc-9.0, including the one that exposes this bug. Could you please see whether that fixes your problem? The only difference between the trunk version and 1.44 is this fix (ignoring a few doc and MSVC project updates that should not affect you).

It seems that the fix affected some other parts of my FSMs, so they stopped working completely :). I'm now trying to figure out what exactly happens there, and to make a small repro.
Anyway, thanks for your effort!
Igor.
> I'm now trying to figure out what exactly happens there, and to make a small repro.
Here it is:
#include
reactions; };
struct s3;
struct s2 : sc::simple_state< s2, fsm >
{
  typedef mpl::list<
    sc::transition< ev2to3, s3 >,
    sc::deferral< ev3to4_1 >,
    sc::deferral< ev3to4_2 >
  > reactions;
};

struct s4_1;
struct s4_2;
struct s3 : sc::simple_state< s3, fsm >
{
  typedef mpl::list<
    sc::transition< ev3to4_1, s4_1 >,
    sc::transition< ev3to4_2, s4_2 >
  > reactions;
};

struct s4_1 : sc::simple_state< s4_1, fsm > {};
struct s4_2 : sc::simple_state< s4_2, fsm > {};

int test_main( int, char* [] )
{
  fsm machine;
  machine.initiate();
  machine.process_event( ev3to4_1() );
  machine.process_event( ev3to4_2() );
  machine.process_event( ev1to2() );
  machine.process_event( ev2to3() );
  BOOST_REQUIRE( machine.state_cast< const s4_1 * >() != 0 );
  return 0;
}
> I'm now trying to figure out what exactly happens there, and to make a small repro.

Sorry for that. I've expanded the test and it now works for both cases. Please try again:
https://svn.boost.org/trac/boost/changeset/66475
Please let me know whether the new version works for you.

Thanks & Regards,
-- Andreas Huber
> Sorry for that. I've expanded the test and it now works for both cases. Please try again:
> https://svn.boost.org/trac/boost/changeset/66475
> Please let me know whether the new version works for you.

Great, it works now. IIUC, this fix will be part of 1.46 (not 1.45), right? Thank you very much!
> IIUC, this fix will be part of 1.46 (not 1.45), right?

Yes, unfortunately. 1.45 is almost out the door.

Thanks & Regards,
-- Andreas Huber
Hi,

After some more thorough testing, it seems that there's another "regression". Unfortunately, I haven't managed to make a simple reproducing test, so for now I'll just describe it verbally:
I've got a state which calls post_event() in its destructor; this event is to be processed by an orthogonal state. What I observe is that the event is neither processed by any state, nor does it "pop up" in unconsumed_event(). But IIUC, there are no other options, are there?
Please note that this specific event is *not* deferred by any state: it's always posted from one single state and processed by another orthogonal state. But there are a lot of other deferred events, so my guess is that it's getting stuck somewhere due to the recent changes in the deferral mechanism. Is that possible?
In the meantime, I've worked around this issue by going up to queue_event:
machine.my_scheduler().queue_event(machine.my_handle(), ...);

Thanks again!
Hi Igor

> But there are a lot of other deferred events, so my guess is that it's getting stuck somewhere due to the recent changes in the deferral mechanism. Is that possible?

My previous attempts at implementing this simple functionality would make a nice textbook example of how premature optimization is the root of all evil. An apprentice could have gotten it right the first time with the straightforward but seemingly slightly inefficient implementation. I really should have known better.

Anyway, unless I've missed something again, everything should behave as advertised now (all tests pass). Please test and let me know:
https://svn.boost.org/trac/boost/changeset/66496

Thanks!
-- Andreas Huber
> Anyway, unless I've missed something again, everything should behave as advertised now (all tests pass). Please test and let me know:

Sorry for the delay. I tested my code with the latest StateChart from the trunk, and it seems to work correctly now. Thank you very much!
> Sorry for the delay. I tested my code with the latest StateChart from the trunk, and it seems to work correctly now.
> Thank you very much!

Thank you for the report!

Regards,
-- Andreas Huber
participants (2): Andreas Huber, Igor R