Re: [boost] Looking for thoughts on a new smart pointer:shared_ptr_nonnull

9 Oct 2013

      On 10/10/2013 7:58 AM, Quoth Nevin Liber:
...
When I say one should assert on a detected precondition failure, it's
really shorthand for putting the system into a known state.  A mission
critical system, for instance, might do something vastly more complicated
than aborting.
I just don't see how throwing, which normally involves running code
dependent on state, accomplishes this.
Throwing unwinds out of the current call stack to some higher level 
point of code that has declared that it is prepared to handle errors in 
some meaningful way.

Imagine a server process which has a task loop to process a list of 
pending operations (each of which are completely independent, or can 
gracefully fail if they depend on some previous operation that failed -- 
eg. a webserver or database).  Inside the loop it has a try/catch around 
the call to actually process the operation.

Somewhere in the depths of the operation processing, it hits a code path 
with a bug and a null pointer is encountered where the code did not 
adequately check for it.  One of these things can happen:

1. None of the code paths check for it and the null pointer is actually 
dereferenced.  Depending on the platform, this may immediately terminate 
the server (bad), perform dodgy things to memory (extremely bad), or 
throw an exception (which will terminate the server if uncaught, or 
offer some possible means for recovery).  Tracing a callstack from this 
exception would reveal the place where the pointer is used, which is 
*not* the site of the actual code bug.  (Although it does allow you to 
guess that in most cases.)

2. The assignment to the "this pointer is never null" pointer did 
actually check that it wasn't null and throws an exception.  This 
eliminates the possibility of dereferencing this pointer later (since it 
never really existed), and tracing the callstack reveals the place where 
the pointer is constructed, which *is* the site of the bug.

If we assume that an exception was thrown, then the catch handler in the 
operation loop can mark that operation as failed (and intervening catch 
handlers or destructors could have rolled back anything they needed to). 
  Subsequently it would be able to continue running operations. 
Obviously any dependent on the failed operation would also fail, but 
that can be dealt with too.

As for what happens next, it depends on where the bug actually was and 
how critical that null pointer is.  Maybe it was a temporary failure in 
some local state, so would not affect any other operations at all -- in 
this case taking down the entire server is obviously overkill.  Or maybe 
it was a more serious failure in shared state and all (or most) future 
operations would also fail.  In this case it should become apparent 
quickly to the user and they should be able to do a controlled restart 
of the server and contact someone to fix the underlying problem.  Either 
way there should be a log of why the operation failed that will help to 
discover the source of the bug.

And note that stack unwinding all the way up is still well defined at 
this point -- some people have argued that "oh, you're already in an 
undefined state because you've violated the precondition!" but that's 
circular logic.  If you check for it and throw, it's no longer a 
precondition, and so nothing has been violated and no UB has occurred 
yet.  Throwing prevents UB from occurring later as a result of 
"successfully" constructing an object with an invalid invariant.

I'm not saying that preconditions are bad.  Everything has an implicit 
precondition of "no other code has trampled over my memory" and "the CPU 
and RAM actually work properly" after all.  But a precondition 
represents a case where invalid input is not tested, which allows bugs 
to propagate.  Sometimes this is not avoidable (as I said in an earlier 
post, it's essentially impossible to verify that a pointer isn't 
dangling).  But a null check is quick and easy.  Leaving specific inputs 
as undefined behaviour, when it is cheap to specify that behaviour, is 
just being lazy.

Obviously throwing and catching exceptions doesn't magically fix bugs in 
the code.  (And if done badly, it can make things significantly worse as 
bugs that have already caused damage to shared state are ignored.)  But 
as long as you do it properly, it does allow you to limit the damage 
caused by bugs in the code if the "this is possible but not valid, and 
would leave my object in an unhinged state if I accepted it" cases were 
allowed to continue unchecked.

I'm all for early termination when the developer is around to sort it 
out immediately.  But that's easy too -- just configure a debugger to 
intercept exceptions immediately when thrown.  But instant termination 
is not a good user experience.

Re: [boost] Looking for thoughts on a new smart pointer:shared_ptr_nonnull

Gavin Lambert