Re: [boost] [variant2] Andrzej's review -- design
Going back to variant, in one case, we are defining an empty state, which forces the user to provide explicit handling for that state, *everywhere*. You keep repeating that the user would have to explicitly handle that everywhere. I thought I just showed you that this is not the case. Except for visit, you don't have to explicitly handle the valueless state at all and even in the case of visit, you might be fine with std::visit just throwing an exception, because that is indeed an exceptional circumstance. I don't see this *everywhere* at all. Best Mike
On Wed, Apr 3, 2019 at 11:56 PM Mike via Boost
Going back to variant, in one case, we are defining an empty state, which forces the user to provide explicit handling for that state, *everywhere*. You keep repeating that the user would have to explicitly handle that everywhere. I thought I just showed you that this is not the case. Except for visit, you don't have to explicitly handle the valueless state at all
Maybe you're thinking it's not a lot of work, but introducing a new state in any type has overhead. Every time you touch a variant object, you must consider the possibility of it being valueless. Your program must define behavior for that case. You have to write unit tests for that behavior, because people will be passing valueless variants when they shouldn't, because they can. But what's the upside? Why is the empty state preferable to another, valid state? Either way we must define behavior, but in the latter case the behavior is already defined, there is nothing to do.
On 5/04/2019 08:12, Emil Dotchevski wrote:
But what's the upside? Why is the empty state preferable to another, valid state? Either way we must define behavior, but in the latter case the behavior is already defined, there is nothing to do.
The empty state is preferable to another unspecified state exactly *because* it is obviously different, and thus is guaranteed to be detectable (and will result in exceptions when accessed inappropriately, or executing an intended-no-value-here evaluation path if one is provided). The problem with transitioning to an otherwise valid but unexpected state is that it may end up evaluating unintended code (oh, this variant contains type B, let's do some standard type B handling -- oh but wait, this wasn't the B that we had set ten functions ago, this is some different B, because an assignment to A failed). This is a source of bugs. (Granted, continuing evaluation on a faulted variable is already a bug in itself, but you are making detection of that bug harder by not putting the value into an obviously-faulted state.)
On Thu, Apr 4, 2019 at 4:30 PM Gavin Lambert via Boost < boost@lists.boost.org> wrote:
The problem with transitioning to an otherwise valid but unexpected state is that it may end up evaluating unintended code
This is why I made the analogy with std::vector. Let's say you always put exactly 10 elements in all your vectors, but then you assign one vector to another, and that fails, and now (unexpectedly?) you have an (otherwise valid) vector with fewer than 10 elements. Such is life. This is how the basic guarantee works, you get an unspecified but valid state.
(Granted, continuing evaluation on a faulted variable is already a bug in itself, but you are making detection of that bug harder by not putting the value into an obviously-faulted state.)
I do not think that it is a bug to access an object that is in a valid state. In fact, the reason why the state is defined as valid is so that you can safely work with that object, even after a failure was reported.
On 4/5/19 7:46 PM, Emil Dotchevski via Boost wrote:
On Thu, Apr 4, 2019 at 4:30 PM Gavin Lambert via Boost < boost@lists.boost.org> wrote:
The problem with transitioning to an otherwise valid but unexpected state is that it may end up evaluating unintended code
This is why I made the analogy with std::vector. Let's say you always put exactly 10 elements in all your vectors, but then you assign one vector to another, and that fails, and now (unexpectedly?) you have an (otherwise valid) vector with fewer than 10 elements.
Such is life. This is how the basic guarantee works, you get an unspecified but valid state.
(Granted, continuing evaluation on a faulted variable is already a bug in itself, but you are making detection of that bug harder by not putting the value into an obviously-faulted state.)
I do not think that it is a bug to access an object that is in a valid state. In fact, the reason why the state is defined as valid is so that you can safely work with that object, even after a failure was reported.
I'm guessing "unexpected" could mean the std::vector could have some arbitrary number of elements all filled with random values. I don't see how you can do anything useful with that.
Sat, Apr 6, 2019 at 04:55 Larry Evans via Boost

On 4/5/19 7:46 PM, Emil Dotchevski via Boost wrote:
On Thu, Apr 4, 2019 at 4:30 PM Gavin Lambert via Boost < boost@lists.boost.org> wrote:
The problem with transitioning to an otherwise valid but unexpected state is that it may end up evaluating unintended code
This is why I made the analogy with std::vector. Let's say you always put exactly 10 elements in all your vectors, but then you assign one vector to another, and that fails, and now (unexpectedly?) you have an (otherwise valid) vector with fewer than 10 elements.
Such is life. This is how the basic guarantee works, you get an unspecified but valid state.
(Granted, continuing evaluation on a faulted variable is already a bug in itself, but you are making detection of that bug harder by not putting the value into an obviously-faulted state.)
I do not think that it is a bug to access an object that is in a valid state. In fact, the reason why the state is defined as valid is so that you can safely work with that object, even after a failure was reported.
I'm guessing "unexpected" could mean the std::vector could have some arbitrary number of elements all filled with random values. I don't see how you can do anything useful with that.
Exactly. Of course, what you can do is to reset such an object: call `v.clear()` or assign a new vector whose state you know. And then you know exactly what state it is in. But that's it. The fact that an object in an unspecified state doesn't crash (or do random things) when you read its value is not comforting at all: trying to read its value is still something wrong. Regards, &rzej;
On Fri, Apr 5, 2019 at 7:55 PM Larry Evans via Boost
On 4/5/19 7:46 PM, Emil Dotchevski via Boost wrote:
On Thu, Apr 4, 2019 at 4:30 PM Gavin Lambert via Boost < boost@lists.boost.org> wrote:
The problem with transitioning to an otherwise valid but unexpected state is that it may end up evaluating unintended code
This is why I made the analogy with std::vector. Let's say you always put exactly 10 elements in all your vectors, but then you assign one vector to another, and that fails, and now (unexpectedly?) you have an (otherwise valid) vector with fewer than 10 elements.
Such is life. This is how the basic guarantee works, you get an unspecified but valid state.
(Granted, continuing evaluation on a faulted variable is already a bug in itself, but you are making detection of that bug harder by not putting the value into an obviously-faulted state.)
I do not think that it is a bug to access an object that is in a valid state. In fact, the reason why the state is defined as valid is so that you can safely work with that object, even after a failure was reported.
I'm guessing "unexpected" could mean the std::vector could have some arbitrary number of elements all filled with random values. I don't see how you can do anything useful with that.
The values aren't random, they are guaranteed to be valid. From the point of view of the type system and what a vector is, it is illogical to complain that it has fewer than 10 elements. It is fine for vectors to have fewer than 10 elements. The way I formulated the example makes it feel like there is something wrong with that state, because we always put 10 elements in all vectors and now we find one with fewer than 10. But this has nothing to do with vector. To do this correctly, we need to write a wrapper type for vector that has a size-10 invariant, then define the appropriate safety guarantees for that type.
On 4/6/19 1:53 PM, Emil Dotchevski via Boost wrote:
On Fri, Apr 5, 2019 at 7:55 PM Larry Evans via Boost
wrote: [snip]
I'm guessing "unexpected" could mean the std::vector could have some arbitrary number of elements all filled with random values. I don't see how you can do anything useful with that.
The values aren't random, they are guaranteed to be valid.
By random I didn't mean invalid. For example, if the vector were std::vector<unsigned>, then the unsigned values could be any unsigned in range 0...max_unsigned.
On 06.04.19 02:46, Emil Dotchevski via Boost wrote:
I do not think that it is a bug to access an object that is in a valid state. In fact, the reason why the state is defined as valid is so that you can safely work with that object, even after a failure was reported.
One thing that bothers me about this view is that it is inconsistent with how the language treats uninitialized variables. Every bit pattern of a variable of type 'char' is a valid value. However, reading from an uninitialized variable of type 'char' is undefined behavior. On a physical level, a variable of type char will always have a valid value, but on the conceptual level, it can also have a special "uninitialized" state from which it is legal to assign a new value but not to read the current value. I find that using this conceptual uninitialized state actually makes it /easier/ to reason about the correctness of my code. Any code that depends on the value of an uninitialized variable for its behavior is automatically incorrect. Any code that reads an uninitialized variable, even if the value read does not affect the observable behavior of the code, is automatically incorrect. Just declaring an uninitialized variable is a red warning light that I need to be careful to assign a value to that variable before reading from it. So long as I follow these rules, I never have to worry about my code unexpectedly breaking due to an uninitialized variable having an unexpected value. To me, the "valid but unspecified" state of an object after an exception is thrown from a function with the basic exception guarantee is conceptually very similar to the state of an uninitialized variable. If I catch such an exception while the object is still in scope, that's a red warning light that I need to either assign a new state to the object or allow the object to go out of scope. Anything else risks the same sort of errors as are caused by reading from an uninitialized variable. -- Rainer Deyke (rainerd@eldwood.com)
AMDG On 4/6/19 3:37 AM, Rainer Deyke via Boost wrote:
On 06.04.19 02:46, Emil Dotchevski via Boost wrote:
I do not think that it is a bug to access an object that is in a valid state. In fact, the reason why the state is defined as valid is so that you can safely work with that object, even after a failure was reported.
One thing that bothers me about this view is that it is inconsistent with how the language treats uninitialized variables. Every bit pattern of a variable of type 'char' is a valid value. However, reading from an uninitialized variable of type 'char' is undefined behavior. On a physical level, a variable of type char will always have a valid value, but on the conceptual level, it can also have a special "uninitialized" state from which it is legal to assign a new value but not to read the current value.
I find that using this conceptual uninitialized state actually makes it /easier/ to reason about the correctness of my code. Any code that depends on the value of an uninitialized variable for its behavior is automatically incorrect. Any code that reads an uninitialized variable, even if the value read does not affect the observable behavior of the code, is automatically incorrect. Just declaring an uninitialized variable is a red warning light that I need to be careful to assign a value to that variable before reading from it. So long as I follow these rules, I never have to worry about my code unexpectedly breaking due to an uninitialized variable having an unexpected value.
To me, the "valid but unspecified" state of an object after an exception is thrown from a function with the basic exception guarantee is conceptually very similar to the state of an uninitialized variable. If I catch such an exception while the object is still in scope, that's a red warning light that I need to either assign a new state to the object or allow the object to go out of scope. Anything else risks the same sort of errors as are caused by reading from an uninitialized variable.
That's completely different. The scope in which a variable is uninitialized should always be known statically. Also, uninitialized variables are never really a good thing. It's just that it's sometimes more convenient than initializing them properly at the point of definition, and we can get away with it for trivial types. In Christ, Steven Watanabe
Sat, Apr 6, 2019 at 18:10 Steven Watanabe via Boost
AMDG
On 4/6/19 3:37 AM, Rainer Deyke via Boost wrote:
On 06.04.19 02:46, Emil Dotchevski via Boost wrote:
I do not think that it is a bug to access an object that is in a valid state. In fact, the reason why the state is defined as valid is so that you can safely work with that object, even after a failure was reported.
One thing that bothers me about this view is that it is inconsistent with how the language treats uninitialized variables. Every bit pattern of a variable of type 'char' is a valid value. However, reading from an uninitialized variable of type 'char' is undefined behavior. On a physical level, a variable of type char will always have a valid value, but on the conceptual level, it can also have a special "uninitialized" state from which it is legal to assign a new value but not to read the current value.
I find that using this conceptual uninitialized state actually makes it /easier/ to reason about the correctness of my code. Any code that depends on the value of an uninitialized variable for its behavior is automatically incorrect. Any code that reads an uninitialized variable, even if the value read does not affect the observable behavior of the code, is automatically incorrect. Just declaring an uninitialized variable is a red warning light that I need to be careful to assign a value to that variable before reading from it. So long as I follow these rules, I never have to worry about my code unexpectedly breaking due to an uninitialized variable having an unexpected value.
To me, the "valid but unspecified" state of an object after an exception is thrown from a function with the basic exception guarantee is conceptually very similar to the state of an uninitialized variable. If I catch such an exception while the object is still in scope, that's a red warning light that I need to either assign a new state to the object or allow the object to go out of scope. Anything else risks the same sort of errors as are caused by reading from an uninitialized variable.
That's completely different. The scope in which a variable is uninitialized should always be known statically. Also, uninitialized variables are never really a good thing. It's just that it's sometimes more convenient than initializing them properly at the point of definition, and we can get away with it for trivial types.
One clarification about this last sentence, though. If one can write one's function so that the object is initialized to its desired value, this is definitely superior to starting with an object with an unspecified value. But if we are in a situation where we need to know the object's address before we can assign it the desired value (such as when reading data from a stream), starting with an unspecified value is superior to the programmer setting some value like 0 which is intended to be overwritten in the same scope. In the case of an unspecified value, static analyzers can help us detect bugs where we have forgotten to overwrite the value in some branch. Regards, &rzej;
On Sat, Apr 6, 2019 at 2:37 AM Rainer Deyke via Boost
On 06.04.19 02:46, Emil Dotchevski via Boost wrote:
I do not think that it is a bug to access an object that is in a valid state. In fact, the reason why the state is defined as valid is so that you can safely work with that object, even after a failure was reported.
One thing that bothers me about this view is that it is inconsistent with how the language treats uninitialized variables. Every bit pattern of a variable of type 'char' is a valid value. However, reading from an uninitialized variable of type 'char' is undefined behavior. On a physical level, a variable of type char will always have a valid value, but on the conceptual level, it can also have a special "uninitialized" state from which it is legal to assign a new value but not to read the current value.
I find that using this conceptual uninitialized state actually makes it /easier/ to reason about the correctness of my code. Any code that depends on the value of an uninitialized variable for its behavior is automatically incorrect. Any code that reads an uninitialized variable, even if the value read does not affect the observable behavior of the code, is automatically incorrect. Just declaring an uninitialized variable is a red warning light that I need to be careful to assign a value to that variable before reading from it. So long as I follow these rules, I never have to worry about my code unexpectedly breaking due to an uninitialized variable having an unexpected value.
To me, the "valid but unspecified" state of an object after an exception is thrown from a function with the basic exception guarantee is conceptually very similar to the state of an uninitialized variable. If I catch such an exception while the object is still in scope, that's a red warning light that I need to either assign a new state to the object or allow the object to go out of scope. Anything else risks the same sort of errors as are caused by reading from an uninitialized variable.
To clarify, this discussion is not about initialization. If a variant object fails to initialize, accessing it is UB, just like it is for any other uninitialized object. I get what you're saying, that if an assignment fails, logically you wish to treat the resulting state something like an uninitialized state. The problem is that your program can no longer assume that all objects it works with are valid -- that is, RAII is out the window -- except if the special state is (by definition) a valid state. But this contradicts your wish, because a valid state is nothing like an uninitialized state. It has to be on your mind all the time, because now objects in this state are valid (by definition) and you must define behavior for functions that are handed such objects.
On 06.04.19 21:24, Emil Dotchevski via Boost wrote:
I get what you're saying, that if an assignment fails, logically you wish to treat the resulting state something like an uninitialized state. The problem is that your program can no longer assume that all objects it works with are valid -- that is, RAII is out the window -- except if the special state is (by definition) a valid state. But this contradicts your wish, because a valid state is nothing like an uninitialized state. It has to be on your mind all the time, because now objects in this state are valid (by definition) and you must define behavior for functions that are handed such objects.
You keep using the term "valid" as if it's a clear-cut binary distinction. However, I can think of at least four different degrees of validity:
1: Garbage. A variable (of a class type) was not properly constructed and contains complete garbage. It is undefined behavior to perform any operation on the object, including assignment and destruction.
2: Uninitialized. A variable (of a built-in type) is uninitialized. It is undefined behavior to read the value of this variable, but the variable is "valid" in the sense that you can assign a new value to it and that you can destruct it.
3: Indeterminate. A variable (of a class type) has an indeterminate, semantically meaningless state (after throwing an exception from a member function with the basic exception guarantee, or after being pulled from an object pool). It is technically allowed but not semantically meaningful to read the value of this variable, but the variable is "valid" in the sense that you can assign a new value to it and that you can destruct it.
4: Correct. A variable (of any type) is valid and contains a semantically meaningful and correct value.
You seem to categorize degrees 3 and 4 as "valid" and degrees 1 and 2 as "invalid". I'm saying that the distinction between degrees 1 and 2 is huge, as is the distinction between degrees 3 and 4, but the distinction between 2 and 3 is relatively small. At the physical level, there is no distinction between degrees 2 and 3, since reading from uninitialized memory doesn't cause any actual problems on actual hardware. At the high conceptual level, there is also no distinction between degrees 2 and 3, because the same set of operations is semantically meaningful for both. The distinction still exists, technically, but I consider any code that takes advantage of this distinction suspect, and I'd love for static code analysis tools to give the same diagnostics for degree 3 as for degree 2. If they cannot, then that's a problem with the tools, not with the idea.
-- Rainer Deyke (rainerd@eldwood.com)
AMDG On 4/7/19 2:38 AM, Rainer Deyke via Boost wrote:
On 06.04.19 21:24, Emil Dotchevski via Boost wrote:
I get what you're saying, that if an assignment fails, logically you wish to treat the resulting state something like an uninitialized state. The problem is that your program can no longer assume that all objects it works with are valid -- that is, RAII is out the window -- except if the special state is (by definition) a valid state. But this contradicts your wish, because a valid state is nothing like an uninitialized state. It has to be on your mind all the time, because now objects in this state are valid (by definition) and you must define behavior for functions that are handed such objects.
You keep using the term "valid" as if it's a clear-cut binary distinction. However, I can think of at least four different degrees of validity:
1: Garbage. A variable (of a class type) was not properly constructed and contains complete garbage. It is undefined behavior to perform any operation on the object, including assignment and destruction.
2: Uninitialized. A variable (of a built-in type) is uninitialized. It is undefined behavior to read the value of this variable, but the variable is "valid" in the sense that you can assign a new value to it and that you can destruct it.
3: Indeterminate. A variable (of a class type) has an indeterminate, semantically meaningless state (after throwing an exception from a member function with the basic exception guarantee, or after being pulled from an object pool). It is technically allowed but not semantically meaningful to read the value of this variable, but the variable is "valid" in the sense that you can assign a new value to it and that you can destruct it.
4: Correct. A variable (of any type) is valid and contains a semantically meaningful and correct value.
You seem to categorize degrees 3 and 4 as "valid" and degrees 1 and 2 as "invalid". I'm saying that the distinction between degrees 1 and 2 is huge,
I disagree. The rules for (2) regarding construction and destruction are a bit relaxed, but most uses are still undefined behavior.
as is the distinction between degrees 3 and 4, but the distinction between 2 and 3 is relatively small. At the physical level, there is no distinction between degrees 2 and 3, since reading from uninitialized memory doesn't cause any actual problems on actual hardware.
The hardware behavior is irrelevant. It only applies if you assume a simple translation from the source to machine code. This is not a valid assumption due to compiler optimizations.
At the high conceptual level, there is also no distinction between degrees 2 and 3, because the same set of operations is semantically meaningful for both. The distinction still exists, technically, but I consider any code that takes advantage of this distinction suspect, and I'd love for static code analysis tools to give the same diagnostics for degree 3 as for degree 2. If they cannot, then that's a problem with the tools, not with the idea.
The distinction is in fact quite simple: Does the expression: x.foo() have undefined behavior according to the language. C++ itself makes no distinction between (3) and (4). In Christ, Steven Watanabe
On Sun, Apr 7, 2019 at 1:38 AM Rainer Deyke via Boost
On 06.04.19 21:24, Emil Dotchevski via Boost wrote:
I get what you're saying, that if an assignment fails, logically you wish to treat the resulting state something like an uninitialized state. The problem is that your program can no longer assume that all objects it works with are valid -- that is, RAII is out the window -- except if the special state is (by definition) a valid state. But this contradicts your wish, because a valid state is nothing like an uninitialized state. It has to be on your mind all the time, because now objects in this state are valid (by definition) and you must define behavior for functions that are handed such objects.
You keep using the term "valid" as if it's a clear-cut binary distinction.
Do you see that this is a matter of definition? I use the common definition: "valid" is equivalent to "the type invariants are in place". It seems that by your definition, for an object to be valid, it is not sufficient that the type invariants are in place. Under the common definition, if two states A and B fit the invariant constraints of the type, it is illogical to rank the validity of A and B. Both are perfectly good states, by definition. Stronger: to argue that either A or B is an invalid state is equivalent to arguing that the basic guarantee may leave the program in an invalid state. This is nonsense, the whole point of the basic guarantee is to guarantee that the state is valid.
However, I can think of at least four different degrees of validity:
1: Garbage. A variable (of a class type) was not properly constructed and contains complete garbage. It is undefined behavior to perform any operation on the object, including assignment and destruction.
2: Uninitialized. A variable (of a built-in type) is uninitialized. It is undefined behavior to read the value of this variable, but the variable is "valid" in the sense that you can assign a new value to it and that you can destruct it.
3: Indeterminate. A variable (of a class type) has an indeterminate, semantically meaningless state (after throwing an exception from a member function with the basic exception guarantee, or after being pulled from an object pool). It is technically allowed but not semantically meaningful to read the value of this variable, but the variable is "valid" in the sense that you can assign a new value to it and that you can destruct it.
4: Correct. A variable (of any type) is valid and contains a semantically meaningful and correct value.
You seem to categorize degrees 3 and 4 as "valid" and degrees 1 and 2 as "invalid".
I'm using existing, well established definitions. C++ is not defined in terms of the above degrees of validity, and it does not make a distinction between 3 and 4. If 3 and 4 are the same thing and a new "empty" state is introduced, by definition it must be a valid state, which (I argue) weakens the type invariants and therefore complicates all operations, for no good reason (because a valid state is a valid state).
On 07.04.19 21:40, Emil Dotchevski via Boost wrote:
On Sun, Apr 7, 2019 at 1:38 AM Rainer Deyke via Boost
wrote: You keep using the term "valid" as if it's a clear-cut binary distinction.
Do you see that this is a matter of definition?
No. You can use any terminology you want. I'm pointing out that the distinction between "valid" and "invalid", by your definition, may not be the most important distinction to consider.
Stronger: to argue that either A or B is an invalid state is equivalent to arguing that the basic guarantee may leave the program in an invalid state.
I would rather say that the basic guarantee is the minimum guarantee that allows the program to maintain a valid state. It is up to the caller of a function with the basic guarantee to ensure that any broken higher-level invariants are restored.
However, I can think of at least four different degrees of validity:
1: Garbage. A variable (of a class type) was not properly constructed and contains complete garbage. It is undefined behavior to perform any operation on the object, including assignment and destruction.
2: Uninitialized. A variable (of a built-in type) is uninitialized. It is undefined behavior to read the value of this variable, but the variable is "valid" in the sense that you can assign a new value to it and that you can destruct it.
3: Indeterminate. A variable (of a class type) has an indeterminate, semantically meaningless state (after throwing an exception from a member function with the basic exception guarantee, or after being pulled from an object pool). It is technically allowed but not semantically meaningful to read the value of this variable, but the variable is "valid" in the sense that you can assign a new value to it and that you can destruct it.
4: Correct. A variable (of any type) is valid and contains a semantically meaningful and correct value.
You seem to categorize degrees 3 and 4 as "valid" and degrees 1 and 2 as "invalid".
I'm using existing, well established definitions. C++ is not defined in terms of the above degrees of validity, and it does not make a distinction between 3 and 4.
But when writing a program that actually has a function beyond avoiding undefined behavior, the distinction between a variable that contains a correct value and a (valid, by your definition) variable that contains garbage is critical. Using a garbage value for further computation is defined but clearly incorrect behavior. -- Rainer Deyke (rainerd@eldwood.com)
Thu, Apr 4, 2019 at 21:13 Emil Dotchevski via Boost
On Wed, Apr 3, 2019 at 11:56 PM Mike via Boost
wrote: Going back to variant, in one case, we are defining an empty state, which forces the user to provide explicit handling for that state, *everywhere*. You keep repeating that the user would have to explicitly handle that everywhere. I thought I just showed you that this is not the case. Except for visit, you don't have to explicitly handle the valueless state at all
Maybe you're thinking it's not a lot of work, but introducing a new state in any type has overhead. Every time you touch a variant object, you must consider the possibility of it being valueless. Your program must define behavior for that case. You have to write unit tests for that behavior, because people will be passing valueless variants when they shouldn't, because they can.
This is your view of things. Let me offer a different one. I agree with what your words imply: if possible, avoid adding a new state to the type, and also avoid any kind of degenerate or partially formed state. A type is always more robust without these states. But sometimes we cannot afford to follow this rule because of other constraints/objectives we have.

Let's consider a pointer: either a raw pointer or unique_ptr or shared_ptr. Not only does it have the degenerate nullptr state, but, what is far worse, you get this state *by default*. I think this is the harm that pointers do, as well as other types that use the default constructor to form a degenerate or a partially formed state. But even for these types I would not go as far as checking, in every function that gets or returns one, whether it is null. This is what we have preconditions and postconditions for. If you split your function into smaller sub-functions and you wanted to check for it in every single sub-function, it would kill your performance.

Now imagine an alternative design to unique_ptr; let's call it unique_ptr2. It differs in the set of constructors: it does not have the default constructor, it doesn't have the constructor taking nullptr_t, and in the constructors that take a raw pointer, if a null pointer is passed, an exception is thrown. There is no way to *initialize* this pointer to a null pointer value, except by using the move and copy constructors. The only way the null pointer can get into this type is when a unique_ptr2 is moved from. But such a case is special: either the move constructor is called on a temporary, and the language semantics guarantee that only the destructor will be invoked and no one else will observe the object, or the programmer explicitly calls std::move(). But in the latter case programmers are already warned that using such a moved-from object other than destroying or resetting it is dangerous and could corrupt the program.
This is because other member functions of the type can have narrow contracts: they may be valid to call before the move, but invalid after the move. If some function f() returns a unique_ptr2, I will not be checking whether it is in the special state. I know it is technically possible, but I trust that the author of f() has done her job and does not do nasty things. Otherwise, if she does do nasty things, the value of a unique_ptr2 is the least of my problems.

The natural course of action for me is to trust that every party honors their part of the contract, and then I can work from the global assumption that objects of this type, when passed to or returned from a function, are never null. Technically the invariant allows the null pointer, but I get the practical guarantee that I should never be concerned with it. Someone could say, "but people will be returning null unique_ptr2, because they can". But I do not consider this a reason to insert defensive checks everywhere, or to apply strange modifications to the type unique_ptr2. I work under the assumption that people write code not just because they can, but because they want to achieve some goal, and I assume that this goal is not to corrupt the program. I may get the null pointer from f() if f() has a bug, and it can even lead to UB in the program or a crash. But that bug needs to be fixed inside f(), not by the users of f().

In a similar vein, I would consider it the wrong decision if someone proposed to change unique_ptr2<T> so that its constructor preallocates some T on the heap, and this T is used when an object is moved from: instead of nullptr, we assign the preallocated T, and owing to that, even after the move, our object is never null. It is possible to do it, but the run-time cost and the bizarre logic are not worth the effect of a stronger invariant.
My point in short: in this discussion we have to distinguish partially-formed states that are easy to construct from those that are only reached in circumstances that already require special attention.
But what's the upside? Why is the empty state preferable to another, valid state? Either way we must define behavior, but in the latter case the behavior is already defined, there is nothing to do.
I do not mind never-empty guarantee in principle. My problem is that providing it involves too much overhead. The costs outweigh the benefit. If there was a way to provide the never-empty guarantee for free, I would be a strong proponent of it.

template <class T>
void f( std::vector<T> & x, std::vector<T> const & y ) { x = y; }

Above, if the assignment fails, the content of x is unspecified, but it is guaranteed to be a valid vector, containing valid objects (if not empty).
This is the type of code that I consider "requiring special attention". If the function throws, it leaves x in a valid but unspecified state. I do not care much about "valid", but my concern is about "unspecified". I would never want anyone to observe the state of this object. Therefore I would rather change the signature of function f() like this:

template <class T>
std::vector<T> f( std::vector<T> const & y ) { return std::vector<T>{y}; }

So that it is guaranteed that upon exception no-one can see the value. Or, if such a rewrite is impossible, I would make sure that the object referred to by reference `x` is destroyed before any catch-handler tries to stop the stack unwinding. IOW, I consider the code that allows observing the unspecified (but valid) state the problem: not the unspecified state itself.

Regards,
&rzej;
On Fri, 5 Apr 2019 at 09:07, Andrzej Krzemienski via Boost < boost@lists.boost.org> wrote:
I do not mind never-empty guarantee in principle. My problem is that providing it involves too much overhead. The costs outweigh the benefit. If there was a way to provide the never-empty guarantee for free, I would be a strong proponent of it.
I think you're right: catering for all kinds of errors a bad/sloppy programmer could make is just too expensive. On the other hand, there are people who write code to fly things to the moon, drive autonomous cars, etc. ... stuff where it is just too expensive for things to fail. I think there could be a satisfactory resolution of the issue(s) you raise by renaming boost::variant2 to something else, for example boost::safe_variant. This name also conveys the message that this one could be more expensive in some cases [as most programmers would naturally assume that safety comes at a cost]. It also does away with the problem that Boost will have several variants, in addition to the one in the standard and quite a number of other implementations floating around on GitHub and elsewhere. degski -- *Microsoft, please kill Paint3D*
On 5/04/2019 22:06, Andrzej Krzemienski wrote:
I do not mind never-empty guarantee in principle. My problem is that providing it involves too much overhead. The costs outweigh the benefit. If there was a way to provide the never-empty guarantee for free, I would be a strong proponent of it.
I agree with that; as I said elsewhere I would be (relatively) happy to pay that overhead for a strong-guarantee variant. I don't see sufficient value in "never-empty" by itself to justify it. But, maybe others have different lines in the sand.
This is the type of code that I consider "requiring special attention". If the function throws it leaves x in a valid but unspecified state. I do not care much about "valid", but my concern is about "unspecified". I would never want anyone to observe the state of this object. Therefore I would rather change the signature of function f() like this:
template <class T> std::vector<T> f(std::vector<T> const & y ) { return std::vector<T>{y}; }
So that it is guaranteed that upon exception no-one can see the value. Or if such rewrite is impossible, I would make sure that the object referred to by reference `x` is destroyed before any catch-handler tries to stop the stack unwinding.
Good in principle, but this just moves the problem to the caller. Either the caller initialises a new variable to call f(), in which case things are fine -- the return value is never visible. Or the caller assigns an existing variable to call f(), in which case you're back to (almost) the original code. (Granted it will be a copy-construction plus move-assignment instead of a single copy-assignment, but that may not be better.) Despite some people talking recently about assignment being evil, it's a very important part of the language. Otherwise we'd all be using functional languages.
participants (8)
- Andrzej Krzemienski
- degski
- Emil Dotchevski
- Gavin Lambert
- Larry Evans
- Mike
- Rainer Deyke
- Steven Watanabe