[smart_ptr] Interest in the missing smart pointer (that can target the stack)
Hi all, I'm new here. Here goes:

I propose a new smart pointer called, say, "registered_ptr", that behaves just like a native C++ pointer, except that its value is (automatically) set to nullptr when the target object is destroyed. This kind of smart pointer already exists and is being used in the form of, for example, QPointer and Chromium's WeakPtr (http://doc.qt.io/qt-5/qpointer.html and https://code.google.com/p/chromium/codesearch/#chromium/src/base/memory/weak...), although those implementations only apply to a specific set of object types. There's also std::experimental::observer_ptr (http://en.cppreference.com/w/cpp/experimental/observer_ptr), which seems to be trying to serve a similar purpose, but with none of the functionality. This might be as far as you need to read to decide if you think it's a good idea or not. The rest of this (kind of long) post addresses some of the details and motivation.

It's instructive to compare this "registered_ptr" with (std::)shared_ptr. Both have a (modest) performance cost, although registered_ptr's performance cost is significantly lower, as will be discussed later. Both can have a significant "code safety" benefit when used for that purpose - i.e. reduce or eliminate the possibility of invalid access of an object that has already been destroyed. And both have a functional purpose other than safety or convenience (or just being a regular pointer). That functional purpose is to deal with situations when it is not easy to predict in advance (i.e. at compile time) when an object will be referenced (and stop being referenced).

shared_ptr deals with these types of situations by ensuring that the object remains "alive" for as long as there are any (owning) references to the object. This assumes that the object has no "curfew". But one can imagine situations where an object needs to be destroyed at or by a certain time or event - if the object is exclusively holding an in-demand resource, or a security lock, for example. In these cases one might prefer that late attempted references to the object fail (safely) rather than have the object's destruction delayed. registered_ptrs might be more suitable for these situations. So from this comparison it's hard to argue for the inclusion of shared_ptr and the omission of registered_ptr in the library.

registered_ptr can be thought of as a kind of "non-intrusive" counterpart to shared_ptr. "But wait", you say, "shared_ptr already has a non-intrusive counterpart. Namely weak_ptr." Yes, but the difference between weak_ptr and registered_ptr is that registered_ptr is independent, not just an appendage of shared_ptr. And registered_ptr can point to stack allocated objects (just like native C++ pointers can), but at the cost of "thread safety". Referencing an object on another (asynchronous) thread's stack is inherently not safe, right? Some may be concerned with this lack of thread safety - I am not, I don't think we should be encouraging casual asynchronous object sharing - but the performance dividend of getting to put the target object on the stack rather than the heap more than makes up for it in my opinion.

So I've actually already had cause to implement, and use, a version of this "registered_ptr" (I actually called it "TRegisteredPointer"), and in practice it works great. But there are, arguably, a couple of issues. Besides the thread safety (non-)issue, my implementation can only target types that can act as base classes. By default most classes can act as a base class, but native types like int, bool, etc. cannot. In my case, this is not a big issue because TRegisteredPointer is actually part of a small library that includes (safer) substitutes for int, bool and size_t that can act as base classes.

The good thing is that it does seem to serve its purpose of not only being safe, but also of being faster than shared_ptr. Much faster when targeting stack objects, as one would expect, but also faster when targeting heap objects according to my simple benchmarks (msvc2013 - Windows 7 - Haswell). My implementation can be found here: https://github.com/duneroadrunner/SaferCPlusPlus. It's in the file called "mseregistered.h". Examples of it in action can be found in the file called "msetl_example.cpp", near the end. No decent documentation yet, but the examples are commented and it should be clear what's going on.

So far I've posited that the argument for registered_ptr is at least as good as the argument for shared_ptr. But let me make a further argument for why I think it's particularly important to include registered_ptr. First, let me make an argument for "safe" pointers in general. By safe pointers, I mean pointers that have zero chance of participating in a dereference to invalid memory. This will usually imply that the safe pointer will do a runtime check and throw an exception if an invalid dereference is attempted. Now, I subscribe to the C++ "trust the programmer" philosophy, so I agree that programmers should have the option of avoiding these runtime checks. But security (and by extension, language safety) is so important these days that programmers should also have the option of having these runtime checks when desired. So I'm suggesting that all the smart pointers (shared_ptr, unique_ptr, etc.) come in "safe" versions as well. There is already some precedent for safe and unsafe options. For example, std::vector<> provides both the "unsafe" [] operator and the "safe" at() function.

And ultimately it needs to be properly measured, but I would think that the performance cost of this kind of runtime check should be quite modest. There is plenty of opportunity for optimizing compilers to strip them out when not needed, and of course these sorts of runtime checks are a favorite food of branch predictors in modern CPUs, right? I haven't put a lot of effort into this, but my simple benchmarks show checked dereferences being only modestly slower than unchecked dereferences. (The "msetl_example.cpp" file includes some simple benchmarks.)

So why does this make registered_ptr so important? Because the "safe" version of registered_ptr would be the safe pointer with the lowest performance cost. Perhaps low enough for the safety benefits to be widely considered worth the cost. Of course the ideal would be smart pointers that could distinguish between stack allocated objects and heap allocated objects. Unfortunately, as far as I know, (standard) C++ doesn't seem to give us that facility. If we can't have those, I think we need a smart pointer that can at least accommodate stack allocated objects. Right?

Noah
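(To make the "runtime checked dereference" idea concrete: a minimal sketch, not the actual mseregistered.h code; the class name and details are illustrative only.

#include <stdexcept>

// Sketch of a "safe" pointer in the sense described above: every
// dereference is checked, and a failed check throws instead of
// invoking undefined behavior.
template <typename T>
class checked_ptr {
public:
    checked_ptr() = default;
    explicit checked_ptr(T* p) : m_ptr(p) {}
    T& operator*() const {
        if (!m_ptr) { throw std::logic_error("null dereference"); }
        return *m_ptr;
    }
    T* operator->() const {
        if (!m_ptr) { throw std::logic_error("null dereference"); }
        return m_ptr;
    }
private:
    T* m_ptr = nullptr;
};

A registered_ptr that is automatically nulled when its target is destroyed turns the same check into a use-after-destruction check.)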
On Thu, Jan 28, 2016 at 12:20 PM, Noah
I propose a new smart pointer called, say "registered_ptr", that behaves just like a native C++ pointer, except that its value is (automatically) set to nullptr when the target object is destroyed.
the difference between weak_ptr and registered_ptr is that registered_ptr is independent, not just an appendage of shared_ptr.
Do you mean, for example, that you can create a registered_ptr to an instance within the constructor? (I haven't looked at your implementation.)
registered_ptr can point to stack allocated objects (just like native C++ pointers can),
That sounds pretty interesting!
but at the cost of "thread safety". Referencing an object on another (asynchronous) thread's stack is inherently not safe, right? Some may be concerned with this lack of thread safety
I don't have a problem with stating that if you must share an object between threads, you should put that object on the heap and use shared_ptr so it will remain alive as long as any of those threads still need it. I agree with your assertion that it's downright dangerous for thread t1 to have a reference to an object that lives on thread t2's stack.
my implementation can only target types that can act as base classes. By default most classes can act as a base class, but native types like int, bool, etc. cannot.
That could be a bigger problem than it appears, given 'final' classes.
In my case, this is not a big issue because TRegisteredPointer is actually part of a small library that includes (safer) substitutes for int, bool and size_t that can act as base classes.
Um... I'm not sure people would regard use of those substitutes as an acceptable requirement for working with registered_ptr.
Of course the ideal would be smart pointers that could distinguish between stack allocated objects and heap allocated objects. Unfortunately, as far as I know, (standard) C++ doesn't seem to give us that facility.
Heh. I've wished for some time for the ability to constrain a class: "this cannot be static" or "this cannot be instantiated on the stack" or "this cannot be instantiated on the heap." But I've never been sufficiently motivated to write a proposal -- or, come to that, search through WG21 archives for an existing proposal. A good language facility that addressed my use case should also address yours, of course.
On Thu, Jan 28, 2016 at 12:20 PM, Noah
registered_ptr can point to stack allocated objects (just like native C++ pointers can),
shared_ptr can point to stack-allocated objects, too. As a benefit, if you use shared_ptr to point at a stack-allocated object, you also get the weak_ptr functionality. Emil
On 29/01/2016 14:35, Emil Dotchevski wrote:
On Thu, Jan 28, 2016 at 12:20 PM, Noah
wrote: registered_ptr can point to stack allocated objects (just like native C++ pointers can),
shared_ptr can point to stack-allocated objects, too. As a benefit, if you use shared_ptr to point at a stack-allocated object, you also get the weak_ptr functionality.
Yes, you can make a shared_ptr that points at a stack object (by using a null deleter) but AFAIK this doesn't actually work as expected unless you can guarantee that no shared_ptr instances will survive destruction of the stack frame (ie. it is only used within the call chain and never copied outside of it). And it's way too easy to break that guarantee because it's not the semantics that shared_ptr was designed for.
On Thu, Jan 28, 2016 at 5:53 PM, Gavin Lambert
On 29/01/2016 14:35, Emil Dotchevski wrote:
On Thu, Jan 28, 2016 at 12:20 PM, Noah
wrote: registered_ptr can point to stack allocated objects (just like native C++ pointers can),
shared_ptr can point to stack-allocated objects, too. As a benefit, if you use shared_ptr to point at a stack-allocated object, you also get the weak_ptr functionality.
Yes, you can make a shared_ptr that points at a stack object (by using a null deleter) but AFAIK this doesn't actually work as expected unless you can guarantee that no shared_ptr instances will survive destruction of the stack frame (ie. it is only used within the call chain and never copied outside of it).
And it's way too easy to break that guarantee because it's not the semantics that shared_ptr was designed for.
{
    foo local;
    shared_ptr<foo> pl(&local, null_deleter());
    ....
    do_something(p);
    ....
    assert(pl.unique());
}

Yes, in the presence of exceptions one must also assert(pl.unique()) in a catch(...), and yes, compile-time errors are better than run-time errors, but I wouldn't sacrifice the availability of weak_ptr and the capacity of shared_ptr to act as THE single smart pointer framework in a program.

Emil
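(For reference, null_deleter here is the usual no-op deleter from the Boost "Smart Pointer Programming Techniques" page:

struct null_deleter
{
    void operator()(void const *) const {}
};

The reference counts are maintained as usual, but nothing is deleted when the last reference expires.)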
On 1/28/2016 8:09 PM, Emil Dotchevski wrote:
And it's way too easy to break that guarantee because it's not the semantics that shared_ptr was designed for.
{ foo local; shared_ptr<foo> pl(&local,null_deleter()); .... do_something(p); .... assert(pl.unique()); }
Yes, in the presence of exceptions one must also assert(pl.unique()) in a catch(...), and yes, compile-time errors are better than run-time errors, but I wouldn't sacrifice the availability of weak_ptr and the capacity of shared_ptr to act as THE single smart pointer framework in a program.
Why on earth would you ever do this? There is no shared ownership semantics here at all. I'm going to assume that is supposed to read do_something(pl); do_something can't retain ownership of the shared_ptr without the processor melting so why give it a shared_ptr at all?
On Thu, Jan 28, 2016 at 7:57 PM, Michael Marcin
On 1/28/2016 8:09 PM, Emil Dotchevski wrote:
And it's way too easy to break that guarantee because it's not the semantics that shared_ptr was designed for.
{ foo local; shared_ptr<foo> pl(&local,null_deleter()); .... do_something(p); .... assert(pl.unique()); }
Yes, in the presence of exceptions one must also assert(pl.unique()) in a catch(...), and yes, compile-time errors are better than run-time errors, but I wouldn't sacrifice the availability of weak_ptr and the capacity of shared_ptr to act as THE single smart pointer framework in a program.
Why on earth would you ever do this?
One reason is to get weak_ptr.
There is no shared ownership semantics here at all.
The point of shared_ptr is to be able to reason that as long as you hold on to a shared_ptr (which you might get by copying another shared_ptr or by locking a weak_ptr), the object will not expire, but you don't hold on to it longer than you need to. This reasoning is perfectly valid within the scope of do_something. At any rate, what other use of null_deleter can you think of? Are you saying that null_deleter makes no sense? Emil
On January 29, 2016 2:48:56 AM EST, Emil Dotchevski
On Thu, Jan 28, 2016 at 7:57 PM, Michael Marcin
wrote: On 1/28/2016 8:09 PM, Emil Dotchevski wrote:
And it's way too easy to break that guarantee because it's not the semantics that shared_ptr was designed for.
{ foo local; shared_ptr<foo> pl(&local,null_deleter()); .... do_something(p); .... assert(pl.unique()); }
Yes, in the presence of exceptions one must also assert(pl.unique()) in a catch(...), and yes, compile-time errors are better than run-time errors, but I wouldn't sacrifice the availability of weak_ptr and the capacity of shared_ptr to act as THE single smart pointer framework in a program.
Why on earth would you ever do this?
One reason is to get weak_ptr.
There is no shared ownership semantics here at all.
The point of shared_ptr is to be able to reason that as long as you hold on to a shared_ptr (which you might get by copying another shared_ptr or by locking a weak_ptr), the object will not expire, but you don't hold on to it longer than you need to. This reasoning is perfectly valid within the scope of do_something.
If do_something() saves a copy of the shared pointer in a container, for example, later references will refer to a non-existent object. There's nothing you can do about it short of using assertions or another runtime check with a call to std::terminate() or similar. That's hardly ideal.
At any rate, what other use of null_deleter can you think of? Are you saying that null_deleter makes no sense?
I've used it to refer to static objects, but never to automatic variables. ___ Rob (Sent from my portable computation engine)
On 2016-01-29 12:40, Rob Stewart wrote:
On January 29, 2016 2:48:56 AM EST, Emil Dotchevski
wrote: On Thu, Jan 28, 2016 at 7:57 PM, Michael Marcin
wrote: On 1/28/2016 8:09 PM, Emil Dotchevski wrote:
And it's way too easy to break that guarantee because it's not the semantics that shared_ptr was designed for.
{ foo local; shared_ptr<foo> pl(&local,null_deleter()); .... do_something(p); .... assert(pl.unique()); }
Yes, in the presence of exceptions one must also assert(pl.unique()) in a catch(...), and yes, compile-time errors are better than run-time errors, but I wouldn't sacrifice the availability of weak_ptr and the capacity of shared_ptr to act as THE single smart pointer framework in a program.
Why on earth would you ever do this?
One reason is to get weak_ptr.
There is no shared ownership semantics here at all.
The point of shared_ptr is to be able to reason that as long as you hold on to a shared_ptr (which you might get by copying another shared_ptr or by locking a weak_ptr), the object will not expire, but you don't hold on to it longer than you need to. This reasoning is perfectly valid within the scope of do_something.
If do_something() saves a copy of the shared pointer in a container, for example, later references will refer to a non-existent object. There's nothing you can do about it short of using assertions or another runtime check with a call to std::terminate() or similar. That's hardly ideal.
I think what Emil describes is a special case of a 'dangling_ptr' idiom, if I may call it that way. The point is that there are cases when object lifetime is controlled by a third party (e.g. the stack state, a foreign library, etc.) and you need a safe way to know when the object has been deleted. So you create a shared_ptr with a null_deleter pointing to that object and save it in that object (or another storage associated with that object). You keep only weak_ptrs to that object in all other places in the code. When you need to use the object you have to lock a weak_ptr and thus check if the object is still alive. The example Emil presented is not quite clear to demonstrate the approach because do_something should be receiving weak_ptr rather than shared_ptr.
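(A minimal sketch of the idiom Andrey describes, with hypothetical names; the lifetime flag lives inside the object and all other code holds only weak_ptrs:

#include <memory>

struct null_deleter { void operator()(void const*) const {} };

class externally_owned {
public:
    externally_owned() : m_alive(this, null_deleter()) {}
    // Whenever the third party destroys the object, m_alive dies with
    // it and every outstanding weak_ptr expires.
    std::weak_ptr<externally_owned> get_weak() const { return m_alive; }
private:
    std::shared_ptr<externally_owned> m_alive;
};

void do_something(std::weak_ptr<externally_owned> wp) {
    if (std::shared_ptr<externally_owned> sp = wp.lock()) {
        // still alive; safe to use *sp (single-threaded caveats apply)
    } else {
        // the object has already been deleted
    }
}
)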
On January 29, 2016 5:25:49 AM EST, Andrey Semashev
On 2016-01-29 12:40, Rob Stewart wrote:
On January 29, 2016 2:48:56 AM EST, Emil Dotchevski
wrote:
The point of shared_ptr is to be able to reason that as long as you hold on to a shared_ptr (which you might get by copying another shared_ptr or by locking a weak_ptr), the object will not expire, but you don't hold on to it longer than you need to. This reasoning is perfectly valid within the scope of do_something.
If do_something() saves a copy of the shared pointer in a container, for example, later references will refer to a non-existent object. There's nothing you can do about it short of using assertions or another runtime check with a call to std::terminate() or similar. That's hardly ideal.
I think what Emil describes is a special case of a 'dangling_ptr' idiom, if I may call it that way. The point is that there are cases when object lifetime is controlled by a third party (e.g. the stack state, a foreign library, etc.) and you need a safe way to know when the object has been deleted. So you create a shared_ptr with a null_deleter pointing to that object and save it in that object (or another storage associated with that object). You keep only weak_ptrs to that object in all other places in the code. When you need to use the object you have to lock a weak_ptr and thus check if the object is still alive.
I understand that full well. The example that Emil presented can easily lead to having shared_ptrs that refer to released stack memory. ___ Rob (Sent from my portable computation engine)
On Fri, Jan 29, 2016 at 11:24 AM, Rob Stewart
I understand that full well. The example that Emil presented can easily lead to having shared_ptrs that refer to released stack memory.
My original example was:

{
    foo local;
    shared_ptr<foo> pl(&local, null_deleter());
    ....
    do_something(p);
    ....
    assert(pl.unique());
}

What if I change it to:

int main()
{
    foo local;
    shared_ptr<foo> pl(&local, null_deleter());
    ....
    do_something(p);
    ....
    assert(pl.unique());
}

Does it look less scary now?

It seems that you guys look at this as "why would do_something take a shared_ptr if it can't keep a copy of it". Being able to retain a copy of the shared_ptr is one reason, yes. The other reason is that do_something needs to make sure that the object doesn't expire before it returns, which in this case it doesn't. Further, what if do_something is declared as:

void do_something( weak_ptr<foo> const & );

Now it's not that it'll keep shared_ptr references after it returns, but it'll be creating and destroying shared_ptrs as needed, to make sure that the object doesn't expire while it's being used. Which, again, in this case it doesn't.

Emil
On 1/29/2016 3:27 PM, Emil Dotchevski wrote:
int main() { foo local; shared_ptr<foo> pl(&local,null_deleter()); .... do_something(p); .... assert(pl.unique()); }
Does it look less scary now?
No, it still just looks like the coder wants job security to me. I understand that the goal of the snippet is to get a handle to 'local' which can be stored and later invalidated and this code is abusing shared_ptr for that purpose but it's completely wrong. It's wrong for the same reason you don't write operator==() to test for equality and operator!=() to compute prime factorization.
On Sat, Jan 30, 2016 at 9:07 PM, Michael Marcin
On 1/29/2016 3:27 PM, Emil Dotchevski wrote:
int main() { foo local; shared_ptr<foo> pl(&local,null_deleter()); .... do_something(p); .... assert(pl.unique()); }
Does it look less scary now?
No, it still just looks like the coder wants job security to me.
I understand that the goal of the snippet is to get a handle to 'local' which can be stored and later invalidated and this code is abusing shared_ptr for that purpose but it's completely wrong.
See "Smart Pointer Programming Techniques", www.boost.org/doc/libs/release/libs/smart_ptr/sp_techniques.html#static "The same technique works for any object known to outlive the pointer." It's wrong for the same reason you don't write operator==() to test for
equality and operator!=() to compute prime factorization.
Not the same thing at all. Consider this:

if( shared_ptr<foo> sp = wp.lock() )
{
    //use sp
}

If you put this code inside the do_something function, it won't compute prime factorization; it'd work as expected regardless of whether the object referred to by wp happens to use null_deleter or not. Yes, you can write code that leads to undefined behavior, and yes, you shouldn't be using null_deleters left and right, but this is a valid technique when using shared_ptr.

Emil
On 29/01/2016 23:25, Andrey Semashev wrote:
I think what Emil describes is a special case of a 'dangling_ptr' idiom, if I may call it that way. The point is that there are cases when object lifetime is controlled by a third party (e.g. the stack state, a foreign library, etc.) and you need a safe way to know when the object has been deleted. So you create a shared_ptr with a null_deleter pointing to that object and save it in that object (or another storage associated with that object). You keep only weak_ptrs to that object in all other places in the code. When you need to use the object you have to lock a weak_ptr and thus check if the object is still alive.
The thing is that there is rarely a case in practice where doing this is actually beneficial, unless you have a bit of code that *usually* deals with "real" shared_ptrs in the full shared-ownership sense and you want to exercise them in a context where you know they won't be used concurrently -- eg. unit tests.

Typically single-threaded algorithms have a well-defined point at which the object is deleted (if this is a possibility), so it's not necessary to track this separately, and raw pointers/references are sufficient. (And if you're religiously opposed to having raw owning pointers, then use unique_ptr for the owning pointer and references for everything else.)

We've already established that making a copy of the shared_ptr to be used outside the call chain (eg. by another thread, or just by this thread in a later operation) is obviously unsafe.

Storing a weak_ptr in the single-threaded case is safe but pointless, as it should be obvious whether the object is alive or not during the initial operation (the code that deletes it can just store back a NULL, after all), and it will be guaranteed to be dead after that operation anyway. (This does assume that you have a known place to go to find that weak_ptr, rather than having lots of copies of it.)

Storing a weak_ptr in the multi-threaded case is unsafe, because such code expects to be able to promote one to a shared_ptr during some operation and for the underlying object to not be deleted while that operation is in progress, the latter of which is violated when the lifetime is not controlled by the shared_ptr itself.

It's that second aspect of weak_ptr -- the ability to promote to a shared_ptr and carry out operations secure in the knowledge that the object won't be surprise-deleted under you by concurrent action -- that many people seem to forget when discussions of wanting to use weak_ptr without shared ownership come up.
On 2016-02-02 04:21, Gavin Lambert wrote:
On 29/01/2016 23:25, Andrey Semashev wrote:
I think what Emil describes is a special case of a 'dangling_ptr' idiom, if I may call it that way. The point is that there are cases when object lifetime is controlled by a third party (e.g. the stack state, a foreign library, etc.) and you need a safe way to know when the object has been deleted. So you create a shared_ptr with a null_deleter pointing to that object and save it in that object (or another storage associated with that object). You keep only weak_ptrs to that object in all other places in the code. When you need to use the object you have to lock a weak_ptr and thus check if the object is still alive.
The thing is that there is rarely a case in practice where doing this is actually beneficial, unless you have a bit of code that *usually* deals with "real" shared_ptrs in the full shared-ownership sense and you want to exercise them in a context where you know they won't be used concurrently -- eg. unit tests.
I also mentioned third party libraries (or APIs in general) which may or may not delete your objects upon calling them. The approach may be the easiest and safest way to know that the object has been deleted. I do agree that when all the code in question is under your control, it is better to avoid this idiom and use a design with proper ownership semantics. But there are special cases when this trick comes in handy.
Typically single-threaded algorithms have a well-defined point at which the object is deleted (if this is a possibility), so it's not necessary to track this separately, and raw pointers/references are sufficient. (And if you're religiously opposed to having raw owning pointers, then use unique_ptr for the owning pointer and references for everything else.)
I'm not opposed to raw pointers, but when dealing with third party API I would not rely on its documented/observed/reverse-engineered behavior because it may change in a later version and may also be difficult to replicate in my code. I just want to know when the actual deletion happens, whenever it does.
We've already established that making a copy of the shared_ptr to be used outside the call chain (eg. by another thread, or just by this thread in a later operation) is obviously unsafe.
It's not unsafe if that thread is guaranteed to not delete the object (e.g. because the thread is blocked).
Storing a weak_ptr in the single-threaded case is safe but pointless, as it should be obvious whether the object is alive or not during the initial operation (the code that deletes it can just store back a NULL, after all), and it will be guaranteed to be dead after that operation anyway. (This does assume that you have a known place to go to find that weak_ptr, rather than having lots of copies of it.)
That's right. Storing back-references to the places that refer to the object is too tedious and intrusive. Remember that the back-references should be managed too.
Storing a weak_ptr in the multi-threaded case is unsafe, because such code expects to be able to promote one to a shared_ptr during some operation and for the underlying object to not be deleted while that operation is in progress, the latter of which is violated when the lifetime is not controlled by the shared_ptr itself.
True. The developer's job is to guarantee that that never happens.
On February 1, 2016 8:21:19 PM EST, Gavin Lambert
wrote:
On 29/01/2016 23:25, Andrey Semashev wrote:
I think what Emil describes is a special case of a 'dangling_ptr' idiom, if I may call it that way. The point is that there are cases when object lifetime is controlled by a third party (e.g. the stack state, a foreign library, etc.) and you need a safe way to know when the object has been deleted. So you create a shared_ptr with a null_deleter pointing to that object and save it in that object (or another storage associated with that object). You keep only weak_ptrs to that object in all other places in the code. When you need to use the object you have to lock a weak_ptr and thus check if the object is still alive.
The thing is that there is rarely a case in practice where doing this is actually beneficial, unless you have a bit of code that *usually* deals with "real" shared_ptrs in the full shared-ownership sense and you want to exercise them in a context where you know they won't be used concurrently -- eg. unit tests.
shared_ptr can be used to manage memory differently than you imagine, it seems. I use shared_ptrs to share ownership between a plugin and the application loading it while using custom deleters to ensure that releasing the last reference means code in the dynamic library releases the memory, if indeed any was allocated. The plugin mechanism releases such references before unloading a dynamic library, so all's well. ___ Rob (Sent from my portable computation engine)
On 5/02/2016 22:17, Rob Stewart wrote:
shared_ptr can be used to manage memory differently than you imagine, it seems. I use shared_ptrs to share ownership between a plugin and the application loading it while using custom deleters to ensure that releasing the last reference means code in the dynamic library releases the memory, if indeed any was allocated. The plugin mechanism releases such references before unloading a dynamic library, so all's well.
That's fine, and a perfectly reasonable use of a custom deleter. That's not what I was talking about, which was specifically limited to abuse of a null_deleter.
On February 8, 2016 7:10:47 PM EST, Gavin Lambert
wrote:
On 5/02/2016 22:17, Rob Stewart wrote:
shared_ptr can be used to manage memory differently than you imagine, it seems. I use shared_ptrs to share ownership between a plugin and the application loading it while using custom deleters to ensure that releasing the last reference means code in the dynamic library releases the memory, if indeed any was allocated. The plugin mechanism releases such references before unloading a dynamic library, so all's well.
That's fine, and a perfectly reasonable use of a custom deleter.
That's not what I was talking about, which was specifically limited to abuse of a null_deleter.
While I didn't mention it, I sometimes use a null deleter, in the same context, for static objects in the dynamic library. ___ Rob (Sent from my portable computation engine)
On 9/02/2016 13:24, Rob Stewart wrote:
On February 8, 2016 7:10:47 PM EST, Gavin Lambert
wrote:
On 5/02/2016 22:17, Rob Stewart wrote:
shared_ptr can be used to manage memory differently than you imagine, it seems. I use shared_ptrs to share ownership between a plugin and the application loading it while using custom deleters to ensure that releasing the last reference means code in the dynamic library releases the memory, if indeed any was allocated. The plugin mechanism releases such references before unloading a dynamic library, so all's well.
That's fine, and a perfectly reasonable use of a custom deleter.
That's not what I was talking about, which was specifically limited to abuse of a null_deleter.
While I didn't mention it, I sometimes use a null deleter, in the same context, for static objects in the dynamic library.
That's more dodgy, unless the library cannot be unloaded before process exit or you can otherwise guarantee that all such shared_ptrs have been deleted prior to unloading the library.

It would be better to have something that increments a library refcount as long as the shared_ptr exists, which requires either a custom deleter to decrement the refcount or some property of the object being pointed to doing the same (eg. an embedded shared_ptr to some library stand-in object -- although care needs to be taken with that to avoid cycles). This in turn means that the library can't be unloaded until all the shared_ptrs referencing it have been destroyed, including those promoted from weak_ptrs. I presume you *are* taking such care, given your description above.

Again that's different from the case I was discussing.
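(A sketch of the refcount-pinning deleter Gavin describes; library_handle and its loader are hypothetical stand-ins, not any real API:

#include <memory>

// Stand-in object whose destructor is the last thing that runs before
// the dynamic library may be unloaded (hypothetical).
struct library_handle {
    ~library_handle() { /* decrement the library refcount / allow unload */ }
};

// Deleter that shares ownership of the library stand-in: as long as any
// shared_ptr<T> using this deleter exists (including ones promoted from
// its weak_ptrs), the library stays loaded.
template <typename T>
struct pinning_deleter {
    std::shared_ptr<library_handle> pin;
    void operator()(T* p) const { delete p; } // or a no-op for static objects
};
)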
On Fri, Jan 29, 2016 at 1:40 AM, Rob Stewart
On January 29, 2016 2:48:56 AM EST, Emil Dotchevski < emildotchevski@gmail.com> wrote:
On Thu, Jan 28, 2016 at 7:57 PM, Michael Marcin
wrote: On 1/28/2016 8:09 PM, Emil Dotchevski wrote: The point of shared_ptr is to be able to reason that as long as you hold on to a shared_ptr (which you might get by copying another shared_ptr or by locking a weak_ptr), the object will not expire, but you don't hold on to it longer than you need to. This reasoning is perfectly valid within the scope of do_something.
If do_something() saves a copy of the shared pointer in a container, for example, later references will refer to a non-existent object. There's nothing you can do about it short of using assertions or another runtime check with a call to std::terminate() or similar. That's hardly ideal.
That's hardly a problem if it never happens. :)
At any rate, what other use of null_deleter can you think of? Are you saying that null_deleter makes no sense?
I've used it to refer to static objects, but never to automatic variables.
Static objects do get destroyed, except only God knows in what order, so strictly speaking you can't know that shared_ptr references to them won't exist after they get destroyed. Consider that if you pass a shared_ptr to a function, it can copy it to a global shared_ptr. At least with local objects you can assert on unique(). The point of null deleter is specifically to be able to use shared_ptr in cases when it isn't possible for shared_ptr to control the lifetime of the object. I don't think of the lack of safety as a disadvantage, it's a feature. Emil
On January 29, 2016 3:35:54 PM EST, Emil Dotchevski
On Fri, Jan 29, 2016 at 1:40 AM, Rob Stewart
wrote: On January 29, 2016 2:48:56 AM EST, Emil Dotchevski < emildotchevski@gmail.com> wrote:
If do_something() saves a copy of the shared pointer in a container, for example, later references will refer to a non-existent object. There's nothing you can do about it short of using assertions or another runtime check with a call to std::terminate() or similar. That's hardly ideal.
That's hardly a problem if it never happens. :)
True enough.
At any rate, what other use of null_deleter can you think of? Are you saying that null_deleter makes no sense?
I've used it to refer to static objects, but never to automatic variables.
Static objects do get destroyed, except only God knows in what order, so strictly speaking you can't know that shared_ptr references to them won't exist after they get destroyed.
Also true. Implied in my statement is knowledge, like you apparently implied with do_something(), that all shared_ptrs will be destroyed before the referenced object is destroyed. In the case I was thinking about, the plugin framework releases the shared_ptr before unloading the dynamic library containing the static object.
Consider that if you pass a shared_ptr to a function, it can copy it to a global shared_ptr. At least with local objects you can assert on unique().
Granted.
The point of null deleter is specifically to be able to use shared_ptr in cases when it isn't possible for shared_ptr to control the lifetime of the object. I don't think of the lack of safety as a disadvantage, it's a feature.
I wouldn't go that far, but you're right that it's an example of "trust the programmer." ___ Rob (Sent from my portable computation engine)
On 1/30/2016 2:20 AM, Rob Stewart wrote:
On January 29, 2016 3:35:54 PM EST, Emil Dotchevski
wrote: On Fri, Jan 29, 2016 at 1:40 AM, Rob Stewart
wrote: On January 29, 2016 2:48:56 AM EST, Emil Dotchevski < emildotchevski@gmail.com> wrote:
If do_something() saves a copy of the shared pointer in a container, for example, later references will refer to a non-existent object. There's nothing you can do about it short of using assertions or another runtime check with a call to std::terminate() or similar. That's hardly ideal.
That's hardly a problem if it never happens. :)
True enough.
Wow, shared_ptr really is quite an impressive little data type. But are you guys suggesting that it's already an adequate "smart pointer to stack objects" solution? I dunno, I think the "someone saving a copy of the shared_ptr" could be a legitimate problem. For example, someone might write a function that takes a weak_ptr as a parameter and stores a copy of the obtained shared_ptr, assuming that it points to a heap object. Then someone else might do the "shared_ptr to stack object" thing and then use that function with the inappropriate weak_ptr. Right? I mean I guess technically it's already a problem, but if we endorse "smart pointer to stack objects" as an ordinary programming practice it might exacerbate what is currently a very obscure issue.

But why would someone store the obtained shared_ptr rather than the weak_ptr? Well, they might think that accessing an object through a weak_ptr is more costly, and it seems they would be correct: http://duneroadrunner.github.io/SaferCPlusPlus/#simple-benchmarks

So I have a couple of questions about shared_ptr's implementation. Would doing the "shared_ptr to stack object" thing still involve a refcount object being allocated on the heap? In which case, you would lose a lot of the (potential) performance benefit of putting the object on the stack in the first place. Right? And also, how does make_shared<> combine the target object and refcount object into a single allocation? My current implementation of registered_ptr does it by just deriving a new object that contains both the target object (as the (public) base class) and the "management" object. This method is nice and simple, but it requires that the target type be able to act as a base class.
On January 30, 2016 4:12:29 PM EST, Noah
On 1/30/2016 2:20 AM, Rob Stewart wrote:
On January 29, 2016 3:35:54 PM EST, Emil Dotchevski
wrote: On Fri, Jan 29, 2016 at 1:40 AM, Rob Stewart
wrote: On January 29, 2016 2:48:56 AM EST, Emil Dotchevski < emildotchevski@gmail.com> wrote:
Wow, shared_ptr really is quite an impressive little data type. But are you guys suggesting that it's already an adequate "smart pointer to stack objects" solution?
We're suggesting that you can use it for that case when you know enough about the code using the shared_ptr to be confident that a reference won't be saved too long. We're also suggesting that doing so permits using shared_ptr or weak_ptr as the parameter type of a function that doesn't save references to the memory.
I dunno, I think the "someone saving a copy of the shared_ptr" could be a legitimate problem.
It certainly is a problem, but there's already a lot of rope with which to hang oneself in C++. It's a tool that must be wielded with care.
So I have a couple of questions about shared_ptr's implementation. Would doing the "shared_ptr to stack object" thing still involve a refcount object being allocated on the heap? In which case, you would lose a lot of the (potential) performance benefit of putting the object on the stack in the first place. Right?
Yes to both questions.
And also, how does make_shared<> combine the target object and refcount object into a single allocation?
It uses perfect forwarding, or an emulation of it, to construct the object in a single memory block along with the control block (which includes the reference count). ___ Rob (Sent from my portable computation engine)
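(Roughly, and glossing over the real control-block machinery, the single-allocation layout looks something like this sketch:

#include <atomic>
#include <new>
#include <utility>

// Sketch only: one heap block holds the reference counts and suitably
// aligned storage for T; T is constructed in place with placement new
// and perfect forwarding.
template <typename T>
struct combined_block {
    std::atomic<long> use_count{1};
    std::atomic<long> weak_count{1};
    alignas(T) unsigned char storage[sizeof(T)];

    template <typename... Args>
    explicit combined_block(Args&&... args) {
        ::new (static_cast<void*>(storage)) T(std::forward<Args>(args)...);
    }
    T* get() { return reinterpret_cast<T*>(storage); }
};

No base class is required of T; the object and the counts are simply neighbors within one allocation.)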
On Sat, Jan 30, 2016 at 1:12 PM, Noah
Wow, shared_ptr really is quite an impressive little data type. But are you guys suggesting that it's already an adequate "smart pointer to stack objects" solution?
Some problems don't have a solution, you have to pick the right compromise. The use of shared_ptr has the advantage of enabling the use of weak_ptr and of interfaces that are expressed in terms of shared_ptr and don't care what deleter was used with each instance. The disadvantage is that it could lead to programmer errors, though the assert in my example should detect them.
So I have a couple of questions about shared_ptr's implementation. Would doing the "shared_ptr to stack object" thing still involve a refcount object being allocated on the heap? In which case, you would lose a lot of the (potential) performance benefit of putting the object on the stack in the first place. Right?
With some extra acrobatic moves you can use an allocator (once you have solid evidence from your profiler that this particular heap allocation creates performance problems, though I'd bet the lunch money that in practice that won't ever happen.)
And also, how does make_shared<> combine the target object and refcount object into a single allocation? My current implementation of registered_ptr does it by just deriving a new object that contains both the target object (as the (public) base class) and the "management" object. This method is nice and simple, but it requires that the target type be able to act as a base class.
How it's done is unspecified :) but do note that shared_ptr goes beyond not requiring the target type to be able to act as a base class; it even works with void, e.g. this is valid C++:

shared_ptr<void> p(new my_type); //When the last reference expires, ~my_type will be called.

Emil
On Sat, Jan 30, 2016 at 2:20 AM, Rob Stewart
On January 29, 2016 3:35:54 PM EST, Emil Dotchevski < emildotchevski@gmail.com> wrote:
The point of null deleter is specifically to be able to use shared_ptr in cases when it isn't possible for shared_ptr to control the lifetime of the object. I don't think of the lack of safety as a disadvantage, it's a feature.
I wouldn't go that far, but you're right that it's an example of "trust the programmer."
OK, don't go that far :) but regardless, we've deduced that my example shows proper use of null_deleter, in fact there is no other use case for null_deleter except to do just this, given that we're in agreement that its use with global objects is not safer than its use with local objects. Emil
Yes, you can make a shared_ptr that points at a stack object (by using a null deleter) but AFAIK this doesn't actually work as expected unless you can guarantee that no shared_ptr instances will survive destruction of the stack frame (ie. it is only used within the call chain and never copied outside of it).
And it's way too easy to break that guarantee because it's not the semantics that shared_ptr was designed for.
I think Gavin is right. Consider this scenario:

class CA {
public:
    CA(int x) : m_x(x) {}
    int m_x;
};

CA* a_ptr = nullptr;
CA a2_obj(2);
{
    CA a_obj(1);
    a_ptr = &a_obj;
    a2_obj = (*a_ptr);
}
if (a_ptr) {
    a2_obj = (*a_ptr); // this is bad
} else {
    // this would have been better
}

A simple demonstration of a potential danger of raw pointers. But using "registered" pointers:

mse::TRegisteredPointer<CA> a_ptr;
CA a2_obj(2);
{
    mse::TRegisteredObj<CA> a_obj(1); // a_obj is entirely on the stack
    a_ptr = &a_obj;
    a2_obj = (*a_ptr);
}
if (a_ptr) {
    a2_obj = (*a_ptr);
    // Fortunately we never got here because a_ptr "knows" its target was destroyed.
    // In my current implementation, this would have thrown an exception.
} else {
    // Pheww. Better.
}

I'm not sure that "std::shared_ptrs coerced into targeting the stack" would have helped here.
On 29/01/2016 14:07, Nat Goodspeed wrote:
Do you mean, for example, that you can create a registered_ptr to an instance within the constructor? (I haven't looked at your implementation.)
FWIW, you can do that with enable_shared_from_raw, provided that you can guarantee that someone will create a shared_ptr from the result of the construction. (Otherwise you get a memory leak.)
Heh. I've wished for some time for the ability to constrain a class: "this cannot be static" or "this cannot be instantiated on the stack" or "this cannot be instantiated on the heap." But I've never been sufficiently motivated to write a proposal -- or, come to that, search through WG21 archives for an existing proposal.
You can do "this cannot be instantiated on the stack" fairly easily -- make a private constructor with a static factory method that returns a unique_ptr or shared_ptr. In the latter case, this is also a good way to guarantee that a given class has shared ownership, for use with enable_shared_from_this or as an alternate method for the above to create pointers outside the "real" constructor while still inside the "perceived" constructor.
On 1/28/2016 5:07 PM, Nat Goodspeed wrote:
Do you mean, for example, that you can create a registered_ptr to an instance within the constructor? (I haven't looked at your implementation.)
So first, let me say that I am by no means a C++ (or boost or stl) expert. If I understand your question, the short answer is yes, sort of. But there's a longer answer. You are asking if a registered_ptr can replace the raw pointer in this scenario?:

class CA {
public:
    CA() { m_ptr_to_this = this; }
    CA* m_ptr_to_this;
};
CA a_obj;

I actually have two implementations - one called TRegisteredPointer and one called TRegisteredPointerForLegacy. TRegisteredPointerForLegacy is a little slower, but more "lenient" / less "strict" / more compatible with raw pointers. With TRegisteredPointerForLegacy, this is no problem:

class CA {
public:
    CA() { m_ptr_to_this = this; }
    mse::TRegisteredPointerForLegacy<CA> m_ptr_to_this;
};
mse::TRegisteredObjForLegacy<CA> a_obj;

My implementation of TRegisteredPointer is more strict than TRegisteredPointerForLegacy and doesn't (currently) support this. The two implementations represent different tradeoffs between speed & safety and compatibility & flexibility.
my implementation can only target types that can act as base classes. By default most classes can act as a base class, but native types like int, bool, etc. cannot.
That could be a bigger problem than it appears, given 'final' classes.
Umm, yes maybe. Personally, I never use 'final' classes, so my implementation currently uses the target object as a base class. But it could be (and probably will be) extended to also support the target object as the derived class instead. This is the way "QPointer" works, if you're familiar with Qt.

Basically we need a certain function to be called when the target object is destructed and we don't really care how it's done. We can add a destructor by deriving a new class from the target class, or we can add a destructor by having the target class be derived from a specific class, or, frankly, the appropriate code can be manually inserted into the existing destructor of the target class. Either way will work. Using the target class as a base class is just the "cleanest", least "intrusive" of the options.

It's true, it doesn't seem that this kind of pointer can be implemented in a way that's as "universal" as, say, std::shared_ptr. And maybe that's why boost doesn't already include something like it. But is that sufficient reason to forego the real functional benefits of this kind of pointer?
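(A bare-bones sketch of that mechanism - derive from the target class, track the pointers, and null them in the added destructor. Hypothetical names, not the actual TRegisteredPointer code:

#include <unordered_set>

template <typename T> class registered_obj;

template <typename T>
class registered_ptr {
public:
    registered_ptr() = default;
    explicit registered_ptr(registered_obj<T>* obj) { reset(obj); }
    registered_ptr(const registered_ptr& other) { reset(other.m_obj); }
    registered_ptr& operator=(const registered_ptr& other) {
        reset(other.m_obj);
        return *this;
    }
    ~registered_ptr() { reset(nullptr); }
    T& operator*() const { return *m_obj; } // a "safe" version would check for null first
    explicit operator bool() const { return m_obj != nullptr; }
private:
    friend class registered_obj<T>;
    void reset(registered_obj<T>* obj) {
        if (m_obj) { m_obj->m_trackers.erase(this); }
        m_obj = obj;
        if (m_obj) { m_obj->m_trackers.insert(this); }
    }
    registered_obj<T>* m_obj = nullptr;
};

template <typename T>
class registered_obj : public T { // requires T to be usable as a base class
public:
    using T::T;
    ~registered_obj() {
        for (auto* p : m_trackers) { p->m_obj = nullptr; } // auto-null every pointer
    }
private:
    friend class registered_ptr<T>;
    std::unordered_set<registered_ptr<T>*> m_trackers;
};
)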
Um... I'm not sure people would regard use of those substitutes as an acceptable requirement for working with registered_ptr.
Yeah, another shortcoming I would've liked to have avoided if there was some way to do so. But to be clear, you only need to use the substitutes if you're pointing directly at a primitive type. Pointing to a class that contains primitive types is not an issue. How often do people declare a pointer to a single int? I don't think I can remember the last time I did. But maybe that's just me.

But anyway, I'm kind of taking a stand on this particular issue, and I invite boost, and the wider C++ community, to join me. int, unsigned int, and the other primitives are a legacy inherited from C. A bad (and unnecessarily dangerous) legacy that needs to be tossed, in my opinion.

To be clear, registered_ptr doesn't require using *my* substitutes when pointing to primitive types. Presumably any substitute that can act as a base class would work just as well. The substitutes I provide are such a thin wrapper that I assume any respectable compiler should generate the exact same machine code (in release mode) when used as a direct substitute for their native counterparts. But my substitutes are safer in that they have default initialization. And they also address the bug-prone implicit conversion between signed and unsigned ints. I'm not against using native primitives in all cases but, in my opinion, they shouldn't still be the default.
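(To illustrate the kind of thin wrapper meant here - a hypothetical sketch, not the library's actual classes:

// A thin int substitute: default-initialized, able to act as a base
// class, and otherwise int-like. A decent compiler should be able to
// make it cost the same as a plain int in release builds.
class int_substitute {
public:
    int_substitute() = default;            // value defaults to zero
    int_substitute(int v) : m_value(v) {}  // implicit, for compatibility
    operator int() const { return m_value; } // implicit conversion back
    int_substitute& operator+=(int v) { m_value += v; return *this; }
    // ... the remaining arithmetic, comparison and conversion operators
private:
    int m_value = 0;
};
)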
Of course the ideal would be smart pointers that could distinguish between stack allocated objects and heap allocated objects. Unfortunately, as far as I know, (standard) C++ doesn't seem to give us that facility.
Heh. I've wished for some time for the ability to constrain a class: "this cannot be static" or "this cannot be instantiated on the stack" or "this cannot be instantiated on the heap." But I've never been sufficiently motivated to write a proposal -- or, come to that, search through WG21 archives for an existing proposal.
A good language facility that addressed my use case should also address yours, of course.
I haven't had time to investigate it, but I came across this project: https://github.com/crdelozier/ironclad/. It seems to be abandoned, but from what I could tell, it used platform specific code to determine if a pointer was pointing to the stack or not.
On January 29, 2016 1:53:54 PM EST, Noah
On 1/28/2016 5:07 PM, Nat Goodspeed wrote:
Basically we need a certain function to be called when the target object is destructed and we don't really care how it's done. We can add a destructor by deriving a new class from the target class, or we can add a destructor by having the target class be derived from a specific class, or frankly, the appropriate code can be manually inserted into the existing destructor of the target class. Either way will work. Using the target class as a base class is just the "cleanest", least "intrusive" of the options.
Have a look at boost::intrusive_ptr.
I'm kind of taking a stand on this particular issue, and I invite boost, and the wider C++ community to join me. int, unsigned int, and the other primitives are a legacy inherited from C. A bad (and unnecessarily dangerous) legacy that needs to be tossed, in my opinion.
[snip]
The substitutes I provide are such a thin wrapper that I assume any respectable compiler should generate the exact same machine code (in release mode) when used as a direct substitute for their native counterparts. But my substitutes are safer in that they have default initialization.
Many times I don't want to initialize a variable because the branches in the subsequent code select the value. Do your wrappers provide a constructor that permits leaving the value uninitialized?
And they also address the bug-prone implicit conversion between signed and unsigned ints.
Once you do that, shouldn't you go the rest of the way and check all conversions? For example, what about overflow during marketing narrowing? ___ Rob (Sent from my portable computation engine)
On 1/29/2016 7:00 PM, Rob Stewart wrote:
Have a look at boost::intrusive_ptr.
If I understand boost::intrusive_ptr correctly, and I'm not totally sure that I do, I think it kind of further makes the case for including registered_ptrs. This post http://baptiste-wicht.com/posts/2011/11/boost-intrusive_ptr.html implies the point of boost::intrusive_ptr is increased performance. But it appears that, like shared_ptr, it's meant to be used for heap allocations. But of course, often the biggest performance improvement by far would come from avoiding any heap allocation whatsoever. Which is what (at least one implementation of) registered_ptr can do no problem. Some simple benchmarks here http://duneroadrunner.github.io/SaferCPlusPlus/#simple-benchmarks show that registered_ptrs targeting the stack far outperform even native pointers targeting the heap (when you include the allocations and deallocations).

I'm thinking that maybe registered_ptr could be re-branded the "high performance" smart pointer. In which case, one could pose the question as "do its warts (its issues with primitive types and 'final' classes) justify excluding the 'high performance' smart pointer"?
Many times I don't want to initialize a variable because the branches in the subsequent code select the value. Do your wrappers provide a constructor that permits leaving the value uninitialized?
So first let me say that I'm not proposing a total ban on primitive types. When you need the performance, and primitive types give you the performance, use them. But that should be a small fraction of the world's total C++ code. What is antiquated, in my opinion, is that primitive types are still the default.

In terms of not wanting to initialize due to subsequent conditional assignment, I would say don't underestimate the compiler optimizer. When the optimizer can figure out that the default initialization is redundant, it will remove it for you, right? I should note, though, that I found it difficult (or impossible) to fully mimic all the implicit conversion rules of primitive types, so there are going to be some cases where the substitute classes can't be used (without rewriting some of your code) for compatibility reasons.
And they also address the bug-prone implicit conversion between signed and unsigned ints.
Once you do that, shouldn't you go the rest of the way and check all conversions? For example, what about overflow during marketing narrowing?
I don't know what "marketing narrowing" is, but recently on this newsgroup people have been discussing a "safe integer" or "safe numerics" library that seems to have taken it all the way, and maybe even a bit further. My types do check ranges when converting to different integer/char types. It's been a while, but if anyone's interested they're implemented here https://github.com/duneroadrunner/SaferCPlusPlus/blob/master/mseprimitives.h and examples of their functionality in action are here https://github.com/duneroadrunner/SaferCPlusPlus/blob/master/msetl_example.c... (search for "mse::CInt").

I am not trying to push my specific primitive substitutes. I'm sure people will (or have) come up with better ones. I chose certain safety-performance-compatibility-time-effort tradeoffs when implementing my primitive substitutes, but I would certainly defer to anyone else who wants to address the issue. I would say two things though: One - It might be helpful if boost, or whoever, adopted a standard set of primitive substitutes that compilers could recognize and optimize for. And two - By default, an unsigned integer minus another unsigned integer should really return a signed integer, like my primitives do.
AMDG

On 01/30/2016 11:31 AM, Noah wrote:
On 1/29/2016 7:00 PM, Rob Stewart wrote:
Many times I don't want to initialize a variable because the branches in the subsequent code select the value. Do your wrappers provide a constructor that permits leaving the value uninitialized?
So first let me say that I'm not proposing a total ban on primitive types. When you need the performance, and primitive types give you the performance, use them. But that should be a small fraction of the world's total C++ code. What is antiquated, in my opinion, is that primitive types are still the default. In terms of not wanting to initialize due to subsequent conditional assignment, I would say don't underestimate the compiler optimizer. When the optimizer can figure out that the default initialization is redundant, it will remove it for you, right?
It's not just about optimization. Initializing a variable with a bogus value is no more correct than leaving it uninitialized, and also prevents tools like valgrind from detecting any real problems. In Christ, Steven Watanabe
On 1/30/2016 11:16 AM, Steven Watanabe wrote:
It's not just about optimization. Initializing a variable with a bogus value is no more correct than leaving it uninitialized, and also prevents tools like valgrind from detecting any real problems.
Good point. I wonder though, does the same argument apply to, say, std::vector? I mean, is the default initialization of std::vector to the empty state no more correct than leaving it uninitialized? Should we require programmers to explicitly set the vector state, even if they want to start off with an empty vector? Or is the empty state somehow intrinsically valid, but the zero value for integers is not? If we did a random sample of C++ code on github, what percentage of integers would be initialized to zero? What percent of std::vectors would be initialized to a state other than empty? I actually wonder... Google doesn't seem to know. I actually don't have a strong opinion either way, but it's not obvious to me that the zero value for integers is more bogus than the empty state for vectors.

For native integers the language had to make a choice, and for performance reasons, not bug-finding reasons, C chose no default initialization. But when using substitute classes, I don't know if it's an either-or situation. This is just off the top of my head, but let's say the default substitute integer class requires its value to be set explicitly before use. And let's say it enforces this by throwing an exception, in debug mode, if it's used before explicit initialization. And let's say, for performance reasons, it didn't do any "under-the-hood" default initializations. Let's call this class CBaseInt. But then let's say some of us would prefer an "under-the-hood" default initialization (to guarantee that the resulting release code was deterministic), but would still want to require explicit initialization before use by the programmer. We could then just publicly derive a class from CBaseInt called CDeterministicBaseInt, and do the default initialization in CDeterministicBaseInt's constructors. And let's say some lazy people don't want to have to explicitly initialize before use. They could just derive a class from CDeterministicBaseInt called CIntForLazyPeople. CIntForLazyPeople could disable the "use before initialization" exceptions by calling a function provided by CBaseInt. Then you could just use whichever integer class you prefer. Would this satisfy everyone? Is this ideal?

So I do accept the notion that requiring explicit initialization before use does help catch and prevent bugs. How many, I don't know. But I don't agree with the idea that all of C++'s language interface should be determined by valgrind's ability to find "use-before-initialization" bugs in debug mode. I am not suggesting that C++'s legacy high-performance language interface be abolished. I'm suggesting that those of us trying to write "secure" and/or high-level applications need a different interface that is not encumbered by C's legacy priorities. Specifically, we need the option of primitive types that have the power and flexibility of full-fledged classes.
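(A sketch of that hierarchy, straight from the description above - hypothetical code, with the "use before initialization" check compiled in only for debug builds:

#include <stdexcept>

class CBaseInt {
public:
    CBaseInt() = default;                 // no under-the-hood initialization
    CBaseInt(int v) : m_value(v), m_initialized(true) {}
    CBaseInt& operator=(int v) { m_value = v; m_initialized = true; return *this; }
    int value() const {
#ifndef NDEBUG
        if (!m_initialized) { throw std::logic_error("use before explicit initialization"); }
#endif
        return m_value;
    }
protected:
    void disable_init_check() { m_initialized = true; }
    int m_value;                          // deliberately left uninitialized
    bool m_initialized = false;
};

class CDeterministicBaseInt : public CBaseInt {
public:
    // Under-the-hood default initialization for deterministic release
    // behavior; explicit initialization before use is still required.
    CDeterministicBaseInt() { m_value = 0; }
};

class CIntForLazyPeople : public CDeterministicBaseInt {
public:
    CIntForLazyPeople() { disable_init_check(); } // behaves like a zero-initialized int
};
)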
On January 31, 2016 3:59:35 PM EST, Noah wrote:
On 1/30/2016 11:16 AM, Steven Watanabe wrote:
It's not just about optimization. Initializing a variable with a bogus value is no more correct than leaving it uninitialized, and also prevents tools like valgrind from detecting any real problems.
Good point.
I wonder though, does the same argument apply to say, std::vector?
It would if vector offered a constructor that was overloaded with a type that signaled the desire for no initialization.
I mean, is the default initialization of std::vector to the empty state no more correct than leaving it uninitialized? Should we require programmers to explicitly set the vector state, even if they want to start off with an empty vector?
That's not what I asked you about. I asked if you provided a way to construct your types without initializing the data. I thought it was obvious that I was asking about a constructor overload taking an instance of, say, uninitialized_t as the mechanism. That is, one would create an instance from an argument named, say, "uninitialized."
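A minimal sketch of that tag-overload mechanism (uninitialized_t and safe_int are illustrative names, not an existing API):

    // tag type, in the style of std::nothrow_t / std::adopt_lock_t
    struct uninitialized_t {};
    constexpr uninitialized_t uninitialized{};

    template <typename T>
    class safe_int {
        T m_value;
    public:
        safe_int() : m_value(T()) {}             // default: zero-initialize
        explicit safe_int(uninitialized_t) {}    // opt out: m_value left indeterminate
        safe_int(T x) : m_value(x) {}
        operator T() const { return m_value; }
    };

    safe_int<int> a;                 // deterministic: zero
    safe_int<int> b(uninitialized);  // caller promises to assign before use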
Or is the empty state somehow intrinsically valid, but the zero value for integers is not?
Zero may be a valid value with semantics different than "not set." An empty vector doesn't have a magic value that means it's actually empty. ___ Rob (Sent from my portable computation engine)
On 2/1/2016 5:47 PM, Rob Stewart wrote:
I mean, is the default initialization of std::vector to the empty state no more correct than leaving it uninitialized? Should we require programmers to explicitly set the vector state, even if they want to start off with an empty vector?
That's not what I asked you about. I asked if you provided a way to construct your types without initializing the data. I thought it was obvious that I was asking about a constructor overload taking an instance of, say, uninitialized_t as the mechanism. That is, one would create an instance from an argument named, say, "uninitialized."
Sorry, I'm confused. In this post I was responding to what Steve said. I wasn't trying to respond to your question. Although I guess they're related. And it may be my shortcoming, but it actually wasn't obvious to me that you were asking about "a constructor overload taking an instance of, say, uninitialized_t as the mechanism". Is that mechanism preferable to the mechanism of having separate, compatible classes that I proposed (in both the reply to Steve's post and the reply to your post)?

I haven't completely thought it through, but I might have thought the solution of having separate types preferable, if only for convenience reasons, because you wouldn't have to pass any arguments to get an uninitialized variable. And, as Steve pointed out, if you choose to use a type that requires explicit initialization before use, it may help catch some bugs (in debug mode). But now that you've spelled it out for me, having a single class with a separate constructor also seems appealing, because then there would definitely be no compatibility issues between separate classes.

But then again, there may be issues other than just default initialization to consider. For example, what if we wanted an option to disable run-time range checking when converting between different (sized) integer types? Keeping the one class and adding an extra constructor for that wouldn't really work. Because you'd just be substituting one run-time check for another, right? But another separate compatible class might work. And how about range checking on arithmetic operations? Is that too many options? Should we just not provide for those options?
Or is the empty state somehow intrinsically valid, but the zero value for integers is not?
Zero may be a valid value with semantics different than "not set." An empty vector doesn't have a magic value that means it's actually empty.
Just to be clear, I wasn't being sarcastic or anything. I'm just trying to think this through. So I agree that zero should not be interpreted as a "magic" value indicating "value not set". In my reply to Steve I suggested that the integer class, for example, might, in debug mode, contain an extra bool member indicating whether or not the value had been explicitly set. And default initialization would set that bool to indicate that the value hadn't been explicitly set.

What I am suggesting is that, perhaps, the std::vector people decided not to provide for uninitialized vector construction because a) they suspected that conditional initialization would be a small portion of all initializations and that the default empty state would be a fairly large portion, and/or possibly b) the compiler optimizer would deal with the redundancy in a significant portion of those conditional initializations, and/or c) the cost of the program being non-deterministic if a programming error slips by is unacceptably high. (I think reason c) is the main one to most of us who desire default initialization.) And it's not obvious to me that those reasons don't apply to integers as well.

With respect to the proportion of all initializations that are conditional, the only way we could know that is to do a large random sampling of code on github or wherever. I googled to try to find out whether anyone had done such a sampling, because I'm quite curious what the answer would be. My googling was not successful.

I know I'm not always great at expressing my ideas in writing, but if you have time to reread my reply to Steve: in it I suggest a solution in which three separate compatible classes would be available, each with a different policy for default initialization and for requiring "explicit set before use". Do you still prefer the single class with an extra constructor for "uninitialized construction" to the solution I proposed?
On February 2, 2016 2:25:42 AM EST, Noah wrote:
On 2/1/2016 5:47 PM, Rob Stewart wrote:
I mean, is the default initialization of std::vector to the empty state no more correct than leaving it uninitialized? Should we require programmers to explicitly set the vector state, even if they want to start off with an empty vector?
That's not what I asked you about. I asked if you provided a way to construct your types without initializing the data. I thought it was obvious that I was asking about a constructor overload taking an instance of, say, uninitialized_t as the mechanism. That is, one would create an instance from an argument named, say, "uninitialized."
Sorry, I'm confused. In this post I was responding to what Steve said. I wasn't trying to respond to your question. Although I guess they're related.
I understand, but he was replying to something you wrote in response to my uninitialized query and I hadn't replied to that previously.
And it may be my shortcoming, but it actually wasn't obvious to me that you were asking about "a constructor overload taking an instance of, say, uninitialized_t as the mechanism". Is that mechanism preferable to the mechanism of having separate, compatible classes that I proposed (in both the reply to Steve's post and the reply to your post)?
There are tradeoffs to those options. Consider things like no-throw new expressions and creating a lock guard that assumes ownership rather than acquiring its lock. Those use arguments to differentiate them from other uses, so there's precedent for the approach.
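For reference, the two precedents look like this in standard C++:

    #include <mutex>
    #include <new>

    std::mutex m;

    void example() {
        int* p = new (std::nothrow) int;  // tag argument selects no-throw allocation
        m.lock();
        // tag argument: adopt the already-held lock instead of acquiring it
        std::lock_guard<std::mutex> g(m, std::adopt_lock);
        delete p;
    }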
I haven't completely thought it through, but I might've thought that solution of having separate types might be preferable if only for convenience reasons because you wouldn't have to pass any arguments to get an uninitialized variable.
You have to remember a second name either way. Once constructed, the objects will be the same or very similar in behavior.
And, as Steve pointed out, if you choose to use a type that requires explicit initialization before use, it may help catch some bugs (in debug mode).
I see no conflict. [snip]
there may be issues other than just default initialization to consider. For example, what if we wanted an option to disable run-time range checking when converting between different (sized) integer types? Keeping the one class and adding an extra constructor for that wouldn't really work. Because you'd just be substituting one run-time check for another, right?
If you want that kind of control, then you want policy classes, which means different types in the end.
But another separate compatible class might work. And how about range checking on arithmetic operations? Is that too many options? Should we just not provide for those options?
Policies
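A minimal sketch of the policy-class approach being pointed at (illustrative names, not an existing library):

    #include <limits>
    #include <stdexcept>

    // Range policy: throw if the value doesn't fit the target type.
    // (For simplicity this sketch funnels all sources through long long.)
    struct checked_range {
        template <typename T>
        static T convert(long long x) {
            if (x < (long long)std::numeric_limits<T>::min() ||
                x > (long long)std::numeric_limits<T>::max())
                throw std::range_error("value out of range");
            return static_cast<T>(x);
        }
    };

    // Range policy: trust the programmer, no run-time check.
    struct unchecked_range {
        template <typename T>
        static T convert(long long x) { return static_cast<T>(x); }
    };

    template <typename T, typename RangePolicy>
    class basic_int {
        T m_value;
    public:
        basic_int(long long x) : m_value(RangePolicy::template convert<T>(x)) {}
        operator T() const { return m_value; }
    };

    typedef basic_int<int, checked_range>   checked_int;  // different types
    typedef basic_int<int, unchecked_range> fast_int;     // in the end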
Or is the empty state somehow intrinsically valid, but the zero value for integers is not?
Zero may be a valid value with semantics different than "not set." An empty vector doesn't have a magic value that means it's actually empty.
Just to be clear, I wasn't being sarcastic or anything. I'm just trying to think this through.
I didn't think otherwise.
So I agree that zero should not be interpreted as a "magic" value indicating "value not set". In my reply to Steve I suggested that the integer class for example, might, in debug mode, contain an extra bool member indicating whether or not the value had been explicitly set. And default initialization would set that bool to indicate that the value hadn't been explicitly set.
That's certainly possible.
What I am suggesting is that, perhaps, the std::vector people decided not to provide for uninitialized vector construction because a) they suspected that conditional initialization would be a small portion of all initializations and that the default empty state would be a fairly large portion, and/or possibly b) the compiler optimizer would deal with the redundancy in a significant portion of those conditional initializations, and/or c) the cost of the program being non-deterministic if a programming error slips by is unacceptably high.
Actually, vector would be quite dangerous to use without default initialization to establish its invariants. The same can't be said for an integer, though using the garbage value might be dangerous in some contexts.
Do you still prefer the single class with an extra constructor for "uninitialized construction" to the solution I proposed?
It's hard to say. Adding a separate class or policy just to have the uninitialized case is heavy, but the overload may not apply to all specializations of the template. ___ Rob (Sent from my portable computation engine)
On 2/2/2016 1:40 AM, Rob Stewart wrote:
What I am suggesting is that, perhaps, the std::vector people decided not to provide for uninitialized vector construction because a) they suspected that conditional initialization would be a small portion of all initializations and that the default empty state would be a fairly large portion, and/or possibly b) the compiler optimizer would deal with the redundancy in a significant portion of those conditional initializations, and/or c) the cost of the program being non-deterministic if a programming error slips by is unacceptably high.
Actually, vector would be quite dangerous to use without default initialization to establish its invariants. The same can't be said for an integer, though using the garbage value might be dangerous in some contexts.
Yeah, you're right. But how rare are those contexts? I mean it's not rare for an integer to be used as an index into an array.

If you're saying that you agree with the decision to enforce mandatory default initialization of std::vectors, but uninitialized construction is ok for integers because the most catastrophic consequences would happen a lower percentage of the time, I dunno, it seems to me this argument is a judgement call that depends on how much lower that percentage is, and the magnitude of the real world benefits of foregoing default initialization. Maybe someone could do a sampling of code on github or wherever and actually measure those two factors, but until then I guess everybody has their own estimate of which factor outweighs the other.

But I do want to point out the potentially high cost of not knowing whether your code is deterministic. Consider some hypothetical online banking software written in C++. And let's say that a programmer accidentally forgot to set the value of an int before use, but just by luck the software behaved reliably in an acceptable manner in the tested and deployed environment. So after rigorous testing and a year or whatever of deployment, this software can be considered to be "well-tested" in its deployed environment. But then let's say they want to upgrade their servers. Maybe with a different OS or OS version. Maybe the code is recompiled. When deployed on the new server environment, that uninitialized integer may get initialized to a different value (or range of values), and so you could not be so confident that it would continue to behave the same way. So it loses a lot of its "well-tested" status. And that can be costly.

So ideally what I would like is for boost to provide a set of types that when used in lieu of the regular C++ types, ensure that the code is deterministic. Or as deterministic as possible. Because being deterministic increases the value and effectiveness of testing.
Do you still prefer the single class with an extra constructor for "uninitialized construction" to the solution I proposed?
It's hard to say. Adding a separate class or policy just to have the uninitialized case is heavy, but the overload may not apply to all specializations of the template.
Yeah, I'm not sure either. Separate classes do seem heavy. But at least it's compile-time heavy, not run-time.

Anyway I wonder if this is all moot. Have you looked at all at the "safe numerics" library proposed in this newsgroup? The post from Jan 20 has links. That library seems to use a template to generate classes of "safe" integers or whatever. registered_ptr should have no problem targeting those. I don't know how things work here, but it sounds like it's pretty far along the acceptance process. Interestingly, looking at the comments in their default constructor -

    constexpr explicit safe_base() {
        // this permits creating of invalid instances. This is inline
        // with C++ built-in but violates the premises of the whole library
        // choice are:
        // do nothing - violates premise of he library that all safe objects
        // are valid
        // initialize to valid value - violates C++ behavior of types.
        // add "initialized" flag. Preserves fixes the above, but doubles
        // "overhead"
        // still pending on this.
    }

- it looks like they haven't decided on whether or not to do default initialization yet either :)
On 2/2/2016 6:36 PM, Emil Dotchevski wrote: [snip]
Emil, yeah, I actually saw this a while ago. It was good the second time 'round too. Before C++11, I was so ready to dump C++ for a more modern language unencumbered by C++'s legacy baggage. The only problem is that every language that came along seemed, for some reason, to choose mandatory garbage collection at the expense of RAII. I don't know what the state of D is these days, but at least initially they seemed to be embracing garbage collection as well. I just can't accept that non-deterministic garbage collection is the right answer. But since C++11, it seems C++ may now be powerful enough to provide modern alternatives to its own problematic legacy language elements. My library was an attempt to see if this was possible, and so far it seems to be.

From slide 28 of his talk, on why C++ is not going to be fixed:

C++:
- Too complicated to fix.
- Too constrained by legacy code compatibility requirements.
- <in bold> No real interest by user community or standardization committee.

It seems the only way anyone's going to be interested in adopting safer language elements is to market it as a brand new language :)
On Tue, Feb 2, 2016 at 11:05 PM, Noah wrote: [snip]
It seems the only way anyone's going to be interested in adopting safer language elements is to market it as a brand new language :)
Can't make C++ less messy or less complicated or more safe without breaking it. If you're looking to avoid the possibility of undefined behavior, C++ is not the language for you. Emil
On February 3, 2016 3:19:15 PM EST, Emil Dotchevski wrote:
Can't make C++ less messy or less complicated or more safe without breaking it. If you're looking to avoid the possibility of undefined behavior, C++ is not the language for you.
We regularly eschew aspects of the language in favor of safer alternatives. Smart pointers are a prime example of that. Having Robert's safe integers, or Noah's classes, is a similar tool in the box. The question is how safe they should be and whether they should offer ways to forego that safety when desired or needed. ___ Rob (Sent from my portable computation engine)
On February 2, 2016 9:20:11 PM EST, Noah wrote:
On 2/2/2016 1:40 AM, Rob Stewart wrote:
using the garbage value might be dangerous in some contexts.
Yeah, you're right. But how rare are those contexts? I mean it's not rare for an integer to be used as an index into an array.
If you're saying that you agree with the decision to enforce mandatory default initialization of std::vectors, but uninitialized construction is ok for integers because the most catastrophic consequences would happen a lower percentage of the time, I dunno, it seems to me this argument is a judgement call that depends on how much lower that percentage is, and the magnitude of the real world benefits of foregoing default initialization.
It's a lot simpler than that: Trust the Programmer. That's been part of C and C++ from the start. By all means default construct with zero-initialization and provide a converting constructor from numeric types. That will provide the safety you're after. However, rather than prevent a not uncommon use case, just make that use case possible. [snip example of latent bug using native integer]
So ideally what I would like is for boost to provide a set of types that when used in lieu of the regular C++ types, ensure that the code is deterministic. Or as deterministic as possible. Because being deterministic increases the value and effectiveness of testing.
Of course.
Adding a separate class or policy just to have the uninitialized case is heavy, but the overload may not apply to all specializations of the template.
Yeah, I'm not sure either. Separate classes do seem heavy. But at least it's compile-time heavy, not run-time.
I meant that users have to know about more types and how to select among them or infer a programmer's intent from their use.
Anyway I wonder if this is all moot. Have you looked at all at the "safe numerics" library proposed in this newsgroup? The post from Jan 20 has links. That library seems to use a template to generate classes of "safe" integers or whatever.
I have. I alluded to those when I mentioned the idea that your classes should do more for conversions, etc. ___ Rob (Sent from my portable computation engine)
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Rob Stewart Sent: 04 February 2016 10:46 To: boost@lists.boost.org Subject: Re: [boost] [smart_ptr] Interest in the missing smart pointer (that can target the stack)
On February 2, 2016 9:20:11 PM EST, Noah wrote:
On 2/2/2016 1:40 AM, Rob Stewart wrote:
It's a lot simpler than that: Trust the Programmer. That's been part of C and C++ from the start.
And that was a BIG mistake. Most programmers can't be trusted a millimeter (especially me).
By all means default construct with zero-initialization and provide a converting constructor from numeric types. That will provide the safety you're after. However, rather than prevent a not uncommon use case, just make that use case possible.
Surely, the root cause is failure to use *hardware* to detect 'failure-to-initialize' and 'out-of-range'. That's the only way this can be done efficiently. But the C/C++ languages designed that out - no access to status flags and no proper vectors/matrices (and so the hardware hasn't been encouraged to do it well). So C/C++-like software is doomed to be inefficient and/or dodgy. That Robert has got few visible users suggests that most ordinary programmers just don't care. (There is a subset who have their own small world, with rules and program checkers to reduce the risks.) But I applaud his dogged persistence and support anything that gets it wider use. Paul --- Paul A. Bristow Prizet Farmhouse Kendal UK LA8 8AB +44 (0) 1539 561830
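As an aside, compilers do now expose the hardware's overflow detection through intrinsics; for example, GCC and Clang provide __builtin_add_overflow. A short example (compiler-specific, not standard C++):

    #include <cstdio>

    int main() {
        int a = 2000000000, b = 2000000000, sum;
        // On common architectures this compiles down to an add plus a
        // check of the CPU's overflow flag.
        if (__builtin_add_overflow(a, b, &sum))
            std::puts("overflow detected");
        else
            std::printf("%d\n", sum);
        return 0;
    }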
On 2/4/2016 2:45 AM, Rob Stewart wrote:
Adding a separate class or policy just to have the uninitialized case is heavy, but the overload may not apply to all specializations of the template.
Yeah, I'm not sure either. Separate classes do seem heavy. But at least it's compile-time heavy, not run-time.
I meant that users have to know about more types and how to select among them or infer a programmer's intent from their use.
Oh, I see. The mental tax. Hmm, I don't know. To you the extra constructor seems conceptually simpler, but to me it was less intuitive than the solution of separate classes. Relative "heaviness" might be subjective. On the safe numerics thread they pointed out that adding a separate constructor actually changes the interface to be incompatible with the interface of native types. And they didn't seem to like that.
On January 30, 2016 1:31:09 PM EST, Noah wrote:
On 1/29/2016 7:00 PM, Rob Stewart wrote:
Have a look at boost::intrusive_ptr.
If I understand boost::intrusive_ptr correctly, and I'm not totally sure that I do
You missed, and snipped, the context. You were discussing ways to inject your logic. intrusive_ptr uses free functions, found via ADL, to manage the reference count. That approach could work for you.
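For illustration, the intrusive_ptr hooks look like this (widget is a made-up example type; the two free-function names are the real customization points boost::intrusive_ptr looks up via ADL):

    #include <boost/intrusive_ptr.hpp>

    class widget {
        long m_refs = 0;  // not thread-safe; a real count would be atomic
        friend void intrusive_ptr_add_ref(widget* p) { ++p->m_refs; }
        friend void intrusive_ptr_release(widget* p) {
            if (--p->m_refs == 0) delete p;
        }
    };

    // boost::intrusive_ptr<widget> finds both functions via ADL:
    //     boost::intrusive_ptr<widget> pw(new widget);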
Many times I don't want to initialize a variable because the branches in the subsequent code select the value. Do your wrappers provide a constructor that permits leaving the value uninitialized?
So first let me say that I'm not proposing a total ban on primitive types. When you need the performance, and primitive types give you the performance, use them. But that should be a small fraction of the world's total C++ code.
Okay, but I was asking whether you provide for that case.
What is antiquated, in my opinion, is that primitive types are still the default. In terms of not wanting to initialize due to subsequent conditional assignment, I would say don't underestimate the compiler optimizer. When the optimizer can figure out that the default initialization is redundant, it will remove it for you, right?
You also can't assume that the optimizer will recognize such things.
I should note though, that I found it difficult (or impossible) to fully mimic all the implicit conversion rules of primitive types, so there are going to be some cases where the substitute classes can't be used (without rewriting some of your code) for compatibility reasons.
That could prove to be a stumbling block, but you can propose your ideas.
And they also address the bug-prone implicit conversion between signed and unsigned ints.
Once you do that, shouldn't you go the rest of the way and check all conversions? For example, what about overflow during marketing narrowing?
I don't know what "marketing narrowing" is,
I don't either. (Actually, it was my attempt to Swype "narrowing", which was interpreted as "marketing", but I didn't fix it correctly.)
but recently on this newsgroup people have been discussing a "safe integer" or "safe numerics" library that seems to have taken it all the way, and maybe even a bit further. My types do check ranges when converting to different integer/char types.
That's what I was alluding to.
By default, an unsigned integer minus another unsigned integer should really return a signed integer, like my primitives do.
I understand what you're trying to do, but that's a narrowing conversion. The signed type may not be large enough to hold the difference. ___ Rob (Sent from my portable computation engine)
On 1/30/2016 6:50 PM, Rob Stewart wrote:
On January 30, 2016 1:31:09 PM EST, Noah wrote:
On 1/29/2016 7:00 PM, Rob Stewart wrote:
Have a look at boost::intrusive_ptr.
If I understand boost::intrusive_ptr correctly, and I'm not totally sure that I do
You missed, and snipped, the context. You were discussing ways to inject your logic. intrusive_ptr uses free functions, found via ADL, to manage the reference count. That approach could work for you.
Oh yeah, sorry about that. I did get your point, and it was relevant. I don't think it would be hard at all to provide the analogous version of intrusive_ptr for registered_ptr. In fact, the "registered_intrusive_ptr" or whatever would probably very closely resemble the original registered_ptr, the only real difference being that instead of the "management" object (technically it's not exactly a "refcount" object) being a member of an object derived from the target, it would be a member of the target object itself. The "management" object could probably be reused unmodified. The target object would also need to change its "operator&" to return a smart pointer instead of a native one.

It just struck me that the point of intrusive_ptr was performance, and that it uses the same technique as registered_ptr (and make_shared) to get it: store the refcount/management object with the target object, eliminating the need for a separate allocation. In fact, doesn't make_shared obviate the point of intrusive_ptr? But of course registered_ptr takes it a step further and allows the entire allocation to occur on the stack.
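A rough sketch of the shape being described (all names hypothetical; copy/move support and thread safety omitted for brevity):

    #include <unordered_set>

    template <typename T> class registered_intrusive_ptr;

    // The "management" object: a member of the target itself. It records
    // every registered pointer aimed at the target and nulls them all
    // when the target (and therefore this member) is destroyed.
    template <typename T>
    class mgmt_obj {
        std::unordered_set<registered_intrusive_ptr<T>*> m_ptrs;
    public:
        void enlist(registered_intrusive_ptr<T>* p) { m_ptrs.insert(p); }
        void delist(registered_intrusive_ptr<T>* p) { m_ptrs.erase(p); }
        ~mgmt_obj();  // defined below, after registered_intrusive_ptr
    };

    template <typename T>
    class registered_intrusive_ptr {
        T* m_target;
        mgmt_obj<T>* m_mgmt;
        friend class mgmt_obj<T>;
    public:
        registered_intrusive_ptr(T* target, mgmt_obj<T>* mgmt)
            : m_target(target), m_mgmt(mgmt) { m_mgmt->enlist(this); }
        registered_intrusive_ptr(const registered_intrusive_ptr&) = delete;
        registered_intrusive_ptr& operator=(const registered_intrusive_ptr&) = delete;
        ~registered_intrusive_ptr() { if (m_mgmt) m_mgmt->delist(this); }
        T* get() const { return m_target; }  // nullptr once the target dies
    };

    template <typename T>
    mgmt_obj<T>::~mgmt_obj() {
        for (auto p : m_ptrs) { p->m_target = nullptr; p->m_mgmt = nullptr; }
    }

    // The target embeds the management object; its operator& could be
    // overloaded to return a registered_intrusive_ptr<my_obj>.
    struct my_obj {
        mgmt_obj<my_obj> m_mgmt;
        int value = 0;
    };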
Many times I don't want to initialize a variable because the branches in the subsequent code select the value. Do your wrappers provide a constructor that permits leaving the value uninitialized?
So first let me say that I'm not proposing a total ban on primitive types. When you need the performance, and primitive types give you the performance, use them. But that should be a small fraction of the world's total C++ code.
Okay, but I was asking whether you provide for that case.
At the moment my substitute classes do not. But they were not intended as a universal substitute for primitives. They were intended as a substitute for primitives in the cases when language safety is of higher priority than performance.
What is antiquated, in my opinion, is that primitive types are still the default. In terms of not wanting to initialize due to subsequent conditional assignment, I would say don't underestimate the compiler optimizer. When the optimizer can figure out that the default initialization is redundant, it will remove it for you, right?
You also can't assume that the optimizer will recognize such things.
I would agree that if we were deprecating native types, it would not be appropriate for their replacements to have this built-in theoretical performance penalty. But we're not deprecating native types; I'm just hoping to replace them as the default. Anyway, I am not opposed to providing multiple versions of the primitive replacements that support different performance-safety tradeoffs. The question is what's the best way to keep the multiple versions compatible with each other. At the moment I'm thinking to publicly derive the one with default initialization from the one without. But what if we want to support more versions? Is the public inheritance mechanism general enough? Do these substitute classes really need to be templates? I'll have to think about it.
I should note though, that I found it difficult (or impossible) to fully mimic all the implicit conversion rules of primitive types, so there are going to be some cases where the substitute classes can't be used (without rewriting some of your code) for compatibility reasons.
That could prove to be a stumbling block, but you can propose your ideas.
Yeah, so this turns out to be the key issue. Other people have asked why primitive types can't be used as base classes - http://stackoverflow.com/questions/2143020/why-cant-i-inherit-from-int-in-c. It turns out that really the only reason primitive types weren't made into full-fledged classes is that they inherit these "chaotic" conversion rules from C that can't be fully mimicked by C++ classes, and Bjarne thought it would be too ugly to make special-case classes that follow different conversion rules. That's it. That's the only reason. If C had had reasonably sane conversion rules for primitive types, then primitive types would already be full-fledged classes.

The problem is that we, the C++ community, are perpetuating our dependency on these crippled and dangerous primitive types by retaining them as the default, and consequently, inadvertently, writing code that depends on their inane conversion rules. So we need to stop doing that. Stop writing code that requires these legacy conversion rules to work. What needs to happen is for boost, or whoever, to adopt an "official" set of primitive substitute classes, so that people can, if they choose, write their code and libraries to be compatible with both the old primitives and the new substitute classes with more sane conversion rules. This would not require any extra work on boost's (or whoever's) part. They wouldn't have to make all their libraries support these new classes. They would just need to designate a common interface that people can, if they choose, standardize on - an interface that can be implemented by classes (unlike the interface of primitive types). Once this happens, people will be free to re-implement the interface however they choose. This should resolve the contention between the performance-obsessed and the safety-obsessed crowds.

Because of its legacy, C++ has already demonstrated its power as a language for developing high-performance applications. Some of that same power could be directed at making applications safer and more secure as well. So far it has not been. C++ has not demonstrated its power as a language for safe and secure applications. I believe the primary reason for this is the lack of even reasonably safe building blocks to work with. And the only reason we don't have them is those legacy conversion rules.

<hyperbolic exaggeration for effect> I mean, registered_ptr may be one of the fastest safe reference types in existence, and you're telling me that its show-stopper flaw is that it can't directly target a data type that was designed in the 1970s? That instead I should just be happy with native pointers? Probably the single most dangerous data type on the planet? Really? I mean, let's say I want to write an internet-facing application and I want to reduce as much as possible the likelihood of a "remote execution" vulnerability, performance being a secondary (or tertiary) consideration. I guess the default answer is to use Java. But I get the feeling that C++ is now powerful enough that it should be better able to address the task than even Java. But I think first C++ has to demonstrate that it's now powerful enough that applications can, if desired, be practically implemented without using elements that can reference invalid memory or take values determined by random bits of uninitialized memory. Right? Is that too much to ask?
By default, an unsigned integer minus another unsigned integer should really return a signed integer, like my primitives do.
I understand what you're trying to do, but that's a narrowing conversion. The signed type may not be large enough to hold the difference.
I really should have said "size_t" instead of unsigned integer, because that's what I meant. Even though size_t is often implemented as an unsigned integer, it implies that the value is being used as a count of quantity rather than as a set of bits. With size_t, the wrap-around bug is a real-world problem. I've encountered it several times in real life (and not just in my own code :). Overflow due to narrowing would rarely, if ever, occur in real life.
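The classic form of the bug, for the record:

    #include <cstddef>
    #include <cstdio>

    int main() {
        std::size_t a = 2, b = 3;
        std::size_t diff = a - b;  // wraps: 18446744073709551615 on a 64-bit system
        if (a - b > 0)             // true! an unsigned difference is never negative
            std::printf("%zu\n", diff);
        // Computing the difference in a signed type behaves as expected:
        long long sdiff = (long long)a - (long long)b;  // -1
        std::printf("%lld\n", sdiff);
        return 0;
    }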
participants (10): Andrey Semashev, Emil Dotchevski, Gavin Lambert, Michael Marcin, Nat Goodspeed, Noah, Paul A. Bristow, Rob Stewart, Seth, Steven Watanabe