[review][fixed_string] FixedString review starts today
The Boost formal review of Krystian Stasiowski and Vinnie Falco's FixedString library starts today November 25 and extends up to December 4. Description FixedString provides a dynamically resizable string of characters with compile-time fixed capacity and contiguous embedded storage in which the characters are placed within the string object itself. Its API closely resembles that of std::string. Repo and docs GitHub repository: https://github.com/18/fixed_string Online docs: https://18.github.io/doc/fixed_string Review questions Here are some questions you might want to answer in your review: - What is your evaluation of the design? - What is your evaluation of the implementation? - What is your evaluation of the documentation? - What is your evaluation of the potential usefulness of the library? - Did you try to use the library? With what compiler? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? - Are you knowledgeable about the problem domain? And finally, every review should answer this question: - Do you think the library should be accepted as a Boost library? I invite you to submit your review, either publicly to the developers or the users lists, or else, if for whatever reason you don't want to go public, directly to me. Thank you for investing your time on reviewing Krystian and Vinnie's work and for contributing to the advancement of Boost. Best regards, Joaquín M López Muñoz Review manager for candidate Boost.FixedString
On Sun, Nov 24, 2019 at 5:00 PM Joaquin M López Muñoz via Boost-users < boost-users@lists.boost.org> wrote:
The Boost formal review of Krystian Stasiowski and Vinnie Falco's FixedString library starts today November 25 and extends up to December 4.
[snip]
Review questions
Here are some questions you might want to answer in your review: - What is your evaluation of the design?
A drop-in replacement for std::string should not add elements to the design (string_view_type, return type of substr(), etc.), as users will come to depend upon them, and then if they want to switch back to std::string, their code will not compile. I personally do not care about the drop-in replacement goal, but adding things to the design not present in std::string subverts that somewhat. Throwing when the capacity would be exceeded is wrong. The distinction between a failure mode that should throw and one that should assert/invoke UB is that the exception allows the user to recover, and the assert simply informs the user that their code is broken. Since the capacity is known statically, and since there is no way to change it at runtime, reporting this as a recoverable error, rather than a programmer error, is inappropriate. I know this is at odds with the drop-in replacement goal, but using a type that has at most N characters of stack in place of a type that can use nearly all available heap will never truly be a drop-in replacement. Besides, that ship sailed when you changed the return type of substr(). Could you explain why the fixed_string API is not constexpr? Is it for C++11 compatability? Could you also explain how this meets the constexpr use case goal? Would it be unreasonable to add op+ back in for operands of the same type? There's no difficulty determining the size of fixed_string<8>("foo") + fixed_string<8>("bar"), right? That's no different from fixed_string<8>("foo") + "bar", which *is* supported. If capacity() were constexpr static, you would not need static_capacity. I prefer the name "static_string", in keeping with the long-established "static_vector" from Boost.Container. to_fixed_string() takes any integral type. This is wrong. Here are the overloads for to_string() from the standard: string to_string(int val); string to_string(unsigned val); string to_string(long val); string to_string(unsigned long val); string to_string(long long val); string to_string(unsigned long long val); string to_string(float val); string to_string(double val); string to_string(long double val); Note the presence of to_string() overloads for float, double, and long double. Note the conspicuous absence of overloads for char, char8_t, char16_t, and char32_t. This is intentional, and to_fixed_string() should follow this. It's fine to use the current implementation of to_fixed_string() as a common implementation for the integral overloads above, but you should not accept the character types, and should accept the floating point types above. - What is your evaluation of the implementation?
Please consider adding BOOST_ASSERTs of preconditions. For instance,
back() may access a character at an index >= size(), but < capacity().
This will never be caught by ASan or Valgrind, and will not segfault,
because a random char within an array of char is always ok to read, within
the lifetime of the array. An assert will let users find those errors
*much* more easily.
Will the optional dependency on Boost become a permanent dependency on
Boost if the library is accepted? BOOST_FIXED_STRING_ASSERT() and friends
are confusing to those familiar with Boost implementations.
to_fixed_string() has a constrained declaration, and static_asserts that
the constraint is satisfied inside the body. It seems that the
static_assert is just noise.
This appears quite wrong, even if Augustin is the one behind it. Did he
say why he expects that to work? It looks like it accepts doubles as
iterators, for instance.
// Because k-ballo said so
template<class T>
using is_input_iterator =
std::integral_constant
- What is your evaluation of the documentation?
Motivation: "A dynamically-resizable string is required within constexpr functions." -> "A dynamically-resizable string is required within constexpr functions (before C++20)." (since std::string is usable in a constexpr context in C+20 and later). Iterators: You forgot to note that increasing the length of the string will never invalidate iterators, since those operations never change the capacity. All of the links to source code in the synopses give me 404s. I feel like the documentation that exists, but defines exactly the same functionality as std::string (e.g. erase()), should be documented via reference to std::string. The fact that all these APIs exist within the fixed_string docs means I cannot easily find the variations from std::string. For instance, I know that any allocating function must throw if the capacity is exceeded (from the intro), but I don't know if there are other variations I need to know about without reading the API docs. With no indication where those variations might exist, I end up needing to read all the API descriptions, most of them identical to what I already know std::string to do, in order to find them. Please short-circuit this for me by only supplying the varying API docs, and saying "the rest is what std::string does, except that the capacity is fixed at N". Of course, if you change throws to asserts in the ways I suggested above, you'll need to note that as well. If you're going to replicate all the API docs from std::string, please don't leave out the type constraint documentation. For instance, fixed_string(T const& t, size_type pos, size_type n); has a constraint on "T" that you do not document. You may want to briefly state why it is insufficient just to use a std::string with a fixed-capacity allocator. I did not initially look a the definition of iterator and const_iterator, but once I did I was happy to see that they were both plain pointers. It's really useful to know that the iterator type is guaranteed to be a pointer, and calling that out explicitly would help users. - What is your evaluation of the potential usefulness of the library?
Very, based on the usefulness of boost::container::static_vector in contexts in which the maximum number of elements is known.
- Did you try to use the library? With what compiler? Did you have any problems?
I did not try to use it, simply because I know how std::string works. :)
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
I spent about 3 hours reading the documentation and implementation.
- Are you knowledgeable about the problem domain?
Yes. In particular, I have implemented multiple string-like types.
And finally, every review should answer this question: - Do you think the library should be accepted as a Boost library?
Yes, with these conditions: - the to_fixed_string() overloads are changed to match the std::to_string() overloads; - that detail::is_input_iterator is explained or fixed; and - compile-fail tests are added to verify that those functions with constraints are correctly constrained. Zach
On Wed, Nov 27, 2019 at 11:01 AM Peter Dimov via Boost < boost@lists.boost.org> wrote:
Zach Laine wrote:
Throwing when the capacity would be exceeded is wrong.
No, it's not. It's correct. It's a runtime error, not a programming time error.
Then why doesn't vector::operator[] throw? In either case, you know what the precondition is, and how to check it. That's what makes this a user error. With out-of-memory, I have no way to programmatically check that (within the standard). Zach
Zach Laine wrote:
On Wed, Nov 27, 2019 at 11:01 AM Peter Dimov via Boost
wrote: Zach Laine wrote:
Throwing when the capacity would be exceeded is wrong.
No, it's not. It's correct. It's a runtime error, not a programming time error.
Then why doesn't vector::operator[] throw? In either case, you know what the precondition is, and how to check it. That's what makes this a user error.
What makes op[] a user error is that the index rarely comes from external input. It typically comes from a variable that the programmer controls and program logic ensures is in range. Whereas when you += strings into a fixed capacity buffer, those are only occasionally programmer-supplied. Program logic is not violated if the input strings exceed the buffer. Or stated differently, if in 90% of the cases the correct use of a function would involve the programmer writing out the exact same if( !precondition ) throw exception(); thing that the function would have done itself were it throwing, the function should be throwing. Whereas when you do `for( int i = 0; i < n; ++i ) { something with v[i]; }`, you don't need to insert the above before each [].
On Wed, Nov 27, 2019 at 11:23 AM Peter Dimov
Zach Laine wrote:
On Wed, Nov 27, 2019 at 11:01 AM Peter Dimov via Boost
wrote: Zach Laine wrote:
Throwing when the capacity would be exceeded is wrong.
No, it's not. It's correct. It's a runtime error, not a programming time error.
Then why doesn't vector::operator[] throw? In either case, you know what the precondition is, and how to check it. That's what makes this a user error.
What makes op[] a user error is that the index rarely comes from external input. It typically comes from a variable that the programmer controls and program logic ensures is in range.
I disagree. What puts vector::operator+ inherently (which is directly analogous to fixed_string::operator+) more out of programmer control than vector::operator[]? Moreover, there are two things to consider: 1) Can I use this function within a context in which I know that the precondition check/throwing condition is not needed? This is in part an efficiency concern, of course. If I can check the condition myself, outside of the function, I can write code that maintains the invariant that the precondition is always true, or in which the precondition is true within some limited scope. I can then call the function within that invariant-maintaining code or scope, and elide the branches of the check. 2) If the check is needed, can the programmer write code that can sensibly deal with the problem? When I'm out of memory, I *may* be able to do something about that, depending on my system. For instance, running out of memory on a game console is not an unusual thing, and I'll have a failover strategy for that. Running out of array-based storage is not as easily recoverable. Unless the answers are 1) no and 2) yes, the function should not throw. Whereas when you += strings into a fixed capacity buffer, those are only
occasionally programmer-supplied. Program logic is not violated if the input strings exceed the buffer.
Ok, then do you advocate that fixed_string::operator+=(fixed_string) have preconditions and not throw, and that fixed_string::operator+=(char const *) should throw instead? Only the latter could have come from an end-user. End users have no way of supplying fixed_strings to your program.
Or stated differently, if in 90% of the cases the correct use of a function would involve the programmer writing out the exact same
if( !precondition ) throw exception();
thing that the function would have done itself were it throwing, the function should be throwing.
I can get behind that. I just don't think that I'd need to check-and-throw everywhere that I would use fixed_string::operator+=.
Whereas when you do `for( int i = 0; i < n; ++i ) { something with v[i]; }`, you don't need to insert the above before each [].
On this we agree. Zach
Zach Laine wrote:
What makes op[] a user error is that the index rarely comes from external input. It typically comes from a variable that the programmer controls and program logic ensures is in range.
I disagree. What puts vector::operator+ inherently (which is directly analogous to fixed_string::operator+) more out of programmer control than vector::operator[]?
Nothing inherently makes it so. It just is, in the observable universe.
Moreover, there are two things to consider:
1) Can I use this function within a context in which I know that the precondition check/throwing condition is not needed?
This is in part an efficiency concern, of course.
Even if this only occurs 0.1% of the time, the design should cater to it because of the mere possibility? The flip side is the loss of efficiency when you do have to check. You need to compute the size of the argument twice, which in the char const* case isn't free.
2) If the check is needed, can the programmer write code that can sensibly deal with the problem?
Sure, throw an exception. :-)
Ok, then do you advocate that fixed_string::operator+=(fixed_string) have preconditions and not throw, and that fixed_string::operator+=(char const *) should throw instead?
No. I would "precondition" `op+=( char const (&)[ M ] )` (and op=) by
failing at compile time when M > N, because the size is known, but in
fixed_string's case, the size isn't known.
There's no significant difference between accumulating the elements of
vector<string> and those of vector
On Thu, Nov 28, 2019 at 9:52 AM Peter Dimov via Boost
What makes op[] a user error is that the index rarely comes from external input. It typically comes from a variable that the programmer controls and program logic ensures is in range.
I disagree. What puts vector::operator+ inherently (which is directly analogous to fixed_string::operator+) more out of programmer control
Zach Laine wrote: than
vector::operator[]?
Nothing inherently makes it so. It just is, in the observable universe.
Moreover, there are two things to consider:
1) Can I use this function within a context in which I know that the precondition check/throwing condition is not needed?
This is in part an efficiency concern, of course.
Even if this only occurs 0.1% of the time, the design should cater to it because of the mere possibility?
The flip side is the loss of efficiency when you do have to check. You need to compute the size of the argument twice, which in the char const* case isn't free.
2) If the check is needed, can the programmer write code that can sensibly deal with the problem?
Sure, throw an exception. :-)
Ok, then do you advocate that fixed_string::operator+=(fixed_string) have preconditions and not throw, and that fixed_string::operator+=(char const *) should throw instead?
No. I would "precondition" `op+=( char const (&)[ M ] )` (and op=) by failing at compile time when M > N, because the size is known, but in fixed_string's case, the size isn't known.
There's no significant difference between accumulating the elements of vector<string> and those of vector
into a fixed_string. In both cases, we have no idea whether they'll fit or not. And yes, when they are known in advance, we can compute the final size in one loop, check, then += in a second loop, avoiding the check on each iteration. So there's a potential efficiency loss here, outweighed by gain of convenience and loss of buffer overflow exploits in the more common case where the accumulated strings come one by one from input.
Ok, I understand your point a bit better now I think. Is it the unboundedly-large nature of a NTBS that has you concerned? That is, do you think that op+=(char) should assert and op+=(char const *) should throw? That position makes sense to me, thought I don't share it -- though I think that's just taste. However, I cannot imagine why I'd ever want op+=(char) or op+=(string_view) to throw. Zach
On 2019-11-29 01:44, Zach Laine via Boost wrote:
Ok, I understand your point a bit better now I think. Is it the unboundedly-large nature of a NTBS that has you concerned? That is, do you think that op+=(char) should assert and op+=(char const *) should throw? That position makes sense to me, thought I don't share it -- though I think that's just taste. However, I cannot imagine why I'd ever want op+=(char) or op+=(string_view) to throw.
IIRC, append/insert/push_back will throw with any container or std::string, if the result exceeds max_size(). It is only natural if fixed_string does the same, it is the expected behavior. You can argue for a separate set of unsafe insert methods that don't perform the check and exhibit undefined behavior if the resulting length exceeds the limit. But it should be very clear that the user must ensure himself that it doesn't happen.
I guess, this is a matter of design choice: do we want a `resize()` beyond
`capacity()` to be treated as a programmer bug (something that the library
does not offer) or a correct -- even if very occasional -- usage. Both have
their justification and both bring value; but making either choice renders
a different library, even if similar on the face value. The model with
throwing exceptions is the "infinite memory model": programmer can assume
that capacity is infinite and accept that at some point an exception will
remind her that the assumption could no longer be maintained. The UB model
makes the capacity part of the contract, and programmer takes
responsibility of maintaining it.
If all STL containers take the exception model, and this library aims at
being a drop-in replacement for std::string, the it would sound more
reasonable to adapt the "infinite capacity" model.
On the other hand, if the goal is to provide a container suitable for
embedded operations, where you cannot afford throwing exceptions the same
way as you cannot afford heap allocation, one would expect the other model.
This is what static_vector has chosen.
It looks to me that these two design goals cannot be pursued together:
* be a drop-in replacement for std::string
* be a tool for embedded systems (where resource/constraint awareness
changes the way one organize programs)
As they necessitate opposite design trade-offs. Authors of fixed_string
should make the call which goal they want to pursue and document it. And
from this choice the design trade-offs will follow.
static_string came from Beast. My guess is that Beast needs the "infinite
memory" model. If you are doing networking, you can afford occasional
memory allocation required by exceptions. And you want to favor buffer
overflow protection over bleeding edge performance.
This also justifies using different prefix ("fixed_") than in
static_vector, as the latter library chose a different model.
Regards,
&rzej;
pt., 29 lis 2019 o 01:18 Andrey Semashev via Boost
On 2019-11-29 01:44, Zach Laine via Boost wrote:
Ok, I understand your point a bit better now I think. Is it the unboundedly-large nature of a NTBS that has you concerned? That is, do
think that op+=(char) should assert and op+=(char const *) should throw? That position makes sense to me, thought I don't share it -- though I
you think
that's just taste. However, I cannot imagine why I'd ever want op+=(char) or op+=(string_view) to throw.
IIRC, append/insert/push_back will throw with any container or std::string, if the result exceeds max_size(). It is only natural if fixed_string does the same, it is the expected behavior.
You can argue for a separate set of unsafe insert methods that don't perform the check and exhibit undefined behavior if the resulting length exceeds the limit. But it should be very clear that the user must ensure himself that it doesn't happen.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Andrzej Krzemienski wrote:
On the other hand, if the goal is to provide a container suitable for embedded operations, where you cannot afford throwing exceptions the same way as you cannot afford heap allocation, one would expect the other model.
On embedded, you compile with -fno-exceptions and add namespace boost { void throw_exception( std::exception const& ) { __builtin_trap(); } } There's no need to afford exceptions, you just need to be able to afford the potential loss in performance, basically one branch per append if not heroically optimized away.
I suspect (I am not programming in an embedded system) that rather than
relying on __builtin_trap() or std::abort(), what you do is start treating
a `resize()` over `capacity()` as a precondition violation (a bug). And
this causes a different programming model and the organization of your code.
Regards,
&rzej;
pt., 29 lis 2019 o 09:06 Peter Dimov via Boost
Andrzej Krzemienski wrote:
On the other hand, if the goal is to provide a container suitable for embedded operations, where you cannot afford throwing exceptions the same way as you cannot afford heap allocation, one would expect the other model.
On embedded, you compile with -fno-exceptions and add
namespace boost { void throw_exception( std::exception const& ) { __builtin_trap(); } }
There's no need to afford exceptions, you just need to be able to afford the potential loss in performance, basically one branch per append if not heroically optimized away.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Andrzej Krzemienski wrote:
I suspect (I am not programming in an embedded system) that rather than relying on __builtin_trap() or std::abort(), what you do is start treating a `resize()` over `capacity()` as a precondition violation (a bug). And this causes a different programming model and the organization of your code.
That was exactly why I wrote
There's no need to afford exceptions, you just need to be able to afford the potential loss in performance, basically one branch per append if not heroically optimized away.
That is: in order to use the library as presented, you don't need to afford exceptions, but you will need to basically duplicate the same check op+= does, in user code; this may decrease performance, but will not allow a buffer overflow in case you either forgot the check or got it wrong.
pt., 29 lis 2019 o 10:10 Peter Dimov via Boost
Andrzej Krzemienski wrote:
I suspect (I am not programming in an embedded system) that rather than relying on __builtin_trap() or std::abort(), what you do is start treating a `resize()` over `capacity()` as a precondition violation (a bug). And this causes a different programming model and the organization of your code.
That was exactly why I wrote
There's no need to afford exceptions, you just need to be able to afford the potential loss in performance, basically one branch per append if not heroically optimized away.
That is: in order to use the library as presented, you don't need to afford exceptions, but you will need to basically duplicate the same check op+= does, in user code; this may decrease performance, but will not allow a buffer overflow in case you either forgot the check or got it wrong.
This is just performance aspect. Another one is correctness checking. If a library adapted the model where over-resize is a bug, it can plant sufficient platform-specific precondition annotations for the tools to be able to detect user bugs. (e.g., when I put __builtin_unreachable() and compile with UB-sanitizer). Regards, &rzej;
On Fri, Nov 29, 2019 at 3:10 AM Peter Dimov via Boost
Andrzej Krzemienski wrote:
I suspect (I am not programming in an embedded system) that rather than relying on __builtin_trap() or std::abort(), what you do is start treating a `resize()` over `capacity()` as a precondition violation (a bug). And this causes a different programming model and the organization of your code.
That was exactly why I wrote
There's no need to afford exceptions, you just need to be able to afford the potential loss in performance, basically one branch per append if not heroically optimized away.
That is: in order to use the library as presented, you don't need to afford exceptions, but you will need to basically duplicate the same check op+= does, in user code; this may decrease performance, but will not allow a buffer overflow in case you either forgot the check or got it wrong.
This leaves out the case where you know from context outside of fixed_string that an append is guaranteed to work. This is an important case, IMO, and one for which I should not be required to suffer the perf overhead when I don't need the check. Also, if the API defined preconditions and invoked UB when misused, the check would only ever need to happen once. As you say, this would have to be done in user code -- as with all precondtioned functions, like back(), operator[](), etc. That is, the user's need to verify that the operation will succeed would not be unique to the append/insert operations. Zach
On Fri, Nov 29, 2019 at 2:39 AM Andrzej Krzemienski via Boost < boost@lists.boost.org> wrote:
I suspect (I am not programming in an embedded system) that rather than relying on __builtin_trap() or std::abort(), what you do is start treating a `resize()` over `capacity()` as a precondition violation (a bug). And this causes a different programming model and the organization of your code.
Yes, this is exactly my claim as well. Zach
On Thu, Nov 28, 2019 at 6:18 PM Andrey Semashev via Boost < boost@lists.boost.org> wrote:
On 2019-11-29 01:44, Zach Laine via Boost wrote:
Ok, I understand your point a bit better now I think. Is it the unboundedly-large nature of a NTBS that has you concerned? That is, do
think that op+=(char) should assert and op+=(char const *) should throw? That position makes sense to me, thought I don't share it -- though I
you think
that's just taste. However, I cannot imagine why I'd ever want op+=(char) or op+=(string_view) to throw.
IIRC, append/insert/push_back will throw with any container or std::string, if the result exceeds max_size(). It is only natural if fixed_string does the same, it is the expected behavior.
You can argue for a separate set of unsafe insert methods that don't perform the check and exhibit undefined behavior if the resulting length exceeds the limit. But it should be very clear that the user must ensure himself that it doesn't happen.
I don't find it natural that a type whose size is statically known and a type whose size is not statically known have the same relationship to user code (preconditions, use of UB vs. exceptions, etc.). Zach
Zach Laine wrote:
I don't find it natural that a type whose size is statically known and a type whose size is not statically known have the same relationship to user code (preconditions, use of UB vs. exceptions, etc.).
But the size of the proposed fixed_string isn't statically known. Its capacity is.
On Mon, Dec 2, 2019 at 11:48 AM Peter Dimov via Boost
Zach Laine wrote:
I don't find it natural that a type whose size is statically known and a type whose size is not statically known have the same relationship to user code (preconditions, use of UB vs. exceptions, etc.).
But the size of the proposed fixed_string isn't statically known. Its capacity is.
Right, thanks. I wrote "size" when I meant "capacity". I think the point remains the same, however. I know statically how much room I have in which to operate, as opposed to having a runtime-variable amount of room. Zach
On 3/12/2019 07:02, Zach Laine wrote:
On Mon, Dec 2, 2019 at 11:48 AM Peter Dimov wrote:
Zach Laine wrote:
I don't find it natural that a type whose size is statically known and a type whose size is not statically known have the same relationship to user code (preconditions, use of UB vs. exceptions, etc.).
But the size of the proposed fixed_string isn't statically known. Its capacity is.
Right, thanks. I wrote "size" when I meant "capacity". I think the point remains the same, however. I know statically how much room I have in which to operate, as opposed to having a runtime-variable amount of room.
But given fixed_string<N> and fixed_string<M>, there is no way to
determine at compile time whether direct concatenation is safe or will
overflow, even presuming you are appending to the larger of the two.
The only way to guarantee at compile time that a concatenation is not at
risk of overflow is to first copy to a fixed_string
On Mon, Dec 2, 2019 at 5:42 PM Gavin Lambert via Boost < boost@lists.boost.org> wrote:
On Mon, Dec 2, 2019 at 11:48 AM Peter Dimov wrote:
Zach Laine wrote:
I don't find it natural that a type whose size is statically known and a type whose size is not statically known have the same relationship to user code (preconditions, use of UB vs. exceptions, etc.).
But the size of the proposed fixed_string isn't statically known. Its capacity is.
Right, thanks. I wrote "size" when I meant "capacity". I think the
On 3/12/2019 07:02, Zach Laine wrote: point
remains the same, however. I know statically how much room I have in which to operate, as opposed to having a runtime-variable amount of room.
But given fixed_string<N> and fixed_string<M>, there is no way to determine at compile time whether direct concatenation is safe or will overflow, even presuming you are appending to the larger of the two.
The only way to guarantee at compile time that a concatenation is not at risk of overflow is to first copy to a fixed_string
and then append, which (in the primary intended use case of allocating a large buffer up-front and not using most of it) is a massive waste of both performance and memory.
All true, but I'm unclear why you're pointing this out. I never suggested otherwise.
Thus (assuming we're appending into one of the existing buffers instead), *someone* needs to do a range check at runtime, and forgetting it is a serious security vulnerability. If we lived in a world where sanitisers were readily available, reliable, and automatically run on all code, then maybe you can argue that this is the user's responsibility and UB is "safe". We still do not yet live in that world.
I get the point you're making. What I do not get is why you *only* apply it to op+. That is, if memory is so essential for op+, then why is it not also essential for op[] . This is not a troll. I really do want to know what the difference is as you see it. Zach
On Tue, Dec 3, 2019 at 7:42 AM Zach Laine via Boost
Thus (assuming we're appending into one of the existing buffers instead), *someone* needs to do a range check at runtime, and forgetting it is a serious security vulnerability. If we lived in a world where sanitisers were readily available, reliable, and automatically run on all code, then maybe you can argue that this is the user's responsibility and UB is "safe". We still do not yet live in that world.
I get the point you're making. What I do not get is why you *only* apply it to op+. That is, if memory is so essential for op+, then why is it not also essential for op[] . This is not a troll. I really do want to know what the difference is as you see it.
These are two different kinds of errors, invalid access vs buffer overrun. I trust you're not arguing that these are essentially the same. Logically, using a bad index can not be recovered from. You have broken invariants, or worse. It's the textbook example of what not to do. In contrast, when running out of a resource, nothing bad has yet happened. The program is in good state, failing to establish a new invariant. Generally, programs should be able to recover from such state. It's the textbook example for good practices: you check for errors and recover from them, rather than end up with UB.
On Tue, Dec 3, 2019 at 1:08 PM Emil Dotchevski via Boost < boost@lists.boost.org> wrote:
On Tue, Dec 3, 2019 at 7:42 AM Zach Laine via Boost
wrote:
Thus (assuming we're appending into one of the existing buffers instead), *someone* needs to do a range check at runtime, and forgetting it is a serious security vulnerability. If we lived in a world where sanitisers were readily available, reliable, and automatically run on all code, then maybe you can argue that this is the user's responsibility and UB is "safe". We still do not yet live in that world.
I get the point you're making. What I do not get is why you *only* apply it to op+. That is, if memory is so essential for op+, then why is it not also essential for op[] . This is not a troll. I really do want to know what the difference is as you see it.
These are two different kinds of errors, invalid access vs buffer overrun. I trust you're not arguing that these are essentially the same.
Logically, using a bad index can not be recovered from. You have broken invariants, or worse. It's the textbook example of what not to do.
In contrast, when running out of a resource, nothing bad has yet happened. The program is in good state, failing to establish a new invariant. Generally, programs should be able to recover from such state. It's the textbook example for good practices: you check for errors and recover from them, rather than end up with UB.
I buy that in the abstract. For instance, I have no problem with throwing on allocation failure. Where we disagree seems to be whether char[N] is a resource. I claim that it is not. Zach
On Tue, Dec 3, 2019 at 11:17 AM Zach Laine via Boost
On Tue, Dec 3, 2019 at 1:08 PM Emil Dotchevski via Boost < boost@lists.boost.org> wrote:
On Tue, Dec 3, 2019 at 7:42 AM Zach Laine via Boost < boost@lists.boost.org
wrote:
Thus (assuming we're appending into one of the existing buffers instead), *someone* needs to do a range check at runtime, and forgetting it is a serious security vulnerability. If we lived in a world where sanitisers were readily available, reliable, and automatically run on all code, then maybe you can argue that this is the user's responsibility and UB is "safe". We still do not yet live in that world.
I get the point you're making. What I do not get is why you *only* apply it to op+. That is, if memory is so essential for op+, then why is it not also essential for op[] . This is not a troll. I really do want to know what the difference is as you see it.
These are two different kinds of errors, invalid access vs buffer overrun. I trust you're not arguing that these are essentially the same.
Logically, using a bad index can not be recovered from. You have broken invariants, or worse. It's the textbook example of what not to do.
In contrast, when running out of a resource, nothing bad has yet happened. The program is in good state, failing to establish a new invariant. Generally, programs should be able to recover from such state. It's the textbook example for good practices: you check for errors and recover from them, rather than end up with UB.
I buy that in the abstract. For instance, I have no problem with throwing on allocation failure. Where we disagree seems to be whether char[N] is a resource. I claim that it is not.
I thought you're arguing that bad indexing and buffer overrun should be treated the same. I explained why not.
Zach Laine wrote:
Ok, I understand your point a bit better now I think. Is it the unboundedly-large nature of a NTBS that has you concerned? That is, do you think that op+=(char) should assert and op+=(char const *) should throw?
No, I prefer all of them consistently throwing, even when the check is O(1). My primary concern is not the loss of efficiency due to double checking, it's to avoid buffer overflow exploits. Yes, this will be less efficient if you += characters (but not by so much as one might think), and yes, you can write code that will be correct if op+= doesn't check. (Since operating on chars destroys optimizations due to aliasing, the inner loop improves from .L5: add rax, 1 movzx ecx, BYTE PTR [rdx] cmp rax, 511 ja .L11 add rdx, 1 mov QWORD PTR [rdi], rax mov BYTE PTR [rdi+7+rax], cl cmp rsi, rdx jne .L5 https://godbolt.org/z/Z37Pmx to .L3: movzx ecx, BYTE PTR [rdx+rax] add rax, 1 mov QWORD PTR [rdi], rax mov BYTE PTR [rdi+7+rax], cl cmp r8, rax jne .L3 https://godbolt.org/z/CXHRKQ which is better, but far from optimal.) I were implementing this class, I would always check in op+= (and append etc) even if the specification says "Expects" instead of "Throws", I'd just __builtin_trap instead of throwing. Otherwise, this is strcat all over again, and there's a reason there was a big concerted push against this style in/by Microsoft and elsewhere.
On Fri, Nov 29, 2019 at 1:15 AM Peter Dimov via Boost
Zach Laine wrote:
Ok, I understand your point a bit better now I think. Is it the unboundedly-large nature of a NTBS that has you concerned? That is, do you think that op+=(char) should assert and op+=(char const *) should throw?
No, I prefer all of them consistently throwing, even when the check is O(1). My primary concern is not the loss of efficiency due to double checking, it's to avoid buffer overflow exploits.
Ok, so the issue as you see it is that a program that is memory unsafe should die in release builds rather than allowing an attacker to take over, is that right? If so, I understand the motivation, but I don't share it -- I literally do not ever need to worry about such things in my work. Should everyone pay the checking cost, because this is a concern for some code (even if it is a common concern -- I'm not minimizing it more than to say that it's <100% of uses). Moreover, you mention using __builtin_trap below, which gives you a quick death instead of acting as an attack vector, but the proposed design does not. Again I ask, if we throw in this case, what do I write in catching code to remediate the situation that I need to stuff 10lbs of bits into a 5lb sack?
Yes, this will be less efficient if you += characters (but not by so much as one might think), and yes, you can write code that will be correct if op+= doesn't check. (Since operating on chars destroys optimizations due to aliasing, the inner loop improves from
.L5: add rax, 1 movzx ecx, BYTE PTR [rdx] cmp rax, 511 ja .L11 add rdx, 1 mov QWORD PTR [rdi], rax mov BYTE PTR [rdi+7+rax], cl cmp rsi, rdx jne .L5
to
.L3: movzx ecx, BYTE PTR [rdx+rax] add rax, 1 mov QWORD PTR [rdi], rax mov BYTE PTR [rdi+7+rax], cl cmp r8, rax jne .L3
which is better, but far from optimal.)
Ok, I tried this myself on Quickbench: http://quick-bench.com/WZPNHFBKI-hFxThBYRZVjndtBho I see a slowdown factor of 1.39 when throwing if a single newly appended character (the char op+= case) does not fit, vs. UB. UB is our friend here. It would be a shame if this code: std::ranges::copy(r, std::back_inserter(str)); (for some fixed_string str) were 1.39x slower, if I know that r is smaller than str's capacity, and str is empty.
I were implementing this class, I would always check in op+= (and append etc) even if the specification says "Expects" instead of "Throws", I'd just __builtin_trap instead of throwing. Otherwise, this is strcat all over again, and there's a reason there was a big concerted push against this style in/by Microsoft and elsewhere.
Zach
Zach Laine wrote:
Ok, I tried this myself on Quickbench:
I tried to fix that to add the null terminator, but as a result both push_backs were entirely optimized away. http://quick-bench.com/DtPR9Yov1TSzUFeYRx0ipVvUYBk The Append case is how I think this code should be written. Character by character loops are only useful as a worst-case estimate; if there's a performance problem there, you'd rewrite the loop to not push_back characters one by one, not try to make it go faster.
Zach Laine wrote:
Ok, so the issue as you see it is that a program that is memory unsafe should die in release builds rather than allowing an attacker to take over, is that right?
Yes, this is known as converting a remote execution exploit into a denial of service exploit. Not great, not terrible.
If so, I understand the motivation, but I don't share it -- I literally do not ever need to worry about such things in my work. Should everyone pay the checking cost, because this is a concern for some code (even if it is a common concern -- I'm not minimizing it more than to say that it's <100% of uses).
It's a very, very common concern; it was actually the #1 security issue (and maybe still is.) As I said, this was the primary motivation behind Microsoft's security initiative with their _s functions, and they took this exact same approach - all strcpy-like functions check for buffer overrun and terminate. This is a practically unquestioned security best practice.
Moreover, you mention using __builtin_trap below, which gives you a quick death instead of acting as an attack vector, but the proposed design does not. Again I ask, if we throw in this case, what do I write in catching code to remediate the situation that I need to stuff 10lbs of bits into a 5lb sack?
It's hard to answer this without context. The simplest possible case would be void append_something( char const * p, std::size_t n, error_code & ec ) noexcept { if( buffer_.size() + n > buffer_.max_size() ) { ec = make_error_code( errc::invalid_argument ); return; } buffer_.append( p, n ); ec = {}; } This goes straight to terminate regardless of whether append aborts or throws, because of the noexcept. And yes, this can happen, because I have a bug in my size check, as 99% of all people would. In a system that uses exceptions instead of error codes, the functions won't be noexcept, but then you'd just let the length_error bubble up, as with any other failure. If you for instance try to serialize a 10lb json into a fixed_string<5lb>, the serialization will fail with an exception and you'll need to figure out what to do with that failure, as with any other. This is no different from giving a fixed storage allocator to some std::vector (or to the JSON deserializer, if you will.)
On Wed, Nov 27, 2019 at 11:57 AM Zach Laine via Boost
Could you explain why the fixed_string API is not constexpr? Is it for C++11 compatability? Could you also explain how this meets the constexpr use case goal?
The library support down to C++11 when using boost, or C++17 at minimum in standalone mode. As for why the lack of constexpr, it simply can be done later.
Would it be unreasonable to add op+ back in for operands of the same type? There's no difficulty determining the size of fixed_string<8>("foo") + fixed_string<8>("bar"), right? That's no different from fixed_string<8>("foo") + "bar", which *is* supported.
There is no difficulty in determining the size, but there is the problem that the size of the resulting string will be irrespective of the number of characters contained within, which is not a problem for a string literal.
On Fri, Nov 29, 2019 at 10:50 AM Krystian Stasiowski via Boost
...why the lack of constexpr, it simply can be done later.
In light of the use-cases brought up during review I think having the noexcept and constexpr sooner rather than later is the best course of action. Thanks
On Wed, Nov 27, 2019 at 11:56 AM Zach Laine via Boost-users < boost-users@lists.boost.org> wrote:
This appears quite wrong, even if Augustin is the one behind it. Did he say why he expects that to work? It looks like it accepts doubles as iterators, for instance.
// Because k-ballo said so template<class T> using is_input_iterator = std::integral_constant
;
This is presumably trying to follow [container.requirements.general]p17: The extent to which an implementation determines that a type cannot be an
input iterator is unspecified, except that as a minimum integral types shall not qualify as input iterators.
On Fri, Nov 29, 2019 at 4:06 PM Tim Song via Boost
The extent to which an implementation determines that a type cannot be an
input iterator is unspecified, except that as a minimum integral types shall not qualify as input iterators.
Wow, nice bit of archeological work there! Agustin right again, as usual. Thanks
On Fri, Nov 29, 2019 at 6:15 PM Vinnie Falco via Boost < boost@lists.boost.org> wrote:
On Fri, Nov 29, 2019 at 4:06 PM Tim Song via Boost
wrote: The extent to which an implementation determines that a type cannot be an
input iterator is unspecified, except that as a minimum integral types shall not qualify as input iterators.
Wow, nice bit of archeological work there! Agustin right again, as usual.
I don't think that's the take-away. "unspecified" does not mean that a good implementation only checks that the type is not an integral type. It just means that they don't document whatever they do, as they would with "implementation-defined". It also does not mean that a Boost library (which is not an implementation) should accept "double" as an iterator! Zach
On Mon, Dec 2, 2019 at 11:59 AM Zach Laine via Boost
On Fri, Nov 29, 2019 at 6:15 PM Vinnie Falco via Boost < boost@lists.boost.org> wrote:
On Fri, Nov 29, 2019 at 4:06 PM Tim Song via Boost
wrote: The extent to which an implementation determines that a type cannot be an
input iterator is unspecified, except that as a minimum integral types shall not qualify as input iterators.
Wow, nice bit of archeological work there! Agustin right again, as usual.
I don't think that's the take-away. "unspecified" does not mean that a good implementation only checks that the type is not an integral type. It just means that they don't document whatever they do, as they would with "implementation-defined". It also does not mean that a Boost library (which is not an implementation) should accept "double" as an iterator!
Zach
This specification is only used to disambiguate (size_t, value_type) and (Iter, Iter) overloads. Having (double, double) go to the latter and explode at compile time might not be a bad thing.
I wonder if it is possible to implement op+ such that if one writes: x = a+b; the return value of a+b gets converted to the type of x. The motivation is that, as a user, I've determined a fixed size limit for x, and therefore the library should use that, and if it is exceeded at run-time, it makes sense for the behavior to be the same as when any other fixed_string capacity is exceeded (e.g. boost::throw_exception).
On 3/12/2019 15:46, Emil Dotchevski wrote:
I wonder if it is possible to implement op+ such that if one writes:
x = a+b;
the return value of a+b gets converted to the type of x. The motivation is that, as a user, I've determined a fixed size limit for x, and therefore the library should use that, and if it is exceeded at run-time, it makes sense for the behavior to be the same as when any other fixed_string capacity is exceeded (e.g. boost::throw_exception).
It is possible, by using a deferred execution pattern. operator+ would return a unique type which contains the references to a and b (not yet concatenated) and implements a templated implicit cast operator for fixed_string<N>, which is where the concatenation is actually performed (since it then knows the target type). This also works when passing parameters -- at least when those parameters are declared with an explicit fixed_string<N> type. The caveat is that it doesn't work with "auto" variables, or with templated parameters. This limitation unfortunately is quite a large one, and can be quite aggravating in practice. (Another caveat is that those cases that "don't work" above can easily cause dangling references and other lifetime issues instead of cleanly failing to compile.) Given that, I don't think I'd recommend actually doing it. The reference-lifetime-safe alternative is to return an object that contains a copy of the parameters, but this is now strictly worse than just returning a fixed_string. (And, when you then try to assign this to a fixed_string<N>, reintroduces the problem of what to do when a.size()+b.size() > N, since it will probably often be the case that A+B
N, and this is not a mistake.)
Right, that's what I meant. Dangers, issues with auto aside, I can't think of a use case when I would want op+ to return a string of size M+N. Keep in mind, op+ composes. Leaving it undefined is an option, I suppose. On Mon, Dec 2, 2019 at 7:59 PM Gavin Lambert via Boost < boost@lists.boost.org> wrote:
On 3/12/2019 15:46, Emil Dotchevski wrote:
I wonder if it is possible to implement op+ such that if one writes:
x = a+b;
the return value of a+b gets converted to the type of x. The motivation is that, as a user, I've determined a fixed size limit for x, and therefore the library should use that, and if it is exceeded at run-time, it makes sense for the behavior to be the same as when any other fixed_string capacity is exceeded (e.g. boost::throw_exception).
It is possible, by using a deferred execution pattern.
operator+ would return a unique type which contains the references to a and b (not yet concatenated) and implements a templated implicit cast operator for fixed_string<N>, which is where the concatenation is actually performed (since it then knows the target type).
This also works when passing parameters -- at least when those parameters are declared with an explicit fixed_string<N> type.
The caveat is that it doesn't work with "auto" variables, or with templated parameters. This limitation unfortunately is quite a large one, and can be quite aggravating in practice.
(Another caveat is that those cases that "don't work" above can easily cause dangling references and other lifetime issues instead of cleanly failing to compile.)
Given that, I don't think I'd recommend actually doing it.
The reference-lifetime-safe alternative is to return an object that contains a copy of the parameters, but this is now strictly worse than just returning a fixed_string. (And, when you then try to assign this to a fixed_string<N>, reintroduces the problem of what to do when a.size()+b.size() > N, since it will probably often be the case that A+B
N, and this is not a mistake.)
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
...
There seems to be a lot of disagreement on handling overflow so I'd like to remind everyone of why this library is being proposed: A Beast user asked "Why isn't beast::static_string in a separate Boost library? It is quite useful on its own." They did not ask to have the treatment of overflow changed, to have the sizeof the class optimized for various N, or to have any of the API changed at all. Make of this what you will. Regards
On December 3, 2019 2:31:59 PM EST, Vinnie Falco via Boost
...
There seems to be a lot of disagreement on handling overflow so I'd like to remind everyone of why this library is being proposed:
A Beast user asked "Why isn't beast::static_string in a separate Boost library? It is quite useful on its own."
They did not ask to have the treatment of overflow changed, to have the sizeof the class optimized for various N, or to have any of the API changed at all. Make of this what you will.
One user's (possibly unconsidered) thoughts on the design are not binding on Boost. You surely know that what is happening is the essence of the Boost review process. If you're suggesting that you're unwilling to make changes of the sorts being discussed, you should alert everyone to this being a take-it-or-leave-it review now. -- Rob (Sent from my portable computation device.)
On Wed, Dec 4, 2019 at 2:09 AM Rob Stewart via Boost
You surely know that what is happening is the essence of the Boost review process.
LOL ain't that the truth!!!
If you're suggesting that you're unwilling to make changes of the sorts being discussed....
What I'm saying, which is not specific to this particular review, is that in the heat of the moment when we are vigorously debating technical points that we should keep the users in mind. Thanks
participants (11)
-
Andrey Semashev
-
Andrzej Krzemienski
-
Emil Dotchevski
-
Gavin Lambert
-
Joaquin M López Muñoz
-
Krystian Stasiowski
-
Peter Dimov
-
Rob Stewart
-
Tim Song
-
Vinnie Falco
-
Zach Laine