[review][fixed_string] Andrzej's feedback
Hi Everyone,

This can hardly be called a review, as I didn't look thoroughly at the implementation. (I only checked what happens on resize() over capacity(), as this was not clear from the docs.) I think that ultimately the library would be a useful addition to Boost; however, I would not like to rush it before its design is fully baked, and my impression is that it isn't yet. Therefore my recommendation is to REJECT the library at this point, and to encourage the authors to resubmit it for a second review once the design goals and design decisions have been clearly settled and communicated.

My vision of a review is that we:

1. First evaluate the design goals and whether they are worth pursuing.
2. Then evaluate the design *against* those goals.
3. Then evaluate the documentation and implementation *against* the design and goals.

I cannot even get to #3, as to me #1 is missing some parts. There are different possible goals:

1. Treat this new string like many other STL sequence containers with variable size. In particular, users use it as if its max_size() were infinite, and are prepared for an exception to be thrown at some point.
2. Treat it as a "drop-in replacement" for std::string with a no-heap-allocation guarantee.
3. Make something suitable for embedded systems, where the user provides the guarantee: "I will never make the size() of this string go above this value. If I did, it would be plain clear that the program is not doing what it was supposed to."

Goal 3 is in clear conflict with either 1 or 2. This becomes clear when we consider what the response to a resize() over capacity() should be: a precondition violation, or a normal course of action signaled by an exception. The authors should choose which of these goals is not supported and communicate it clearly from the outset in the docs. I think this decision has already been made: the library was extracted from Boost.Beast, and there the purpose must have been clear.
It is just that it is not communicated in the fixed_string library. Any decision is good. In C++, performance trade-offs are part of the contract (interface), so there is room in Boost for many strings: small strings, short codes, fixed-size strings, fixed-capacity strings. But for each library it should be clear what trade-offs were chosen and why. We have a situation where static_vector already exists in Boost, and it defines resize() over capacity() to be a precondition violation. fixed_string needs to provide some response to this fact: why did you choose a different design?

I consider the choice of name to be part of the design: you cannot assign the name before you have decided on the goals. "static_string" might be a good name if you chose the same trade-offs as static_vector. But if you make different trade-offs, it might be better to use a different name.

Making a resize() over capacity() a precondition violation is a *feature* useful for bug detection, and I do not consider it a valid argument that "the library will throw exceptions, and if you never resize() over capacity() you will never see exceptions or std::abort()". If this is a precondition, then I expect the library to put some BOOST_ASSERT() or __builtin_unreachable() in those paths to enable better bug detection.

I find the goal "a drop-in replacement for std::string" invalid. You surely mean a drop-in replacement only for some subset of usages. I guess this subset comes from the usage in Beast, and I would expect it to be clearly outlined what this subset is. One of the operations on strings that I use most often is operator+, so when I hear that this operation is not implemented in a class that claims to be a drop-in replacement for string, I see a contradiction. There might be valid reasons not to provide operator+, but the documentation cannot then state that this is a drop-in replacement for std::string.
I think that only after the above issues have been sorted out can we have a review of the implementation and the documentation. One remark about the documentation is that I was not able to find what `resize()` over `capacity()` does. This is the most important information that I was after; everything else is more or less intuitively clear from the analogy to std::string.

I hope that this feedback doesn't sound discouraging. I appreciate the work Krystian and Vinnie have put into making this submission, and I would like to see this library ultimately included in Boost.

Regards,
&rzej;
On Wed, Dec 4, 2019 at 6:14 AM Andrzej Krzemienski via Boost
I hope that this feedback doesn't sound discouraging. I appreciate the work Krystian and Vinnie have put into making this submission, and I would like to see this library ultimately included in Boost.
Not discouraging at all, and I agree with everything. Thanks
On Wed, Dec 4, 2019 at 6:14 AM Andrzej Krzemienski via Boost < boost@lists.boost.org> wrote:
Goal 3 is in clear conflict with either 1 or 2. This becomes clear when we consider what the response to a resize() over capacity() should be: a precondition violation, or a normal course of action signaled by an exception.
The severity of buffer overrun bugs should be considered in this case, as well as the overhead of checking and calling boost::throw_exception. Under -fno-exceptions, this is literally a single cmp instruction. I can't imagine a C++ program where this would matter, but if one exists, I can't imagine it would be using a universal fixed_string type. The correct design is to check and call boost::throw_exception, rather than assert.
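Emil's point about the cost and placement of the check can be made concrete. The sketch below is illustrative only: the class name and layout are invented, and a plain `throw` stands in for `boost::throw_exception`. The guard on the hot path is a single comparison, and overrunning the capacity becomes a recoverable exception rather than a buffer overrun.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical sketch of a checked resize() in a fixed-capacity string.
// The real library would call boost::throw_exception instead of throw.
template <std::size_t N>
class fixed_string_sketch {
    char data_[N + 1] {};
    std::size_t size_ = 0;
public:
    static constexpr std::size_t capacity() { return N; }
    std::size_t size() const { return size_; }

    void resize(std::size_t n) {
        if (n > N)                            // the single `cmp` on the hot path
            throw std::length_error("fixed_string::resize");
        for (std::size_t i = size_; i < n; ++i)
            data_[i] = '\0';                  // value-initialize new characters
        size_ = n;
        data_[size_] = '\0';                  // keep the terminator in place
    }
};
```

Usage follows std::string: resizing within capacity succeeds quietly; resizing past it throws instead of corrupting the stack.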
On Wed, Dec 4, 2019 at 11:13 PM Emil Dotchevski via Boost
On Wed, Dec 4, 2019 at 6:14 AM Andrzej Krzemienski via Boost < boost@lists.boost.org> wrote:
Goal 3 is in clear conflict with either 1 or 2. This becomes clear when we consider what the response to a resize() over capacity() should be: a precondition violation, or a normal course of action signaled by an exception.
The severity of buffer overrun bugs should be considered in this case, as well as the overhead of checking and calling boost::throw_exception. Under -fno-exceptions, this is literally a single cmp instruction. I can't imagine a C++ program where this would matter, but if one exists, I can't imagine it would be using a universal fixed_string type. The correct design is to check and call boost::throw_exception, rather than assert.
I tend to agree, especially since the library originated from Beast, where it may even be facing the outside world directly. What I would expect is for the documentation to state clearly in the design goals section:

1. Overrunning capacity() is not a bug but a normal usage of the library. In this case we guarantee that an exception is thrown, much like std::string does.
2. This design decision is the opposite of what `static_vector` chose, so do not be misled by an apparent similarity.

Note that the motivation -- in the alternate design (which neither of us wants for fixed_string) -- for having a precondition violation rather than a guaranteed throw is safety, not performance.

Regards,
&rzej;
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
On 5/12/2019 03:14, Andrzej Krzemienski wrote:
Making a resize() over capacity() a precondition violation is a *feature* useful for bug detection, and I do not consider it a valid argument that "the library will throw exceptions, and if you never resize() over capacity() you will never see exceptions or std::abort()". If this is a precondition, then I expect the library to put some BOOST_ASSERT() or __builtin_unreachable() in those paths to enable better bug detection.
I see this argument a lot, and it confuses me. Perhaps this is my Windows dev background talking (since the analysis tools seem more lacking on Windows, despite having a better debugger), but in my experience it is vastly easier to find a thrown exception than to find "deliberate" UB (including asserts). And vastly easier to log that it unexpectedly occurred in production code in the field, so that you can detect and fix it without a debugger attached to the process. Asserts and unreachables both disappear in release builds, so the process ends up continuing to run in some subtly corrupted way -- if you're lucky it crashes soon after in an unrelated location that takes you weeks to track down the true cause. If you're unlucky, it runs longer, and corrupted some customer data along the way. Please enlighten me.
Gavin Lambert wrote:
Asserts and unreachables both disappear in release builds, ...
Unreachables don't just disappear, they take parts of the code with them. Misguided ideas like "let's use __builtin_unreachable for the preconditions" are exactly why I prefer the behavior on critical precondition violations specified, rather than undefined, even though undefined might theoretically be better.
On Wed, Dec 4, 2019 at 11:47 PM Gavin Lambert via Boost

On 5/12/2019 03:14, Andrzej Krzemienski wrote:

Making a resize() over capacity() a precondition violation is a *feature* useful for bug detection, and I do not consider it a valid argument that "the library will throw exceptions, and if you never resize() over capacity() you will never see exceptions or std::abort()". If this is a precondition, then I expect the library to put some BOOST_ASSERT() or __builtin_unreachable() in those paths to enable better bug detection.
I see this argument a lot, and it confuses me.
Perhaps this is my Windows dev background talking (since the analysis tools seem more lacking on Windows, despite having a better debugger), but in my experience it is vastly easier to find a thrown exception than to find "deliberate" UB (including asserts). And vastly easier to log that it unexpectedly occurred in production code in the field, so that you can detect and fix it without a debugger attached to the process.
Asserts and unreachables both disappear in release builds, so the process ends up continuing to run in some subtly corrupted way -- if you're lucky it crashes soon after in an unrelated location that takes you weeks to track down the true cause. If you're unlucky, it runs longer, and corrupted some customer data along the way.
Please enlighten me.
If the library design were to treat overrunning the capacity as a programmer bug (which I am not advocating for fixed_string), then I imagine the library would use a dedicated macro to indicate the precondition:

```
void fixed_string<N>::resize(size_type s)
{
    BOOST_FIXED_STRING_PRECONDITION(s <= this->capacity());
    // then do the job
}
```

And you leave the decision of what this macro expands to to the user. It will be a bunch of #if defines. The library would choose a safe default, like checking the condition and calling `std::abort()` on failure, but the user could change it by defining some macro, like BOOST_FIXED_STRING_THROW_EXCEPTION_ON_PRECONDITION_VIOLATION. Thus, if the user doesn't have any better idea, she just defines the macro, and fixed_string will be throwing exceptions on precondition violations. Everything is "safe", at least under some definition of safety. If on my platform I cannot use exceptions, I can configure the precondition macro to call std::abort(). The important point is that I control the behavior only of the preconditions and only in this library. It does not affect other libraries. It also does not affect functions that are always guaranteed to throw, such as fixed_string::at().

Now, if I am doing a test build (or if I can afford doing this in production), I can compile my program with a UB sanitizer, where the compiler treats most of the language-level UB as a checkable condition, checks it, and upon a failed check logs a message and aborts. Because compilers are now required to check most UB in constexpr functions when they are evaluated at compile time, offering the same checks for runtime evaluation comes practically for free; and compilers like GCC and Clang already provide them. At least one of them also inserts a runtime check and abort on every occurrence of `__builtin_unreachable()`.
So if I define the macro BOOST_FIXED_STRING_PRECONDITION to do `if(!condition) __builtin_unreachable()`, I will get a uniform error message in my test build if the precondition is ever violated. Additionally, if I have a static analyzer like Coverity (and I think Microsoft also offers one), I can make the macro expand to an annotation that the static analyzer understands and looks for, and then paths that lead to a precondition violation can be exposed statically, without even running the program. This would not be possible if the library only guaranteed to throw exceptions. You can put any instrumentation code you want under this macro. But it only works because the library doesn't guarantee any particular behavior on precondition violation: it only says "I leave myself the liberty to do what I choose." And of course, people in applications where the trade-off between security and performance is different (like video games, in internal parts remote from networking) may choose -- after extensive testing has been done -- to expand the macro to `__builtin_assume()` to check if this gives them an additional performance gain. But this is never done by default, it has to be explicitly opted in, and it is really a side bonus to the main feature, which is about safety and controlling how bugs are detected.

Regards,
&rzej;
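For illustration, the configurable macro Andrzej describes might be sketched as below. All macro names here are hypothetical (not a shipped Boost API), the throwing configuration is selected up front purely so the demo is exercisable, and `__builtin_unreachable()` is GCC/Clang-specific.

```cpp
// Demo configuration: pretend the user opted into throwing on violation.
#define BOOST_FIXED_STRING_THROW_EXCEPTION_ON_PRECONDITION_VIOLATION

#include <cstddef>
#include <cstdlib>
#include <stdexcept>

#if defined(BOOST_FIXED_STRING_THROW_EXCEPTION_ON_PRECONDITION_VIOLATION)
  // user opted in: violations become exceptions
# define BOOST_FIXED_STRING_PRECONDITION(cond) \
      do { if (!(cond)) throw std::logic_error("precondition violated: " #cond); } while (0)
#elif defined(BOOST_FIXED_STRING_ASSUME_PRECONDITIONS)
  // opt-in for measured hot paths: let the optimizer exploit the condition
# define BOOST_FIXED_STRING_PRECONDITION(cond) \
      do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
  // safe default: check and abort
# define BOOST_FIXED_STRING_PRECONDITION(cond) \
      do { if (!(cond)) std::abort(); } while (0)
#endif

// A resize-like function guarded by the macro.
inline void checked_resize(std::size_t n, std::size_t capacity) {
    BOOST_FIXED_STRING_PRECONDITION(n <= capacity);
    // then do the job
}
```

The key property is the one Andrzej names: the behavior of precondition paths is controlled per library, without touching functions whose contract already guarantees a throw.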
```
void fixed_string<N>::resize(size_type s)
{
    BOOST_FIXED_STRING_PRECONDITION(s <= this->capacity());
    // then do the job
}
```
+1 on that. I'm always advocating for safe-by-default and found it a huge mistake to make operator[] the unchecked one instead of at(). So using BOOST_FIXED_STRING_PRECONDITION, which throws by default, is the right choice IMO.
On Thu, Dec 5, 2019 at 9:15 AM Alexander Grund via Boost
```
void fixed_string<N>::resize(size_type s)
{
    BOOST_FIXED_STRING_PRECONDITION(s <= this->capacity());
    // then do the job
}
```
+1 on that. I'm always advocating for safe-by-default and found it a huge mistake to make operator[] the unchecked one instead of at()
So using BOOST_FIXED_STRING_PRECONDITION which throws by default is the right choice IMO.
However, in order for this to be implemented in `fixed_string`, the library authors would have to make (and document) a design decision that the contract for using this library is that users write their code so that resizing never exceeds capacity, and if such a condition is nonetheless ever detected, it is treated as a programmer bug. But my impression is that the library has taken a different route: it is absolutely fine to resize over capacity, and in this case the program will simply jump to a different place in the execution flow (throwing an exception does this), and the user deliberately triggers this event because she wants to get to this specific place in program execution. But the library never documented which design was chosen, so I do not even know if it makes sense to suggest BOOST_FIXED_STRING_PRECONDITION().

Regards,
&rzej;
On 2019-12-05 11:15, Alexander Grund via Boost wrote:
```
void fixed_string<N>::resize(size_type s)
{
    BOOST_FIXED_STRING_PRECONDITION(s <= this->capacity());
    // then do the job
}
```
+1 on that. I'm always advocating for safe-by-default and found it a huge mistake to make operator[] the unchecked one instead of at()
So using BOOST_FIXED_STRING_PRECONDITION which throws by default is the right choice IMO.
I'm strongly opposed. Make it a BOOST_ASSERT if you like but no checks in release mode, please. What's the point of this check when your index is guaranteed to not exceed size()-1? In my whole programming practice, not once did I need at(). Not only because I didn't need the check at this point, but also because even if I did need a check at some point before operator[] call, I also was not satisfied with the exception at() would throw.
On Thu, Dec 5, 2019 at 9:59 AM Andrey Semashev via Boost
On 2019-12-05 11:15, Alexander Grund via Boost wrote:
```
void fixed_string<N>::resize(size_type s)
{
    BOOST_FIXED_STRING_PRECONDITION(s <= this->capacity());
    // then do the job
}
```
+1 on that. I'm always advocating for safe-by-default and found it a huge mistake to make operator[] the unchecked one instead of at()
So using BOOST_FIXED_STRING_PRECONDITION which throws by default is the right choice IMO.
I'm strongly opposed. Make it a BOOST_ASSERT if you like but no checks in release mode, please.
What's the point of this check when your index is guaranteed to not exceed size()-1?
In my whole programming practice, not once did I need at(). Not only because I didn't need the check at this point, but also because even if I did need a check at some point before operator[] call, I also was not satisfied with the exception at() would throw.
Are you opposed to the idea of a user-controlled BOOST_FIXED_STRING_PRECONDITION() in general, or to throwing by default, or to performing runtime checks in release builds regardless of what action is taken later? BOOST_ASSERT() does perform checks in release builds unless you go and define NDEBUG, which does not correspond 1-to-1 to release builds.

Regards,
&rzej;
I'm strongly opposed. Make it a BOOST_ASSERT if you like but no checks in release mode, please.

That is exactly the point: You define BOOST_FIXED_STRING_PRECONDITION to BOOST_ASSERT and your use-case is covered.

What's the point of this check when your index is guaranteed to not exceed size()-1?

In my whole programming practice, not once did I need at(). Not only because I didn't need the check at this point, but also because even if I did need a check at some point before the operator[] call, I also was not satisfied with the exception at() would throw.

Don't want to start that argument again. It's always the same. The reality is: especially beginners fail to check the size, and even advanced programmers accidentally use sizes/indices out of bounds, leading to many of the existing CVEs due to buffer overruns. Again, the idea is: safe by default. Throwing an exception satisfies this. std::abort would too, but cannot be recovered from. If you don't want the default, you can change it.

BOOST_ASSERT() does perform checks in release builds unless you go and define NDEBUG, which does not correspond 1-to-1 to release builds.

Build systems (e.g. CMake) usually define NDEBUG for release builds. So one cannot rely on BOOST_ASSERT being there in release builds.
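The NDEBUG point can be shown with a tiny example. The function below is invented for illustration: its bounds check exists in a default debug build, and disappears entirely once NDEBUG is defined, which CMake's Release configuration and most IDEs do automatically.

```cpp
#include <cassert>

// Bounds-checked in debug builds only: with -DNDEBUG the assert expands
// to nothing and an out-of-range i becomes an unchecked (UB) read.
inline char element_at(const char* buf, int size, int i) {
    assert(i >= 0 && i < size);  // compiled out when NDEBUG is defined
    return buf[i];
}
```

This is why a BOOST_ASSERT-only check cannot serve as a release-mode safety net.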
On 2019-12-05 12:15, Alexander Grund via Boost wrote:
I'm strongly opposed. Make it a BOOST_ASSERT if you like but no checks in release mode, please.

That is exactly the point: You define BOOST_FIXED_STRING_PRECONDITION to BOOST_ASSERT and your use-case is covered.
My point is that I shouldn't have to define anything. Unchecked operator[] in release mode is expected by default.
On 2019-12-05 12:05, Andrzej Krzemienski wrote:
On Thu, Dec 5, 2019 at 9:59 AM Andrey Semashev via Boost <boost@lists.boost.org> wrote:

On 2019-12-05 11:15, Alexander Grund via Boost wrote:

```
void fixed_string<N>::resize(size_type s)
{
    BOOST_FIXED_STRING_PRECONDITION(s <= this->capacity());
    // then do the job
}
```

+1 on that. I'm always advocating for safe-by-default and found it a huge mistake to make operator[] the unchecked one instead of at().

So using BOOST_FIXED_STRING_PRECONDITION which throws by default is the right choice IMO.
I'm strongly opposed. Make it a BOOST_ASSERT if you like but no checks in release mode, please.
What's the point of this check when your index is guaranteed to not exceed size()-1?
In my whole programming practice, not once did I need at(). Not only because I didn't need the check at this point, but also because even if I did need a check at some point before operator[] call, I also was not satisfied with the exception at() would throw.
Are you opposed to the idea of a user-controlled BOOST_FIXED_STRING_PRECONDITION() in general, or to throwing by default, or to performing runtime checks in release builds regardless of what action is taken later?
I'm strongly opposed to runtime checks regardless of the error outcome in release mode (ok, when NDEBUG is defined) because most of the time you already know the index is within bounds. When you don't know, you most likely should have checked that earlier (e.g. before the parsing loop or before your higher-level operation starts).

I'm opposed to throwing an exception in case of failure because the exception and its error description is likely meaningless to high-level code and nearly useless when debugging. The higher-level code may not want an exception to begin with, in which case it would perform the check itself, and the check in operator[] becomes useless.

I'm mildly opposed to using custom macros (other than NDEBUG) to modify behavior at compile time, because it opens up the possibility of configuration inconsistencies and ODR issues. I'm fine with it as long as by default (when nothing custom is defined) the behavior is equivalent to BOOST_ASSERT.
BOOST_ASSERT() does perform checks in release builds unless you go and define NDEBUG, which does not correspond 1-to-1 to release builds.
NDEBUG gets defined by default in release mode by many IDEs and build systems, it is the standard macro to disable asserts, including the standard assert(). In practice, NDEBUG and release mode nearly always go hand in hand and NDEBUG is perceived as an indication of a release build.
I'm strongly opposed to runtime checks regardless of the error outcome in release mode (ok, when NDEBUG is defined) because most of the time you already know the index is within bounds. When you don't know, you most likely should have checked that earlier (e.g. before the parsing loop or before your higher-level operation starts).

That was always my counter-argument: If you can prove to a reviewer looking at your code that the index is in bounds, then the compiler can do the same and remove the check and exception. I never bought "most of the time you already know the index is within bounds" because it is equivalent to "most of the time your code is correct, let's not use checks/unit tests/hardening/..." If what you say is true, then buffer overflows would be non-existent. Are they?

I'm opposed to throwing an exception in case of failure because the exception and its error description is likely meaningless to high-level code and nearly useless when debugging. The higher-level code may not want an exception to begin with, in which case it would perform the check itself, and the check in operator[] becomes useless.

No, the check gets optimized away if you do everything right. If you don't (do it right and have no check/exception/abort) you'll get UB, buffer overflows and security risks. Especially for a stack entity this will pretty much always be a SERIOUS security issue. So to protect users one would need at least std::abort, but using an exception allows recovering from it.

I'm mildly opposed to using custom macros (other than NDEBUG) to modify behavior at compile time, because it opens up the possibility of configuration inconsistencies and ODR issues. I'm fine with it as long as by default (when nothing custom is defined) the behavior is equivalent to BOOST_ASSERT.

ODR is a real issue, true, because you may end up using a compiled library having it defined differently.
On 2019-12-05 12:45, Alexander Grund via Boost wrote:
I'm strongly opposed to runtime checks regardless of the error outcome in release mode (ok, when NDEBUG is defined) because most of the time you already know the index is within bounds. When you don't know, you most likely should have check that earlier (e.g. before the parsing loop or before your higher level operation starts).
That was always my counter argument: If you can prove it to a reviewer looking at your code that the index is in-bounds, then the compiler can do the same and remove the check and exception.
No, you're giving too much credit to compilers. Compilers cannot analyze at the same level as humans do. For example, if the valid index is stored in a data member in one method of a class and used in another, and the first method is required to be called first (which might also be enforced by a runtime check via another member variable), the compiler is not able to know that the index is always valid. At least, I haven't seen such cleverness.
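Andrey's scenario can be sketched like this (the class is invented for illustration): the invariant `sep_ < name_.size()` is established in parse() and relied on in separator(), guarded only by a runtime flag, so no local analysis of separator() alone can prove the indexing access is in bounds.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// The valid index is stored by one member function and used by another;
// a human can see the invariant, but the compiler generally cannot.
class Record {
    std::string name_;
    std::size_t sep_ = 0;
    bool parsed_ = false;
public:
    explicit Record(std::string s) : name_(std::move(s)) {}

    void parse() {                       // establishes: sep_ < name_.size()
        sep_ = name_.find(':');
        if (sep_ == std::string::npos)
            throw std::invalid_argument("no separator");
        parsed_ = true;                  // runtime check enforcing call order
    }

    char separator() const {             // relies on the invariant
        if (!parsed_)
            throw std::logic_error("parse() not called");
        return name_[sep_];              // in bounds by the invariant, but
                                         // not provably so from this function
    }
};
```

Eliminating a hypothetical bounds check inside `name_[sep_]` would require whole-program reasoning about call order, which is exactly the cleverness Andrey says he has not seen.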
I never bought "most of the time you already know the index is within bounds" because it is equivalent to "most of the time your code is correct, let's not use checks/unit tests/hardening/..." If what you say is true, then buffer overflows would be non-existant. Are they?
Mistakes happen, I'm not arguing with that. But defensive programming is not the solution because it doesn't save you from these mistakes. Code that passes invalid index to at() is just as incorrect as the one that does so with operator[].
I'm opposed to throwing an exception in case of failure because the exception and its error description is likely meaningless to high level code and nearly useless when debugging. The higher level code may not want an exception to begin with, in which case it would perform the check itself, and the check in operator[] becomes useless.
No, the check gets optimized away if you do everything right.
As I said above, you can't count on that.
If you don't (do it right and have no check/exception/abort) you'll get UB, buffer overflows and security risks. Especially for a stack entity this will pretty much always be a SERIOUS security issue. So to protect users one would need at least std::abort, but using an exception allows to recover from it.
You can't recover from an unexpected exception. And it is unexpected by account that we're talking about a human mistake. The correct way of tackling mistakes is human reviewing, testing and debugging. For the debugging part, a crash with a saved backtrace is more useful than an exception with a meaningless error message like "fixed_string::at", as many standard libraries like to do.
No, you're giving too much credit to compilers. Compilers cannot analyze at the same level as humans do. For example, if the valid index is stored in a data member in one method of a class and used in another, and the first method is required to be called first (which might also be enforced by a runtime check via another member variable), the compiler is not able to know that the index is always valid. At least, I haven't seen such cleverness.

That's why there is usually a checked and an unchecked method. Your usual accesses though are `for(auto c: string) ...` or `for(int i=0; i < str.size(); ++i) ...`. The scenario you describe is hard for a human to check too: Are you sure this invariant always holds if it is enforced in some other method which needs to have been called before? For all iterations of the code evolving? If so, then better use the string.unsafe_get_i_have_checked_the_index(i) method (made up). This is a clear signal to any reviewer/reader that the index was indeed checked and the unsafe call is deliberately used.

I never bought "most of the time you already know the index is within bounds" because it is equivalent to "most of the time your code is correct, let's not use checks/unit tests/hardening/..." If what you say is true, then buffer overflows would be non-existent. Are they?

Mistakes happen, I'm not arguing with that. But defensive programming is not the solution because it doesn't save you from these mistakes. Code that passes an invalid index to at() is just as incorrect as the one that does so with operator[].

No. Defensive programming is a safety net. If you always use at() and only in specific scenarios [], then at least your application will crash with a possibly good stacktrace, or you can catch the exception in main(), prepare a meaningful error, log it, send it to your server and play jingle bells.

No, the check gets optimized away if you do everything right.

As I said above, you can't count on that.

So what? Marginally lower performance, if even measurable, but safe from buffer overflows.

You can't recover from an unexpected exception. And it is unexpected by account that we're talking about a human mistake.

You can: For e.g. a connection/parser/... you terminate the connection and log the issue. At the least you can catch it in main or some exception hook to log a backtrace and exit gracefully.

The correct way of tackling mistakes is human reviewing, testing and debugging. For the debugging part, a crash with a saved backtrace is more useful than an exception with a meaningless error message like "fixed_string::at", as many standard libraries like to do.

You don't get a crash, at least not with a "meaningful backtrace", if you don't use exceptions. You get UB, a security issue and possibly a crash in some other part later, instead of where the error was detected. And why would "human reviewing, testing and debugging" be more reliable than "human codewriting"? It is still a human and humans make mistakes. Better use a safety net.
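Alexander's made-up naming idea might look like the following sketch (the class and method names are invented, not a real API): checked access is the default, and the unchecked path is so loudly named that its use is visibly deliberate to any reviewer.

```cpp
#include <cstddef>
#include <stdexcept>

// Safe by default; the unchecked accessor's name documents the caller's
// promise that the index was already validated.
template <std::size_t N>
class safe_buffer {
    char data_[N] {};
public:
    char at(std::size_t i) const {               // checked, safe default
        if (i >= N) throw std::out_of_range("safe_buffer::at");
        return data_[i];
    }
    // deliberately noisy name for the unchecked fast path
    char unsafe_get_i_have_checked_the_index(std::size_t i) const {
        return data_[i];
    }
    void set(std::size_t i, char c) {            // checked writer
        if (i >= N) throw std::out_of_range("safe_buffer::set");
        data_[i] = c;
    }
};
```

Under this convention a bare `[]`-style access never appears in application code, so every unchecked read is greppable and reviewable.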
On Thu, Dec 5, 2019 at 12:23 PM Alexander Grund via Boost
No, you're giving too much credit to compilers. Compilers cannot analyze at the same level as humans do. For example, if the valid index is stored in a data member in one method of a class and used in another, and the first method is required to be called first (which might also be enforced by a runtime check via another member variable), the compiler is not able to know that the index is always valid. At least, I haven't seen such cleverness.

That's why there is usually a checked and an unchecked method. Your usual accesses though are `for(auto c: string) ...` or `for(int i=0; i < str.size(); ++i) ...`. The scenario you describe is hard for a human to check too: Are you sure this invariant always holds if it is enforced in some other method which needs to have been called before? For all iterations of the code evolving? If so, then better use the string.unsafe_get_i_have_checked_the_index(i) method (made up). This is a clear signal to any reviewer/reader that the index was indeed checked and the unsafe call is deliberately used.

I never bought "most of the time you already know the index is within bounds" because it is equivalent to "most of the time your code is correct, let's not use checks/unit tests/hardening/..." If what you say is true, then buffer overflows would be non-existent. Are they?

Mistakes happen, I'm not arguing with that. But defensive programming is not the solution because it doesn't save you from these mistakes. Code that passes an invalid index to at() is just as incorrect as the one that does so with operator[].

No. Defensive programming is a safety net. If you always use at() and only in specific scenarios [], then at least your application will crash with a possibly good stacktrace, or you can catch the exception in main(), prepare a meaningful error, log it, send it to your server and play jingle bells.

No, the check gets optimized away if you do everything right.

As I said above, you can't count on that.

So what? Marginally lower performance, if even measurable, but safe from buffer overflows.

You can't recover from an unexpected exception. And it is unexpected by account that we're talking about a human mistake.

You can: For e.g. a connection/parser/... you terminate the connection and log the issue. At the least you can catch it in main or some exception hook to log a backtrace and exit gracefully.

The correct way of tackling mistakes is human reviewing, testing and debugging. For the debugging part, a crash with a saved backtrace is more useful than an exception with a meaningless error message like "fixed_string::at", as many standard libraries like to do.

You don't get a crash, at least not with a "meaningful backtrace", if you don't use exceptions. You get UB, a security issue and possibly a crash in some other part later, instead of where the error was detected. And why would "human reviewing, testing and debugging" be more reliable than "human codewriting"? It is still a human and humans make mistakes. Better use a safety net.
I think that this discussion diverges from the point of this thread. It was not about operator[] (for which it is clear that passing a too-big index is a bug) but about resizing over fixed_string's capacity, in which case the library doesn't make it clear whether it is even a bug, or something that the programmer may want to do. For instance, in std::string it is not a bug to resize over the capacity, and fixed_string is supposed to be a drop-in replacement. I would like this question to be sorted out first; only then can we decide what to put inside the function.

Regards,
&rzej;
On 2019-12-05 15:26, Andrzej Krzemienski via Boost wrote:
I think that this discussion diverges from the point of this thread. It was not about operator[] (for which it is clear that passing a too big index is a bug) but about resizing over fixed_string's capacity, in which case the library doesn't make it clear if it is even a bug, or something that the programmer may want to do.
For instance, in std::string it is not a bug to resize over the capacity, and fixed_string is supposed to be a drop-in replacement.
I would like to get this question sorted out first; only then can we decide what to put inside the function.
It is not so much about exceeding the capacity() but about exceeding max_size(). In fixed_string, capacity() happens to be equal to max_size() upon construction. Exceeding max_size() in std::string and containers is diagnosed with an exception, and IMO fixed_string should follow.
Folks, we are way overthinking the behavior of resize() over capacity(). It throws for a very simple reason. The most obvious and common optimization for highly concurrent network programs is simply to avoid the global allocator. And the obvious and common mitigation for resource exhaustion attacks is to limit the amount of data that an I/O handler is allowed to process. Using a fixed string as a temporary buffer for an algorithm satisfies both of these goals, and it does so without the caller having to write a bunch of code to check sizes in advance. Thanks
czw., 5 gru 2019 o 14:39 Vinnie Falco via Boost
Folks, we are way overthinking the behavior of resize() over capacity(). It throws for a very simple reason. The most obvious and common optimization for highly concurrent network programs is simply to avoid the global allocator. And the obvious and common mitigation for resource exhaustion attacks is to limit the amount of data that an I/O handler is allowed to process.
Using a fixed string as a temporary buffer for an algorithm satisfies both of these goals, and it does so without the caller having to write a bunch of code to check sizes in advance.
Thank you. This is the kind of information that I was after. I would like to know more. Given that fixed_string does not have operator+, as well as the description above, I am inclined to think that it is not treated as a string, but as a buffer of characters. The next question is: why is a vector with fixed capacity not enough for this purpose? Do you ever have a need to call `.find_first_not_of()` in your use cases? Regards, &rzej;
On Thu, Dec 5, 2019 at 6:07 AM Andrzej Krzemienski
I would like to know more. Given that fixed_string does not have operator+, as well as the description above, I am inclined to think that it is not treated as a string, but as a buffer of characters. Next question is, why is a vector with fixed capacity not enough for this purpose? Do you ever have a need to call `.find_first_not_of()` in your use cases?
When dealing primarily with text based protocols: HTTP, URL, JSON, etc... you usually want to perform string operations. I agree that "drop-in replacement for std::string" is not a precise description for the purpose of the library, but it is partially correct. If you have code that currently uses std::string to perform calculations, and you want to impose a limit on the amount of data it can process while simultaneously avoiding memory allocations, a fixed_string is going to be easier to integrate than vector<char>. Thanks
czw., 5 gru 2019 o 15:11 Vinnie Falco
On Thu, Dec 5, 2019 at 6:07 AM Andrzej Krzemienski wrote:
I would like to know more. Given that fixed_string does not have operator+, as well as the description above, I am inclined to think that it is not treated as a string, but as a buffer of characters. Next question is, why is a vector with fixed capacity not enough for this purpose? Do you ever have a need to call `.find_first_not_of()` in your use cases?
When dealing primarily with text based protocols: HTTP, URL, JSON, etc... you usually want to perform string operations. I agree that "drop-in replacement for std::string" is not a precise description for the purpose of the library, but it is partially correct. If you have code that currently uses std::string to perform calculations, and you want to impose a limit on the amount of data it can process while simultaneously avoiding memory allocations, a fixed_string is going to be easier to integrate than vector<char>.
Maybe we need a concept that would help draw the line clearly? Regards, &rzej;
On 6/12/2019 03:11, Vinnie Falco wrote:
When dealing primarily with text based protocols: HTTP, URL, JSON, etc... you usually want to perform string operations. I agree that "drop-in replacement for std::string" is not a precise description for the purpose of the library, but it is partially correct. If you have code that currently uses std::string to perform calculations, and you want to impose a limit on the amount of data it can process while simultaneously avoiding memory allocations, a fixed_string is going to be easier to integrate than vector<char>.
vector and/or static_vector combined with Boost.StringAlgorithm? Maybe we should just be making a more modern static_vector rather than a fixed_string.
On 6/12/2019 11:16, Vinnie Falco wrote:
On Thu, Dec 5, 2019 at 1:53 PM Gavin Lambert wrote:
Maybe we should just be making a more modern static_vector rather than a fixed_string.
Add CharTraits to boost::static_vector? Sure! I'll let you write that up and submit it as a pull request...
It doesn't need them. Boost.StringAlgorithm is the part that deals with char traits. And it already supports duck-typed containers. The only thing possibly missing is constexpr support.
On 2019-12-05 14:22, Alexander Grund via Boost wrote:
No, you're giving too much credit to compilers. Compilers cannot analyze at the same level as humans do. For example, if the valid index is stored in a data member in one method of a class and used in another, and the first method is required to be called first (which might also be enforced by a runtime check via another member variable), the compiler is not able to know that the index is always valid. At least, I haven't seen such cleverness.
That's why there is usually a checked and an unchecked method. Your usual accesses though are `for(auto c : string) ...` and `for(int i = 0; i < string.size(); ++i) ...`
The scenario you describe is hard to check for a human too: are you sure this invariant always holds if it is enforced in some other method which needs to have been called first? And for all iterations of the code as it evolves?
Yes, and such use cases appear more often than you might think. Call that first method a constructor or init(), and it becomes obvious to the human reviewer that that method must be called before further using the object. You may argue what if that init() method is not called and the answer depends on how probable that is. Maybe that method is only called from a few places of your program, like factory methods, and you can easily ensure the precondition is not violated. If it is a real possibility that you really want to protect from, add a flag or use some other criteria for checking that it was called. More often, though, I would just make it a documented precondition, and allow UB if users don't follow it.
If so, then better to use a `string.unsafe_get_i_have_checked_the_index(i)` method (made up). This is a clear signal to any reviewer/reader that the index was indeed checked and the unsafe call is deliberately used.
Such a method is called operator[]. I understand you want that name for the checked version, but that is not the case, for historical reasons (for which I'm glad). And again, where a check is needed, I will have written it myself and the way I want it (which may not involve throwing an exception). Where I haven't written it, I don't want it at all, and I'm responsible for the consequences. I would prefer if such a use pattern didn't require me to jump through hoops.
Mistakes happen, I'm not arguing with that. But defensive programming is not the solution because it doesn't save you from these mistakes. Code that passes invalid index to at() is just as incorrect as the one that does so with operator[]. No. Defensive programming is a safety net.
It is not. It doesn't make your code correct, it doesn't make it work, it doesn't help you fix the issue. It only prevents your code from crashing, which is a bad thing (because a crash - with a backtrace - is better).
If you always use at() and only in specific scenarios [] then at least your application will crash with a possibly good stacktrace or you can catch the exception in main(), prepare a meaningful error, log it, send it to your server and play jingle bells.
Thing is, you can't prepare a meaningful error in main(). You don't have a backtrace at that point and you don't know which of the thousands of at() calls threw. Sure, your application won't crash via signal, but, as I said, that is a bad thing.
No, the check gets optimized away if you do everything right.
As I said above, you can't count on that.

So what? Marginally lower performance, if even measurable, but safe from buffer overflows.
I don't want that marginal performance drop, especially given that I get nothing in return.
You can't recover from an unexpected exception. And it is unexpected by account that we're talking about a human mistake.

You can: for e.g. a connection/parser/... you terminate the connection and log the issue. At the least you can catch it in main or some exception hook to log a backtrace and exit gracefully.
You could terminate the connection and things like that, but ultimately that exception doesn't help you fix the problem. When connections are dropping like flies on your production server and you have no idea where to fix the problem, I'd prefer to get just one backtrace of a crash instead and then fix the bug in a matter of minutes.
The correct way of tackling mistakes is human reviewing, testing and debugging. For the debugging part, a crash with a saved backtrace is more useful than an exception with a meaningless error message like "fixed_string::at", as many standard libraries like to do.
You don't get a crash, at least not with a "meaningful backtrace" if you don't use exceptions. You get UB, a security issue and possibly a crash in some other part later instead of where the error was detected. And why would "human reviewing, testing and debugging" be more reliable than "human codewriting"? It is still a human and humans make mistakes. Better use a safety net.
With asserts enabled you'll get a crash at the point of error. Hopefully, you'll discover the bug soon enough during testing. But even in release builds, when asserts are disabled, chances are high the crash will still point you to the problematic code. Of course, memory access bugs are nasty and painful to deal with, but I honestly didn't have to deal with one for quite a long time now, even though I've never used at(). You're more likely to mess up memory when working with raw pointers.
Such method is called operator[]. I understand you want that name for the checked version, but that is not the case for historical reasons (for which I'm glad). And again, where check is needed, I will have it written myself and the way I want it (which may not involve throwing an exception). Where I don't have it written, I don't want it, at all, and I'm responsible for the consequences. I would prefer if such use pattern didn't require me jumping through hoops.

Well, in a better world you'd use .at(i) instead of [i] because YOU know what you are doing. Beginners don't. Advanced people don't either, as can be seen by CVEs.

It is not. It doesn't make your code correct, it doesn't make it work, it doesn't help you fixing the issue. It only prevents your code from crashing, which is a bad thing (because a crash - with a backtrace - is better).

You keep assuming that an OOB access leads to a crash. This is NOT the case. An OOB access is UB and a security risk.

Thing is, you can't prepare a meaningful error in main(). You don't have a backtrace at that point and you don't know which of the thousands of at() calls threw. Sure, your application won't crash via signal, but, as I said, that is a bad thing.

I thought at least on Windows you get unhandled_exception_filter or so. Well, then don't use an exception but a signal or std::abort. Or a stack-trace-enhanced exception (e.g. https://sourceforge.net/p/stacktrace), still not perfect of course. Anyway, at least debuggers can break on exceptions, so you've got that.

I don't want that marginal performance drop, especially given that I get nothing in return.

Again: you get security. At least std::abort, but never continue!

You could terminate the connection and things like that, but ultimately that exception doesn't help you fixing the problem. When connections are dropping like flies on your production server and you have no idea where to fix the problem, I'd prefer to get just one backtrace of a crash instead and then fix the bug in a matter of minutes.

Attach a debugger, break on throw. Alternative: have users craft packets to force a buffer overflow and read your secrets or take over your machine. Great!

With asserts enabled you'll get a crash at the point of error. Hopefully, you'll discover the bug soon enough during testing. But even in release builds, when asserts are disabled, chances are high the crash will still point you to the problematic code. Of course, memory access bugs are nasty and painful to deal with, but I honestly didn't have to deal with one for quite a long time now, even though I've never used at(). You're more likely to mess up memory when working with raw pointers.

Sure, then have asserts in release mode. Optimized away in 90% of the cases, but rest assured that no one takes over your machine. I don't care if exceptions are used; anything preventing execution of the next instruction is fine.
participants (7)
- Alexander Grund
- Andrey Semashev
- Andrzej Krzemienski
- Emil Dotchevski
- Gavin Lambert
- Peter Dimov
- Vinnie Falco