[review][fixed_string] FixedString review results
These are the results of the candidate Boost.FixedString review. Thanks to all the people (and there's been many) who contributed. To clear up the tension, let me begin by announcing that: Candidate Boost.FixedString is ACCEPTED into Boost. Acceptance is subject to the authors' addressing of some issues raised during the review that I label as "priority 1" below. *Summary of reviews* We've had 10 full reviews: * Accept: Deniz Bahadir, Emil Dotchevski, Gavin Lambert, JeanHeyd Meneide, Paul A. Bristow, Peter Dimov, Peter Koch Larsen. * Conditionally accept: Zach Laine. * Reject: Andrzej Krzemienski, Phil Endecott. Zach's conditions to acceptance seem minor and easily addressable to me. Andrzej based his rejection on a perceived lack of clarity as to what the actual pitch of the library is, which ultimately boils down to the issue of throwing/asserting on capacity overflow: my impression is that a strong pitch, if not a universally acclaimed one, emerged during discussions. Finally, Phil voted to reject but then went on to state that he would accept based on a number of "minimally sufficient" conditions that are trivial to meet (or even have been already implemented by Krystian). All in all, I think it is a reasonable compromise to grant acceptance based on the resolution of a number of important issues, listed below. *Issues* I've classified issues in three different priorities: * Priority 1: to be solved *before* inclusion into Boost. * Priority 2: to be addressed somewhere in the future. * Priority 3: anecdotal or without clear support from the community, to be addressed at the authors's discretion. * Priority 1 issues* 1. CMakeLists.txt should either be made generic or removed from repo (it is currently a specific thing for Visual Studio). 2. Choose a name for the component. Some consensus arose that naming should follow std::string's and be of the form basic_foo<...> with aliases foo, wfoo, etc. As for "foo", the following candidates have been proposed: * static_string * fixed_capacity_string * array_string * static_capacity_string * inplace_string * capped_string * bounded_capacity_string * bounded_string To be clear, I don't think any of these is clearly more popular than the rest, so it is up to the authors to make up their mind. 3. constexpr as much as possible, taking into account that, pre C++20, this requires zero initialization of the data array. 4. noexcept as much as possible and, in principle, at least as much as std::string. 5. Provide hash overloads. Up to the authors if they want to cover Boost.Hash as well. 6. Determine the exact behavior on capacity overflow. I think *some* partial consensus arose around throwing via boost::throw_exception so as to provide an opt-out mechanism for exception-handicapped scenarios. This has been the most contentious issue during the review but I think the authors are in their right to decide on throwing much as anyone can decide otherwise and propose what would be basically a different library. 7. Make substr return a fixed_string rather than a view, and additionally provide a view-returning subview member function. 8. State clearly on the docs what C++ version is required. 9 There are wrong "size() + m > max_size()" checks (they could overflow) in the code. 10. to_fixed_string should be a set of overloads mathing those of std::string. 11. There's an issue around some detail::is_input_iterator alias that Zach asked be clarified. 12. Compile-fail tests should be added to verify that those functions with constraints are correctly constrained. *Priority 2* 13. Determine if fixed_string<0> should be specialized or not to make it more memory efficient. This raises potential issues in the uniqueness of the pointer returned by data(). 14. Determine if operator+(fixed_string,fixed_string) should be provided. My impression is that people generally wanted this, and I've been tempted to raise this to priority 1. 15. Optimize the type of the internal size data member when size_t is larger than necessary. 16. BOOST_FIXED_STRING_USE_BOOST is unconditionally defined within the code. The authors should clearly state what this is for and let the user configure the library to use with/without Boost or, else, remove the thing entirely. 17. Source code organization (three equally named files, detal and impl folders), raised some eyebrows. Some (including Vinnie) favor a single-file implementation. 18. Docs should be more C++ standard-like and provide a synopsis. Vinnie has indicated they may abandon Doxygen in favor of asciidoc to produce better-quality docs. In general, people were not enthusiastic about the documentation. 19. The current monolithic test should be split up in lighter, more manegeable test files. Peter Dimov suggested some additional improvements on the testing part. * Priority 3* 20. Provide a literal operator"" 21. Should traits parameter be removed? 22. Store size (as capacity-size) in the last char of the data array, so as to save one memory byte. This might affect performance, though. 23. Add functions for checking remaining capacity. Thanks to Krystian and Vinnie for their work! And to everybody who has contributed to this really interesting review. Best regards, Joaquín M López Muñoz Review manager for candidate Boost.FixedString
On Sun, Dec 15, 2019 at 12:53 PM Joaquin M López Muñoz via Boost
Candidate Boost.FixedString is ACCEPTED into Boost
Thank you for your generous investment of time in managing this review. Although my name is associated with it, I feel that Krystian should be the principal investigator and therefore, take my comments below as suggestions not requirements.
1. CMakeLists.txt should either be made generic or removed from repo (it is currently a specific thing for Visual Studio).
Here I must object, I need a CML that produces a proper Visual Studio project file because that is my primary workflow. This is going to be a recurring problem with all of my repositories (Beast, JSON come to mind and I have several more in the pipeline). I don't have the knowledge to solve it, but if someone was to submit a patch that makes the CML work correctly for everyone and continues to satisfy the needs of my workflow while still being maintainable by me, I would be more than happy to run with that and port it to my other projects.
* static_string
I regret changing the name hastily in the first place, I should have left it as static_string because this is consistent with boost::static_vector.
3. constexpr as much as possible, taking into account that, pre C++20 this requires zero initialization of the data array. 4. noexcept as much as possible and, in principle, at least as much as std::string. 5. Provide hash overloads. Up to the authors if they want to cover Boost.Hash as well.
I agree with everything here.
6. Determine the exact behavior on capacity overflow.
I only just a few weeks ago saw Boost.Container's solution, committed recently, which is to make the behavior of overflow a policy. Although my personal view is that this is an overly complex solution, enough people want the other behavior that it makes sense to just copy what Ion did and do the same thing in static_string. This will make everyone happy and it will be consistent with Boost.Container, thus a fairly safe and uncontroversial move.
7. Make substr return a fixed_string rather than a view, and additionally provide a view-returning subview member function. 8. State clearly on the docs what C++ version is required. 9 There are wrong "size() + m > max_size()" checks (they could overflow) in the code. 10. to_fixed_string should be a set of overloads mathing those of std::string. 11. There's an issue around some detail::is_input_iterator alias that Zach asked be clarified.
Yes to all.
12. Compile-fail tests should be added to verify that those functions with constraints are correctly constrained.
Visual Studio doesn't easily support compile-fail tests, so I would prefer to see these implemented as SFINAE-based static_asserts instead, so that each compile-fail test doesn't need to go into a separate build target.
*Priority 2*
These are all good areas of research.
16. BOOST_FIXED_STRING_USE_BOOST is unconditionally defined within the code. The authors should clearly state what this is for and let the user configure the library to use with/without Boost or, else, remove the thing entirely.
The idiom I am using in all of my new libraries, which I think should be applied identically to static_string is thus: 1. The default configuration is to build a linkable library (i.e. not header only) and require Boost. For static_string, the linkable part is not really applicable since the library consists only of templates. 2. Optionally, the user can define BOOST_STATIC_STRING_STANDALONE, and the files will compile without Boost. They will not include any Boost headers, but all declarations will still be in the boost namespace. Users who define BOOST_STATIC_STRING_STANDALONE must still qualify library identifiers with boost::static_string. The standalone version of the library may have higher requirements. In this case it would require C++17 (for string_view). 3. Optionally, the user can define BOOST_STATIC_STRING_HEADER_ONLY, and the files will compile in header-only mode, where no separate linker input is needed. This is not applicable to static_string but it is how I am structuring my other libraries (except for Beast, which I am leaving alone for now to not break users).
17. Source code organization (three equally named files, detal and impl folders), raised some eyebrows.
Generally speaking this is the source code organization I use, which closely resembles how Boost.Asio and standalone Asio are structured, and this is done so for 1. minimizing the amount of header material exposed to the documentation toolchain, 2. providing the separation needed to support the header-only configuration, and 3. Separating public from private implementation. There's nothing generally wrong with the equally named files, in larger projects it would be highly annoying to have to give them all different names, it is much easier to find the matching files when the names are the same. For static_string though, which is template-only and rather limited in how large it can get, putting it in one header file probably outweighs the benefits of separation. And in fact, advertising "just copy one file into your project to use this!" is a pretty good selling point (especially with BOOST_STATIC_STRING_STANDALONE defined).
Vinnie has indicated they may abandon Doxygen in favor of asciidoc to produce better-quality docs.
This is up to Krystian, and I support such a change. I have no opinions on the rest. Thanks!!
niedz., 15 gru 2019 o 22:33 Vinnie Falco via Boost
On Sun, Dec 15, 2019 at 12:53 PM Joaquin M López Muñoz via Boost
wrote: Candidate Boost.FixedString is ACCEPTED into Boost
6. Determine the exact behavior on capacity overflow.
I only just a few weeks ago saw Boost.Container's solution, committed recently, which is to make the behavior of overflow a policy. Although my personal view is that this is an overly complex solution, enough people want the other behavior that it makes sense to just copy what Ion did and do the same thing in static_string. This will make everyone happy and it will be consistent with Boost.Container, thus a fairly safe and uncontroversial move.
I wanted to indicate that this would in fact still be a controversial move.
I need a library to tell me: is resizing over max_size() a bug or a correct
usage? And the answer you will give me is "it is neither or both: it
depends on what policy is selected". And this makes the library's contract
more vague. If I am in a template:
```
template
Hiding the decision under a macro doesn't seem an uncontroversial solution either. If I configure the macro to mean "over-resizing is a bug", and I am using a third party library that internally uses `static_string`, and I may not even know about it, and it defines the same macro as "over-resize is fine", I will get a ODR violation and the likely outcome will be that either I or the third party library will get a different behavior than requested.
IMO the ODR violation is critical. I don't think (anymore) behavior should be changed by (such kind of) definitions
On Mon, Dec 16, 2019 at 12:44 AM Andrzej Krzemienski via Boost
I wanted to indicate that this would in fact still be a controversial move.
You bring up very good points, and I didn't really think much about the Policy approach, because Boost.Container uses it and I simply assume that most Boost libraries are correct as written. Every problem that you pointed out is also surely a problem with Boost.Container then, does static_vector do the wrong thing here? https://github.com/boostorg/container/blob/dd158987e6bfbb83f82f35d6e34c93577... Thanks
On Mon, Dec 16, 2019 at 8:50 AM Vinnie Falco via Boost < boost@lists.boost.org> wrote:
On Mon, Dec 16, 2019 at 12:44 AM Andrzej Krzemienski via Boost
wrote: I wanted to indicate that this would in fact still be a controversial move.
You bring up very good points, and I didn't really think much about the Policy approach, because Boost.Container uses it and I simply assume that most Boost libraries are correct as written. Every problem that you pointed out is also surely a problem with Boost.Container then, does static_vector do the wrong thing here?
https://github.com/boostorg/container/blob/dd158987e6bfbb83f82f35d6e34c93577...
I think so, yes. This is a case of not being generic, but instead being vague. As Andrej points out, in many contexts I may not even know what the contract is. Zach
On 16/12/2019 21:43, Andrzej Krzemienski wrote:
``` template
void do_something(static_string &str) { // ??? } ``` I do not know what the contract is.
Hiding the decision under a macro doesn't seem an uncontroversial solution either. If I configure the macro to mean "over-resizing is a bug", and I am using a third party library that internally uses `static_string`, and I may not even know about it, and it defines the same macro as "over-resize is fine", I will get a ODR violation and the likely outcome will be that either I or the third party library will get a different behavior than requested.
This is why, as I mentioned elsewhere in a related context, macros should only ever select between different explicitly-named-differently implementations (such as setting a default for that template policy parameter, but not replacing the template policy parameter). Otherwise you get ODR violations and inconsistent ABI. At least when explicitly part of the signature, both implementations can co-exist in the same binary with minimal issues as long as they're kept strictly separate -- and if someone tries to "cross the streams" then they'll get linker errors. Or, as I also said elsewhere, just pick one and stick to your guns. Personally, I prefer throwing exceptions for that case; it's much safer than allowing a buffer overflow.
My recommendation would be to just make a call that over-resizing the string is fine and calls `boost::throw_exception()` and not allow the users to customize it beyond what `boost::throw_exception()` already offers. If I need a library that needs the over-resizing to be a diagnosable bug, i will use a different one (which may be a thin wrapper over `static_string`.
+1. Although throw_exception itself is not entirely immune to the problems above, it is at least a customisation point that only the application author is allowed to touch, which is better than a macro. [Although things get a bit dicier with shared objects and private visibility.]
This is why, as I mentioned elsewhere in a related context, macros should only ever select between different explicitly-named-differently implementations (such as setting a default for that template policy parameter, but not replacing the template policy parameter).
Otherwise you get ODR violations and inconsistent ABI. At least when explicitly part of the signature, both implementations can co-exist in the same binary with minimal issues as long as they're kept strictly separate -- and if someone tries to "cross the streams" then they'll get linker errors.
Or, as I also said elsewhere, just pick one and stick to your guns. Personally, I prefer throwing exceptions for that case; it's much safer than allowing a buffer overflow. +1 to the whole. Having macros to select default template params avoids
On 16.12.19 23:43, Gavin Lambert via Boost wrote:
the ODR and mixing issues. HOWEVER: What about you set
BOOST_USE_THROW=1, then include libraryX then Boost and then use
`static_string
+1. Although throw_exception itself is not entirely immune to the problems above, it is at least a customisation point that only the application author is allowed to touch, which is better than a macro.
[Although things get a bit dicier with shared objects and private visibility.] Oh yes... Damn shared objects. And with boost::throw_exception opens the door for ODR violations again...
On Sun, Dec 15, 2019 at 3:53 PM Joaquin M López Muñoz via Boost < boost@lists.boost.org> wrote:
Candidate Boost.FixedString is ACCEPTED into Boost.
Thank you again for managing the review :) Here is what has been implemented thus far: 3. constexpr as much as possible, taking into account that, pre C++20,
this requires zero initialization of the data array.
Implemented, but tests need to be written
4. noexcept as much as possible and, in principle, at least as much as std::string.
Mostly implemented (some more complex conditional noexcepts need to be done)
5. Provide hash overloads. Up to the authors if they want to cover Boost.Hash as well.
Support for std::hash and Boost.Hash have been implemented
6. Determine the exact behavior on capacity overflow.
I think we will stick with exceptions, and add a macro that disables exceptions and instead makes them precondition violations
7. Make substr return a fixed_string rather than a view, and additionally provide a view-returning subview member function.
Implemented.
9 There are wrong "size() + m > max_size()" checks (they could overflow) in the code.
This has been addressed
13. Determine if fixed_string<0> should be specialized or not to make it more memory efficient.
This has been implemented, so feel free to play around with it
15. Optimize the type of the internal size data member when size_t is larger than necessary.
Implemented
16. BOOST_FIXED_STRING_USE_BOOST is unconditionally defined within the code.
This has been addressed. In addition to that, it has been renamed to BOOST_FIXED_STRING_STANDALONE
22. Store size (as capacity-size) in the last char of the data array, so as to save one memory byte.
This has been implemented, there will be a macro to enable/disable this.
On 16/12/2019 13:45, Krystian Stasiowski wrote:
22. Store size (as capacity-size) in the last char of the data array, so as to save one memory byte.
This has been implemented, there will be a macro to enable/disable this.
Using a macro to alter storage layout is a bad idea unless done very carefully (eg. the macro selects between implementations in different namespaces, or with different template parameters). Otherwise you risk subtle ABI breakage problems if it (accidentally or otherwise) ends up defined differently in different translation units that are later linked together. Anything that alters the ABI layout of the type needs to be reflected in the type's mangled signature in some way. Otherwise, just pick something and stick with it; don't make it configurable.
On 21/12/2019 12:04, Vinnie Falco wrote:
On Sun, Dec 15, 2019 at 12:53 PM Joaquin M López Muñoz wrote:
10. to_fixed_string should be a set of overloads mathing those of std::string.
What's wrong with one or two function templates constrained on say, integers and floating point numbers?
For one, a set of overloads should compile much faster than SFINAE templates. It's also less prone to forgetting that bool is an integer type (and in this context we probably don't want it to be).
participants (7)
-
Alexander Grund
-
Andrzej Krzemienski
-
Gavin Lambert
-
Joaquin M López Muñoz
-
Krystian Stasiowski
-
Vinnie Falco
-
Zach Laine