Dear All, Here is my review of the proposed Fixed String library. In summary I think this would be a useful thing to have in Boost, but there are a few issues with the code as it currently stands. This is based on studying the code and docs and the previous mailing list discussions in September; I haven't actually tried to use the library. Firstly regarding the name, I would much prefer static_string for consistency with boost::container::static_vector and with P0843. Note that static_vector was the preferred name in an LEWG poll, according to P0843. It's true that "static" is an ambiguous word that does not really convey what is different about this string, but then neither does "fixed" in my opinion; "fixed capacity" would, but just "fixed" sounds more like "immutable" to me. On the subject of P0843, it it worth reading as it contains some useful links and rationale: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0843r2.html Also a more recent r3 update and reference implementation: https://github.com/gnzlbg/static_vector Here are the features that I think are missing, in no particular order; apologies if some of this does exist but I've missed it: - Much more of it should be constexpr. See P0980R1. That mentions that some of char_traits needs to be constexpr as a prerequisite, but even without that things like size() and empty() can be constexpr. There is also a question of whether all of the elements must be initialised (to 0) by the ctor to be constexpr, which could be costly for a large but sparse container, though it does mean that subsequent push_back()s don't need to store terminators; I don't understand all the issues involved here. (I note a comment in the docs suggesting the lack of constexpr may be temporary.) - In the same vein, more of it should be noexcept - why aren't empty() and size() ? - You can add a specialisation for size 0 that turns it into an empty struct. - You can add a string literal operator"". - You can add std::hash overloads. There are various possibilities for what should happen if the capacity is exceeded, including throwing an exception, asserting, and allowing undefined behaviour. I'm not sure what I would really prefer though I think I'd be more likely to assert() than to throw. See P0843r2 again which discusses this. Regarding your deviations from std::string: operator+ has been omitted because you consider it's not possible to determine a reasonable return type; isn't it just size N+M ? Do you reject that just because it could end up rather large after a sequence of additions? I'd agree if it were N*M but not really for N+M. Have I missed something? You've chosen to return a string_view from substr(). I think that's a bit dangerous; code that does "auto a = s.substr(...)" could end up with a dangling view when s is a fixed string; this makes it more difficult than it should be to migrate from one string type to another, or to write generic code. Regarding the docs, it would be great if the reference could have some highlighting (e.g. bright red?) for the few places where it differs from std::string. My own use for this sort of string has often been to store very large numbers of small strings when total memory usage has been a concern. (A good example would be an textual ID code of some sort, like a postal code or an airport code etc.) For this case, the 8-byte overhead for the size_t length field is a concern; I'd like a static_string<6> to need only 8 bytes total. I suggested changing this to the minimum size needed to store values 0 to N in the previous email discussion, but that hasn't been implemented. My initial thought was that static_string should be implemented on top of boost::container::static_vector<char>. My rationale for that was that static_vector is well-established and doesn't need to be re-implemented, and that a string_facade that adapts a vector<char>-like interface into a string-like interface would be a useful component in its own right. (I've tried to write something like this myself, but it gets tedious very quickly writing out all of those special string methods.) But then I discovered that actually boost::container::static_vector is not implemented as a simple size,data[] pair; apparently it contains pointers to its own storage. So it isn't even trivially- copyable. Now that I know that, I wonder if what we really need is a low-storage-overhead static_vector "done properly" (like the P0843 reference implementation), with a string_facade to make static_string from static_vector<char>. On balance, my view is that this proposal is promising but not quite ready yet. I suggest that the authors should be asked to come back in a few months with a revised version. My preferred design would be an improved static_vector and a string_facade that converts it into a static_string, but if they don't want to do that then simply tidying up the constexpr, noexcept and hash issues would be minimally sufficient. Regards, Phil.
On Tue, Nov 26, 2019 at 8:31 AM Phil Endecott via Boost
I would much prefer static_string
Same here... I was foolish to rename it.
- Much more of it should be constexpr.
I agree that it could be constexpr (at least some of it) but this can be added as an improvement later. Taking the opposite position, why should it be constexpr? The purpose of this container is to serve as a performant substitute for std::string when the maximum size is known ahead of time or can be reasonably estimated. In which use-cases do you anticipate the need for a constexpr static_string? It would certainly waste space, unless the caller sizes it perfectly. But then what's the difference between that and a constexpr string literal or string view? In my opinion people are too quick to say "constexpr it" without evaluating the necessity. I'm not saying we shouldn't do it (but see below), however lets not pretend that there are legions of users eager for a constexpr fixed capacity mutable string.
There is also a question of whether all of the elements must be initialised (to 0) by the ctor to be constexpr, which could be costly for a large but sparse container, though
I think initializing everything to 0 is a non-starter for the non-constexpr case, due to the performance penalty. And initializing to 0 only for constexpr requires new language features I believe? std::is_constexpr_evaluated or some such.
it does mean that subsequent push_back()s don't need to store terminators;
It would still need to null-terminate, for the case where characters are removed from the string and subsequently reinserted.
- You can add a specialisation for size 0 that turns it into an empty struct.
- You can add a string literal operator"".
- You can add std::hash overloads.
I'm not seeing the value in these features. What's the use case? No one will be using static_string as the key for a container, since it wastes space. Why do we need a string literal operator, to produce a string constant that takes up more space than necessary? And why are we "optimizing" the case where size==0? Is this something that anyone will notice or care about (aside from WG21 of course, which often concerns itself with not-relevant things).
There are various possibilities for what should happen if the capacity is exceeded, including throwing an exception, asserting, and allowing undefined behaviour. I'm not sure what I would really prefer though I think I'd be more likely to assert() than to throw. See P0843r2 again which discusses this.
Quoting some nonsense from p0843r2: 1. Throw an exception. 2. Make this a precondition violation. ... It could throw either std::out_of_bounds or std::logic_error but in any case the interface does not end up being equal to that of std::vector. ... The alternative is to make not exceeding the capacity a precondition on the static_vector's methods. So basically Gonzalo (the paper's author) says that throwing an exception of the wrong type makes the interface different from std::vector, thus the solution is to make the interface even more different from vector by requiring callers to check sizes. Gonzalo lists existing practice of Boost.Container's static_vector, with throws on overflow, as existing practice. Only in the bizarro world of WG21 could someone write a paper which quotes existing practice, then proceed to deviate from existing practice for illogical reasons. We can return to sanity by recognizing what the purpose of static_string is for: "...a performant substitute for std::string when the maximum size is known ahead of time or can be reasonably estimated." Throwing an exception is the obvious rational choice. It lets static_string be mostly a drop-in replacement, without requiring call sites to add extensive code to check for preconditions. It guarantees that any successful modifications to the static_string will be exact (no truncation). And most importantly it matches existing practice (e.g. boost::container::static_vector and others like it).
Regarding your deviations from std::string: operator+ has been omitted because you consider it's not possible to determine a reasonable return type; isn't it just size N+M ? Do you reject that just because it could end up rather large after a sequence of additions? I'd agree if it were N*M but not really for N+M. Have I missed something?
Yes. It is too easy for someone to change existing code that does a+b+c+d where the operands are strings. Users could the get surprising behavior of tens or hundreds of kilobytes of stack consumption. There are other ways to implement operator+ but we prefer to the omit interfaces that can have surprising behavior, or that have non-obvious results (such as truncating).
You've chosen to return a string_view from substr(). I think that's a bit dangerous; code that does "auto a = s.substr(...)" could end up with a dangling view when s is a fixed string; this makes it more difficult than it should be to migrate from one string type to another, or to write generic code.
Again we have to refer to the purpose of the container. People are using this for performance, the very last thing they want from substr() is to receive a copy. Users can opt-in to making a copy if they want, by constructing a new static_string from the string view returned by substr. If we go with your suggestion, no one can opt out of copies.
My initial thought was that static_string should be implemented on top of boost::container::static_vector<char>. My rationale...
No normal user is going at static_string and think "I would use this if it required Boost.Container." Fewer dependencies is better. And, this library is designed to work without the rest of Boost (the "standalone" mode). I want people to be able to do "vcpkg install static_string" and then be ready to boogie. They will still have to type `boost::` to access it. Regards
On 27/11/2019 06:46, Peter Dimov wrote:
Vinnie Falco wrote:
In which use-cases do you anticipate the need for a constexpr static_string?
When manipulating strings at compile time? Such as s1 += s2?
Wouldn't that use case be better served with an immutable string and operator+? At compile time, you can maintain the invariant length() == capacity(), which this type cannot do (and is not intended for).
Gavin Lambert wrote:
On 27/11/2019 06:46, Peter Dimov wrote:
Vinnie Falco wrote:
In which use-cases do you anticipate the need for a constexpr static_string?
When manipulating strings at compile time? Such as s1 += s2?
Wouldn't that use case be better served with an immutable string and operator+?
Not in general. constexpr string_view f( int x ); constexpr auto g( int n ) { fixed_string<512> s; for( int i = 0; i < n; ++i ) { s += f( i ); } return s; } You can rewrite this specific example using recursion but in general, it'd be painful and unreadable.
In the same vein, more of it should be noexcept - why aren't empty() and size() ?
empty() and size() not being noexcept is more of an oversight on our part. As for why more of the member functions aren't noexcept, they have the same exception specifications as those of their counterparts in std::string. That being said, although they do call "dangerous" operations like Traits::length that are not noexcept, such functions don't throw (at least for std::char_traits), so its difficult to specify as noexcept since Traits could also be a user defined type.
You can add a specialisation for size 0 that turns it into an empty struct.
Certainly, it would be fairly trivial to do so.
I suggested changing this to the minimum size needed to store values 0 to N in the previous email discussion, but that hasn't been implemented.
Sure, that's extremely trivial to do, and seems like a great idea.
On Tue, Nov 26, 2019 at 7:25 PM Peter Dimov via Boost
Gavin Lambert wrote:
On 27/11/2019 06:46, Peter Dimov wrote:
Vinnie Falco wrote:
In which use-cases do you anticipate the need for a constexpr static_string?
When manipulating strings at compile time? Such as s1 += s2?
Wouldn't that use case be better served with an immutable string and operator+?
Not in general.
constexpr string_view f( int x );
constexpr auto g( int n ) { fixed_string<512> s;
for( int i = 0; i < n; ++i ) { s += f( i ); }
return s; }
You can rewrite this specific example using recursion but in general, it'd be painful and unreadable.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
On 27/11/2019 06:33, Vinnie Falco wrote:
I would much prefer static_string
Same here... I was foolish to rename it.
Neither one is a great choice, as both "static" and "fixed" can be synonyms of "immutable", which this is not. "fixed_capacity_string" is the most accurate name -- while you might think that's too long, I think it's better to be properly descriptive and allow consumers to typedef it to something easier to type if they wish to.
Regarding your deviations from std::string: operator+ has been omitted because you consider it's not possible to determine a reasonable return type; isn't it just size N+M ? Do you reject that just because it could end up rather large after a sequence of additions? I'd agree if it were N*M but not really for N+M. Have I missed something?
Yes. It is too easy for someone to change existing code that does a+b+c+d where the operands are strings. Users could the get surprising behavior of tens or hundreds of kilobytes of stack consumption. There are other ways to implement operator+ but we prefer to the omit interfaces that can have surprising behavior, or that have non-obvious results (such as truncating).
The most logical capacity for the result of operator+ would probably be std::max(a.capacity(), b.capacity()). Bear in mind that we are assuming that both actual string values are significantly shorter than their capacities throughout most of their lifetime. (This is a reasonably safe assumption especially when you *don't* have an operator"" that produces exactly-right-sized strings.) In an ideal world, you'd be able to factor in the current lengths, but of course that isn't available at compile time so can't be used to determine the type. Having said that, operator+ is still going to cause an allocation, so it does still make sense to omit it from the interface for that reason, so that people think about it more (usually by switching to zero-allocation append or +=).
You've chosen to return a string_view from substr(). I think that's a bit dangerous; code that does "auto a = s.substr(...)" could end up with a dangling view when s is a fixed string; this makes it more difficult than it should be to migrate from one string type to another, or to write generic code.
Again we have to refer to the purpose of the container. People are using this for performance, the very last thing they want from substr() is to receive a copy. Users can opt-in to making a copy if they want, by constructing a new static_string from the string view returned by substr. If we go with your suggestion, no one can opt out of copies.
Perhaps as a compromise (due to the difference from std::string interface anyway) this method should be named substr_view? This would then alert someone converting from one to the other to look at each call site and verify that it isn't going to produce a dangling view.
My initial thought was that static_string should be implemented on top of boost::container::static_vector<char>. My rationale...
No normal user is going at static_string and think "I would use this if it required Boost.Container." Fewer dependencies is better. And, this library is designed to work without the rest of Boost (the "standalone" mode). I want people to be able to do "vcpkg install static_string" and then be ready to boogie. They will still have to type `boost::` to access it.
That just means that standalone mode would need to depend on a standalone implementation of static_vector, which vcpkg would pull in as well. Or you should give up on a standalone mode and have your vcpkg depend on Boost. Dependencies can be annoying, but constantly reinventing existing vocabulary types is also annoying. Waiting until they get adopted by the standard before using them in other libraries is a bit silly.
Vinnie Falco wrote:
On Tue, Nov 26, 2019 at 8:31 AM Phil Endecott via Boost
wrote:
- Much more of it should be constexpr. Taking the opposite position, why should it be constexpr?
I think best practice is to declare everything constexpr unless it can't be. It's a bit like const was 20 years ago - does anyone else remember dealing with old code that didn't declare anything const?
operator+ has been omitted because you consider it's not possible to determine a reasonable return type; isn't it just size N+M ? Do you reject that just because it could end up rather large after a sequence of additions? I'd agree if it were N*M but not really for N+M. Have I missed something?
Yes. It is too easy for someone to change existing code that does a+b+c+d where the operands are strings. Users could the get surprising behavior of tens or hundreds of kilobytes of stack consumption. There are other ways to implement operator+ but we prefer to the omit interfaces that can have surprising behavior, or that have non-obvious results (such as truncating).
Your application seems to be for largish buffers; mine is for smallish strings. For the smallest strings a static_string can be smaller than a std::string, and for slightly larger strings the space overhead if the actual string is shorter than the capacity may be an acceptable trade-off for the improved performance. For example: struct NameAndAddress { static_string<14> firstname; static_string<14> surname; static_string<30> address_line_1; static_string<22> city; static_string<10> postcode; }; (Yes, yes, you probably wouldn't want to make those fields fixed-size in real code, but it's an example we can all understand.) That struct is trivially-copyable. Copying a version that used std::string would involve multiple memory allocations and could be an order of magnitude slower. Now: std::string a = s.firstname + " " + s.surname + ",\n" + s.address_line_1 + ",\n" + s.city + ",\n" + s.postal_code; That will create a temporary static_string<97> and copy from that into the std::string. But it needs a series of temporaries and probably uses a few hundred bytes of stack. Yes this isn't as good as we could get from a concat(...) function (or expression templates) that computed the size of the required buffer up-front and copied all of the arguments once, but it's still going to be an improvement over using std::string's operator+.
- You can add std::hash overloads.
I'm not seeing the value in these features. What's the use case? No one will be using static_string as the key for a container, since it wastes space.
using customer_code = static_string<11>;
std::unordered_map
You've chosen to return a string_view from substr(). I think that's a bit dangerous; code that does "auto a = s.substr(...)" could end up with a dangling view when s is a fixed string; this makes it more difficult than it should be to migrate from one string type to another, or to write generic code.
Again we have to refer to the purpose of the container.
Well again I'll contend that your purpose for the container is not the only one.
People are using this for performance, the very last thing they want from substr() is to receive a copy. Users can opt-in to making a copy if they want, by constructing a new static_string from the string view returned by substr. If we go with your suggestion, no one can opt out of copies.
If you want to get a string_view of a substring of a std::string, I think you can write something like std::string s = .....; std::string_view sv(s); // view of all of s std::string_view sub_sv = sv.substr(pos,count); // view of a subview of s Or in one line: auto sub_sv = std::string_view(s).substr(pos,count); Maybe std::string should have a method to do that more succinctly? Anyway, someone wanting to get a string_view of a substring of a fixed_string can do exactly the same thing.
My initial thought was that static_string should be implemented on top of boost::container::static_vector<char>. My rationale...
No normal user is going at static_string and think "I would use this if it required Boost.Container."
Vinnie, I have literally just said that I'd prefer this to be an improved static_vector and then an adaptor that adds all the string methods on top of static_vector<char>. I think that would be better because it gives us three useful things "for the price of one", i.e. a better static_vector, a static_string, and a "stringification" adaptor that can be used independently. So you're calling me "not normal" for suggesting that. I think that's rude. You've also been rude to Gonzalo Brito Gadeschi and the whole of WG21 in parts of your message that I've not quoted. Please don't do that. Unfortunately I don't have the time to respond to everything else. Regards, Phil.
On Wed, Nov 27, 2019 at 8:36 AM Phil Endecott via Boost
struct NameAndAddress { static_string<14> firstname; static_string<14> surname; static_string<30> address_line_1; static_string<22> city; static_string<10> postcode; };
Ahh...I can't argue with this. `sizeof(std::string)` is * 32 bytes on gcc 8.3.0 * 24 bytes on clang 9.0.0 This is a pretty compelling argument, I am convinced. Therefore, this library should provide specializations for optimizing small strings. This could include storing the current size as (C-N) in the last character position where C is the capacity and N is the size.
(Yes, yes, you probably wouldn't want to make those fields fixed-size in real code, but it's an example we can all understand.)
Actually.... small fixed size fields come up in database applications all the time. And with sizeof(std::string) being 32 bytes on clang, a specialization will allow a generous 31 characters which is actually quite nice.
...it's still going to be an improvement over using std::string's operator+.
Okay, I agree that we need to do _something_. I'm not sure exactly what that something is. I'm hesitant to endorse operator+ because of the unpredictable behavior. How about a free function `concat(...)` which allows the caller to optionally specify the maximum capacity of the resulting string,
Vinnie, are you saying that this (i.e. small strings that are trivially- copyable) is not a supported use for the library?'
\sigh. Yes, it needs hash and probably the other stuff.
auto sub_sv = std::string_view(s).substr(pos,count);
I can support this if there is consensus that substr() should return a fixed capacity string instead of a view.
Maybe std::string should have a method to do that more succinctly?
Umm... I think std::string has enough members :) it needs a rest.
My initial thought was that static_string should be implemented on top of boost::container::static_vector<char>. My rationale... ...
Vinnie, I have literally just said that I'd prefer this to be an improved static_vector
You said it should use boost::container::static_vector. That's a non-starter for me. A goal for the library is to not use any external dependencies. I am also not a fan of Boost.Container's idiom of Options template parameters. It confuses tooling and it is hard to teach. I don't mind it in Boost.Intrusive because intrusive containers are already a tool for experts. But I'd like the fixed capacity strings to be more accessible. There are also string-specific optimizations which wouldn't work if the implementation was tied to static_vector. No I don't think that coupling the two is a good idea at all. Regards
On Wed, Nov 27, 2019 at 11:23 AM Vinnie Falco via Boost < boost@lists.boost.org> wrote:
On Wed, Nov 27, 2019 at 8:36 AM Phil Endecott via Boost
...it's still going to be an improvement over using std::string's operator+.
Okay, I agree that we need to do _something_. I'm not sure exactly what that something is. I'm hesitant to endorse operator+ because of the unpredictable behavior. How about a free function `concat(...)` which allows the caller to optionally specify the maximum capacity of the resulting string,
I was thinking the same thing. Zach
On 28/11/2019 06:23, Vinnie Falco wrote:
On Wed, Nov 27, 2019 at 8:36 AM Phil Endecott wrote:
auto sub_sv = std::string_view(s).substr(pos,count);
I can support this if there is consensus that substr() should return a fixed capacity string instead of a view.
To clarify, I think the argument was that one can already obtain a string_view substr via the above, therefore your own member substr should probably just be removed (as it deviates from std::string semantics anyway).
Vinnie Falco wrote:
How about a free function `concat(...)` which allows the caller to optionally specify the maximum capacity of the resulting string
See also http://open-std.org/JTC1/SC22/WG21/docs/papers/2019/p1228r1.html IMO, that paper unnecessarily combines string concatenation with number-formatting, but it is still interesting to look at. It converts all of its arguments to string_views, sums the lengths (at run time), allocates a std::string of that length, and copies the arguments into it. You could extend that to (a) permit the return type to be specified as an additional template parameter, and (b) default that return type to a static_string if the capacities of all the arguments are finite. Regards, Phil.
Vinnie Falco wrote:
Okay, I agree that we need to do _something_. I'm not sure exactly what that something is. I'm hesitant to endorse operator+ because of the unpredictable behavior. How about a free function `concat(...)` which allows the caller to optionally specify the maximum capacity of the resulting string,
The function that allows you to specify the capacity is spelled fixed_string<N> result = s1 + s2; This is not particularly efficient on paper, it's one extra memcpy, but that's unavoidable when using op+. And, while I agree that op+ giving 1024 for two 512 inputs isn't very useful, op+ for inputs 12 and 8 giving 20 seems not that useless.
Em qua., 27 de nov. de 2019 às 13:36, Phil Endecott via Boost < boost@lists.boost.org> escreveu:
struct NameAndAddress { static_string<14> firstname; static_string<14> surname; static_string<30> address_line_1; static_string<22> city; static_string<10> postcode; };
I was going to write a review, but I think I can borrow your example to show the single point I'm concerned about. char[N] has alignment of char. std::size_t has not and structs like this will consume much more memory because this single member: https://github.com/18/fixed_string/blob/9730545e0fa37b43299559fca8253c434c67... I'm concerned about this alignment and I'd rather have the size being computed by searching for the NULL-terminator (or also stored in a byte). Maybe a policy template parameter could be used to select the implementation. I don't really care about which solution is chosen as long as I can opt-in for a struct which has the same alignment as char (and it is trivially copyable). -- Vinícius dos Santos Oliveira https://vinipsmaker.github.io/
On Wed, Nov 27, 2019, 10:17 AM Vinícius dos Santos Oliveira via Boost < boost@lists.boost.org> wrote:
I'm concerned about this alignment and I'd rather have the size being computed by searching for the NULL-terminator (or also stored in a byte).
I agree with char alignment, and we have techniques for doing this transparently in constant time, without policies, which will be impmemented.
Vin?cius dos Santos Oliveira
Em qua., 27 de nov. de 2019 ?s 13:36, Phil Endecott via Boost < boost@lists.boost.org> escreveu:
struct NameAndAddress { static_string<14> firstname; static_string<14> surname; static_string<30> address_line_1; static_string<22> city; static_string<10> postcode; };
I was going to write a review, but I think I can borrow your example to show the single point I'm concerned about.
char[N] has alignment of char. std::size_t has not and structs like this will consume much more memory because this single member
This is avoided if you use an appropriate smaller type for the size, i.e. uint8_t when capacity <= 255.
I'd rather have the size being computed by searching for the NULL-terminator
Yes a string that does this can also be useful, particularly for the very smallest strings, but I would not suggest trying to combine it with this implementation. Regards, Phil.
On Tue, Nov 26, 2019 at 11:34 AM Vinnie Falco via Boost < boost@lists.boost.org> wrote:
- You can add std::hash overloads.
I'm not seeing the value in these features. What's the use case? No one will be using static_string as the key for a container, since it wastes space.
I've used fixed sized strings not only because they are space efficient, but also because they are non-allocating. And because those types were being used in other places, it was also useful to have them as keys in hash maps.
Why do we need a string literal operator, to produce a string constant that takes up more space than necessary? And why are we "optimizing" the case where size==0? Is this something that anyone will notice or care about (aside from WG21 of course, which often concerns itself with not-relevant things).
I'm sure people said the same thing about array
wt., 26 lis 2019 o 18:34 Vinnie Falco via Boost
On Tue, Nov 26, 2019 at 8:31 AM Phil Endecott via Boost
wrote: I would much prefer static_string
Same here... I was foolish to rename it.
- Much more of it should be constexpr.
I agree that it could be constexpr (at least some of it) but this can be added as an improvement later.
Taking the opposite position, why should it be constexpr? The purpose of this container is to serve as a performant substitute for std::string when the maximum size is known ahead of time or can be reasonably estimated. In which use-cases do you anticipate the need for a constexpr static_string? It would certainly waste space, unless the caller sizes it perfectly. But then what's the difference between that and a constexpr string literal or string view?
In my opinion people are too quick to say "constexpr it" without evaluating the necessity. I'm not saying we shouldn't do it (but see below), however lets not pretend that there are legions of users eager for a constexpr fixed capacity mutable string.
There is also a question of whether all of the elements must be initialised (to 0) by the ctor to be constexpr, which could be costly for a large but sparse container, though
I think initializing everything to 0 is a non-starter for the non-constexpr case, due to the performance penalty. And initializing to 0 only for constexpr requires new language features I believe? std::is_constexpr_evaluated or some such.
it does mean that subsequent push_back()s don't need to store terminators;
It would still need to null-terminate, for the case where characters are removed from the string and subsequently reinserted.
- You can add a specialisation for size 0 that turns it into an empty struct.
- You can add a string literal operator"".
- You can add std::hash overloads.
I'm not seeing the value in these features. What's the use case? No one will be using static_string as the key for a container, since it wastes space. Why do we need a string literal operator, to produce a string constant that takes up more space than necessary? And why are we "optimizing" the case where size==0? Is this something that anyone will notice or care about (aside from WG21 of course, which often concerns itself with not-relevant things).
There are various possibilities for what should happen if the capacity is exceeded, including throwing an exception, asserting, and allowing undefined behaviour. I'm not sure what I would really prefer though I think I'd be more likely to assert() than to throw. See P0843r2 again which discusses this.
Quoting some nonsense from p0843r2:
1. Throw an exception. 2. Make this a precondition violation. ... It could throw either std::out_of_bounds or std::logic_error but in any case the interface does not end up being equal to that of std::vector. ... The alternative is to make not exceeding the capacity a precondition on the static_vector's methods.
So basically Gonzalo (the paper's author) says that throwing an exception of the wrong type makes the interface different from std::vector, thus the solution is to make the interface even more different from vector by requiring callers to check sizes. Gonzalo lists existing practice of Boost.Container's static_vector, with throws on overflow, as existing practice. Only in the bizarro world of WG21 could someone write a paper which quotes existing practice, then proceed to deviate from existing practice for illogical reasons.
We can return to sanity by recognizing what the purpose of static_string is for:
"...a performant substitute for std::string when the maximum size is known ahead of time or can be reasonably estimated."
Throwing an exception is the obvious rational choice. It lets static_string be mostly a drop-in replacement, without requiring call sites to add extensive code to check for preconditions. It guarantees that any successful modifications to the static_string will be exact (no truncation). And most importantly it matches existing practice (e.g. boost::container::static_vector and others like it).
Actually, boost::container::static_vector defines the container beyond its capacity as a precondition: https://www.boost.org/doc/libs/1_71_0/doc/html/boost/container/static_vector... And this does not require of a user any extensive "checking" of preconditions. Te user simply knows her program's flow, and already knows the preconditions are not exceeded. For instance, the logic of the application dictates that strings are never resized, or for every name initially created only zero or one push_back()s are performed with a single character. Regards, &rzej;
Regarding your deviations from std::string: operator+ has been omitted because you consider it's not possible to determine a reasonable return type; isn't it just size N+M ? Do you reject that just because it could end up rather large after a sequence of additions? I'd agree if it were N*M but not really for N+M. Have I missed something?
Yes. It is too easy for someone to change existing code that does a+b+c+d where the operands are strings. Users could the get surprising behavior of tens or hundreds of kilobytes of stack consumption. There are other ways to implement operator+ but we prefer to the omit interfaces that can have surprising behavior, or that have non-obvious results (such as truncating).
You've chosen to return a string_view from substr(). I think that's a bit dangerous; code that does "auto a = s.substr(...)" could end up with a dangling view when s is a fixed string; this makes it more difficult than it should be to migrate from one string type to another, or to write generic code.
Again we have to refer to the purpose of the container. People are using this for performance, the very last thing they want from substr() is to receive a copy. Users can opt-in to making a copy if they want, by constructing a new static_string from the string view returned by substr. If we go with your suggestion, no one can opt out of copies.
My initial thought was that static_string should be implemented on top of boost::container::static_vector<char>. My rationale...
No normal user is going at static_string and think "I would use this if it required Boost.Container." Fewer dependencies is better. And, this library is designed to work without the rest of Boost (the "standalone" mode). I want people to be able to do "vcpkg install static_string" and then be ready to boogie. They will still have to type `boost::` to access it.
Regards
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
wt., 26 lis 2019 o 18:34 Vinnie Falco via Boost
On Tue, Nov 26, 2019 at 8:31 AM Phil Endecott via Boost
wrote: You've chosen to return a string_view from substr(). I think that's a bit dangerous; code that does "auto a = s.substr(...)" could end up with a dangling view when s is a fixed string; this makes it more difficult than it should be to migrate from one string type to another, or to write generic code.
Again we have to refer to the purpose of the container. People are using this for performance, the very last thing they want from substr() is to receive a copy. Users can opt-in to making a copy if they want, by constructing a new static_string from the string view returned by substr. If we go with your suggestion, no one can opt out of copies.
Even if this is a valid design choice when considered in separation, it is clearly in conflict with the goal of being a drop-in replacement for std::string. It looks like the library is aiming at being a drop-in replacement for some subset of usages of std::string. It would be beneficial to outline this subset clearly in the initial sections of the documentation. Regards, &rzej;
On Mon, Dec 2, 2019 at 3:07 AM Andrzej Krzemienski via Boost < boost@lists.boost.org> wrote:
wt., 26 lis 2019 o 18:34 Vinnie Falco via Boost
napisał(a): On Tue, Nov 26, 2019 at 8:31 AM Phil Endecott via Boost
wrote: You've chosen to return a string_view from substr(). I think that's a bit dangerous; code that does "auto a = s.substr(...)" could end up with a dangling view when s is a fixed string; this makes it more difficult than it should be to migrate from one string type to another, or to write generic code.
Again we have to refer to the purpose of the container. People are using this for performance, the very last thing they want from substr() is to receive a copy. Users can opt-in to making a copy if they want, by constructing a new static_string from the string view returned by substr. If we go with your suggestion, no one can opt out of copies.
Even if this is a valid design choice when considered in separation, it is clearly in conflict with the goal of being a drop-in replacement for std::string. It looks like the library is aiming at being a drop-in replacement for some subset of usages of std::string. It would be beneficial to outline this subset clearly in the initial sections of the documentation.
+1. FWIW, I find "this is a drop-in replacement" to be a typically unattainable goal for anything that is not based on some kind of polymorphism (compile- or runtime). That is, the polymorphic drop-in mechanism only cares about a subset of API and/or behavior. Zach
On Mon, Dec 2, 2019 at 11:57 AM Zach Laine via Boost
+1. FWIW, I find "this is a drop-in replacement" to be a typically unattainable goal for anything that is not based on some kind of polymorphism (compile- or runtime). That is, the polymorphic drop-in mechanism only cares about a subset of API and/or behavior.
Perhaps we provide two separate functions like was mentioned previously, where one returns a fixed_string and the other returns a string_view.
On Mon, Dec 2, 2019 at 12:13 PM Krystian Stasiowski via Boost < boost@lists.boost.org> wrote:
On Mon, Dec 2, 2019 at 11:57 AM Zach Laine via Boost < boost@lists.boost.org> wrote:
+1. FWIW, I find "this is a drop-in replacement" to be a typically unattainable goal for anything that is not based on some kind of polymorphism (compile- or runtime). That is, the polymorphic drop-in mechanism only cares about a subset of API and/or behavior.
Perhaps we provide two separate functions like was mentioned previously, where one returns a fixed_string and the other returns a string_view.
Perhaps. I think you were on the right track changing substr() to return a string_view, just not to worry too much about being a drop-in replacement. I think you should only worry about API compatibility where it makes sense. Zach
Krystian Stasiowski wrote:
Perhaps we provide two separate functions like was mentioned previously, where one returns a fixed_string and the other returns a string_view.
I think that this is almost certainly the right thing to do, both for fixed_string and for std::string. Call the string_view returning function `s.subview()` or just `s.view()`, leave `s.substr()` to return by value.
On Mon, Dec 2, 2019 at 1:57 PM Peter Dimov via Boost
Krystian Stasiowski wrote:
Perhaps we provide two separate functions like was mentioned previously, where one returns a fixed_string and the other returns a string_view.
I think that this is almost certainly the right thing to do, both for fixed_string and for std::string. Call the string_view returning function `s.subview()` or just `s.view()`, leave `s.substr()` to return by value.
I like the ring of subview
participants (9)
-
Andrzej Krzemienski
-
Gavin Lambert
-
Krystian Stasiowski
-
Nevin Liber
-
Peter Dimov
-
Phil Endecott
-
Vinnie Falco
-
Vinícius dos Santos Oliveira
-
Zach Laine