Boost.StaticString (formerly Boost.FixedString) is ready for release!
Hi all,

After some post-review polishing, Boost.StaticString is good to go for release! I have been in touch with the review manager, and everything has been checked over.

Repo: https://github.com/sdkrystian/static_string
Docs: https://sdkrystian.github.io/doc/static_string

Here are some things that have been changed post-review:

- The library has been renamed from Boost.FixedString to Boost.StaticString. The class template fixed_string has been renamed to basic_static_string, and aliases have been added for common character types, similar to std::string.
- constexpr support has been added to the fullest extent supported by available compilers.
- Functions that will not throw have been marked noexcept.
- Moved to a single-header configuration.
- std::hash and boost::hash_value support has been added.
- It has been decided that functions will throw on capacity overflow.
- A specialization for capacity 0 has been added that has no members. Additionally, the type of the member used to store the size of the string is now statically chosen to be the smallest-width standard unsigned integer type that can hold all the needed size values.
- operator+ has been implemented.
- Floating-point conversion support has been added to to_static_string/to_static_wstring.
- Lots of bugs have been fixed.

More behind the scenes, a number of optimizations have been added, taking advantage of the fact that the capacity is part of the type and known at compile time.

Again, thank you to all who participated in the review, and thank you Joaquin for managing it!

Regards, Krystian Stasiowski
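For readers who want to try it, here is a minimal usage sketch; the header path, the boost::static_strings namespace, and the to_static_string overloads are assumed from the linked docs and may differ slightly in the released version:

    #include <boost/static_string/static_string.hpp>
    #include <iostream>

    int main()
    {
        namespace bss = boost::static_strings;

        // Capacity is a template parameter; storage is inline, no dynamic allocation.
        bss::static_string<32> s = "Hello, ";
        s += "Boost";                    // would throw on capacity overflow if it didn't fit
        s.append(".StaticString");

        std::cout << s.c_str() << " (" << s.size() << '/' << s.capacity() << ")\n";

        // Numeric-to-string conversion, including floating point.
        std::cout << bss::to_static_string(42).c_str() << ' '
                  << bss::to_static_string(2.5).c_str() << '\n';
    }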
Hear hear!
-- Regards, Vinnie Follow me on GitHub: https://github.com/vinniefalco
On 24 February 2020 01:32:52 CET, Krystian Stasiowski via Boost
Hi all,
After some post-review polishing, Boost.StaticString is good to go for release!
Congratulations! I think you can reach the admins for the boostorg repo via https://github.com/boostorg/admin Best regards, -- Mateusz Loskot, mateusz@loskot.net (Sent from K-9 on mobile, may suffer from top-posting)
On Mon, 24 Feb 2020 at 14:27, Krystian Stasiowski via Boost
I'm assuming I should open an issue requesting to transfer ownership of the repo, and get admin access to it?
Yes, AFAIU, the boostorg/admin is the right place to spam our mighty admins with wishes and requests :) Best regards, -- Mateusz Loskot, http://mateusz.loskot.net
On Sunday, February 23, 2020, Krystian Stasiowski via Boost < boost@lists.boost.org> wrote:
Hi all,
After some post-review polishing, Boost.StaticString is good to go for release! I have been in touch with the review manager, and everything has been checked over.
Once the review manager posts his final acceptance to the mailing list, based on the conditions he set (and the review wizard has no objections), you can transfer ownership of the repository to myself or Peter or another admin and we can move it into boostorg. Alternatively, if you do not care about retaining existing issues, PRs, &c., I can create a repository in boostorg. Glen
On Sun, 23 Feb 2020 at 18:33, Krystian Stasiowski via Boost < boost@lists.boost.org> wrote:
Additionally, the type of the member used to store the size of the string is now statically chosen to be the smallest possible width unsigned standard integer type that can hold all the needed size values.
I guess I'm too late (sitting in a bus, not watching the list), but I would like to throw in that I would like to have a template SizeType parameter and leave the signed- or unsigned-ness open (while making sure things work correctly for both types (not hard)). This avoids the 'should we have ssize()?' question, which seems to go against the current grain (but is recognized as being problematic), while still benefiting from the better optimizations enabled by signed size types. degski -- @realdegski https://brave.com/google-gdpr-workaround/ "We value your privacy, click here!" Sod off! - degski "Anyone who believes that exponential growth can go on forever in a finite world is either a madman or an economist" - Kenneth E. Boulding "Growth for the sake of growth is the ideology of the cancer cell" - Edward P. Abbey
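As a side note on the size-type selection quoted above, here is a small illustrative sketch of picking the narrowest sufficient standard unsigned integer type from the capacity; this is not necessarily the library's exact metaprogram:

    #include <cstddef>
    #include <cstdint>
    #include <type_traits>

    // Choose the narrowest standard unsigned integer type that can represent
    // every size in [0, N]. Illustrative only.
    template <std::size_t N>
    using smallest_size_type =
        std::conditional_t<(N <= UINT8_MAX),  std::uint8_t,
        std::conditional_t<(N <= UINT16_MAX), std::uint16_t,
        std::conditional_t<(N <= UINT32_MAX), std::uint32_t,
                                              std::uint64_t>>>;

    static_assert(std::is_same_v<smallest_size_type<100>,    std::uint8_t>);
    static_assert(std::is_same_v<smallest_size_type<1000>,   std::uint16_t>);
    static_assert(std::is_same_v<smallest_size_type<100000>, std::uint32_t>);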
On Wed, Feb 26, 2020 at 10:33 AM degski via Boost
I guess I'm too late (sitting in a bus, not watching the list), but I would like to throw in that I would like to have a template SizeType parameter and leave the signed- or unsigned-ness open (while making sure things work correctly for both types (not hard)). This avoids the 'should we have ssize()?' question, which seems to go against the current grain (but is recognized as being problematic), while still benefiting from the better optimizations enabled by signed size types.
Any benefit that would be gained from this would be marginal, and the interface would suffer from another template parameter that 99% of the user base would never touch. Perhaps it would be best to leave this unimplemented.
On Thu, 27 Feb 2020 at 08:57, Krystian Stasiowski via Boost < boost@lists.boost.org> wrote:
Any benefit that would be gained from this would be marginal, and the interface would suffer from
Yes, maybe (did you measure?), but it does away with the 'comparing signed to unsigned' problem and the UB on signed overflow, and it does allow for optimizations. BUT, obviously, if you write everything using std::size_t you won't see that, and casting won't do that either. Suppose you need to store some (or many) of those indexes (of type std::size_t); the better cache locality (and lower memory use) will affect your performance. There are certainly more use cases. std::span almost had an ssize() member, but in the end (I believe) holding on to the past seemed more important. If we now start implementing classes as I propose (with SizeType), we might over time get to a stage where more devs are comfortable with int as a size_type.

I have never in my life seen a vector of size 2^32; even an array of chars that size is huge. The STL solution of using std::size_t is totally arbitrary (it does not address the problem in principle, just in (all imaginable cases in) practice) and (as usual with the STL) a severe overshoot in solving the problem. So using int's is not worse than using std::size_t. On virtual memory (where one does have to deal with std::size_t's) one would/could use offset pointers (they are built into VC for this purpose, the so-called based pointers, with the keyword '__based'; undoubtedly gcc/clang support the same thing (maybe with a different keyword, I don't know), and clang-cl certainly supports it), and then that problem can also be reduced to an int problem. People, there is always std::int64_t, with a max of 2^63, which is so large we can easily say that arrays bigger than that will never be needed. With std::size_t's we easily introduce UB (and a bad one for that matter, because the wrapping of unsigned's might go unnoticed (luckily there is a warning, but nothing stops you from ignoring it)).

If this does not convince you, let me throw in a fact. The number of sand grains on earth is estimated to be around 7.5 x 10^18, which is '0110100000010101010110100100001101100111011011100000000000000000' in binary. The size of an array of shorts is larger than that number (in bytes), so let's turn all the sand on earth into one giant Optane chip (just look friendly at Intel, they already manage 32GB, and according to the STL, getting to 2^64 is a doddle) and get calculating with the STL's std::size_t; for a similar array of int's we'll just ship in the silicon from the moon and beyond. (Yes, I do know that you won't need a grain of sand per byte, but it begs the question what an ordinary program needs std::size_t's for. Such large numbers are mostly good for counting stars and counting sand grains, but one would do that with doubles anyway, because they're guesses and hence no std::size_t's are required.) To summarize: 'it makes no sense' in my view.

degski

PS1: you'll need std::size_t for labeling every sand grain individually (would be nice, we'd know exactly which sand grain we mean; we'll need a lot of ink to write it on them, though).
PS2: I should have given a sarcasm warning; yes, I'm p-eed off, the answer here (boost-dev list) to any request is always a big no-way-jose, even when it concerns something as easily implementable as the above.

-- @realdegski https://brave.com/google-gdpr-workaround/ "We value your privacy, click here!" Sod off! - degski "Anyone who believes that exponential growth can go on forever in a finite world is either a madman or an economist" - Kenneth E. Boulding "Growth for the sake of growth is the ideology of the cancer cell" - Edward P. Abbey
TLDR: I'm contra signed sizes. The size type is size_t and it's unsigned for a reason: there exists no negative size. If you get an unsigned value there is no need to check for below zero; if you get a signed value you might. It is the same reason there is `not_null<T>` in the GSL. The whole discussion just shows that there is a problem with operations mixing signed and unsigned types in C++ in general. What we probably wanted was something like `size_t = not_negative<int>`, but well...

On 28.02.20 at 16:08, degski via Boost wrote:
> Yes, maybe (did you measure?), but it does away with the 'comparing signed to unsigned' problem and the UB on signed overflow, and it does allow for optimizations.

I can pass that back: did you measure the benefit of those UB-based optimizations?

> Suppose you need to store some (or many) of those indexes (of type std::size_t); the better cache locality (and lower memory use) will affect your performance.

You mean using a 16- or 32-bit type as the size? Well, there might be use cases for >4GB of "strings", although that's contrived. But curious: what use case requires you to store so many indices that this matters?

> There are certainly more use cases. std::span almost had an ssize() member, but in the end (I believe) holding on to the past seemed more important.

I think it's more about consistency. But yes, due to the rules for mixing signed/unsigned ops this would have helped.

> I have never in my life seen a vector of size 2^32; even an array of chars that size is huge. The STL solution of using std::size_t is totally arbitrary

Is it? size_t used to be 32 bits unsigned. That is the most reasonable choice. The next smaller would be 16 bit, but I'm sure we agree that this would be too small. And in the absence of a not_negative<int>, a size is unsigned, as is the underlying value space.

> there is always std::int64_t

Not necessarily. IIRC it is optional.

> With std::size_t's we easily introduce UB (and a bad one for that matter, because the wrapping of unsigned's might go unnoticed (luckily there is a warning, but nothing stops you from ignoring it)).

With ranges and range-based for loops the operations on a size become less frequent. And it is easier to teach: requesting a size? You'll get an unsigned value because it can never be negative. It avoids ambiguity. Compare that to a function returning a pointer: are you supposed to check that for NULL or can you assume it never is?
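To make the two failure modes being discussed concrete, here is a small illustrative snippet (not from the library) contrasting silent unsigned wrap-around with a signed difference that stays visible:

    #include <cstddef>
    #include <iostream>
    #include <vector>

    int main()
    {
        std::vector<int> v;                               // empty, v.size() == 0

        // Unsigned "underflow" wraps silently instead of becoming -1.
        std::size_t wrapped = v.size() - 1;               // 18446744073709551615 on a 64-bit platform
        std::cout << wrapped << '\n';

        // Doing the subtraction in a signed type keeps the negative value visible.
        auto diff = static_cast<std::ptrdiff_t>(v.size()) - 1;   // -1, easy to test for
        std::cout << diff << '\n';
    }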
On Fri, 28 Feb 2020 at 17:48, Alexander Grund via Boost < boost@lists.boost.org> wrote:
There exists no negative size.
This IS exactly why you should use signed int's, so overflow is easy to detect. By the time you (I don't mean You of course, sorry for my bad English) have wrapped almost all the way (a little bit less than 2^64) around the std::size_t integer line, you have no way to figure out whether the number is correct or not (it could be very close, in the right direction, and still be wrong; that's where the problem is), you just don't know, possibly it wrapped (several times, maybe). Finding bugs related to this is hard; using int's you'll know right away.
If you get an unsigned value there is no need to check for below zero; if you get a signed value you might. It is the same reason there is `not_null<T>` in the GSL.
But you would need to check if it wrapped, and if you start adding and subtracting these things (or use Robert's library, but I guess that safety does not all come for free), you'll need to find out in advance whether it's going to wrap or not (if you do things properly); after the event you're just staring at a number which in the real world does not mean much. Nothing wraps in the real world; if things keep growing (like the world population) at some point it will go kaboom. (The int's should be sized to requirement of course, like the OP has implemented, I applaud this; now it only needs to also be possible to make that signed.)

The whole discussion just shows that there is a problem with operations mixing signed and unsigned types in C++ in general.

Yes, solution: signed, slightly (a power of 2, :-) ) smaller, but that is not relevant for actual 'problems' in this (our) world.

What we probably wanted was something like `size_t = not_negative<int>`, but well...

I don't understand.

degski -- @realdegski https://brave.com/google-gdpr-workaround/ "We value your privacy, click here!" Sod off! - degski "Anyone who believes that exponential growth can go on forever in a finite world is either a madman or an economist" - Kenneth E. Boulding "Growth for the sake of growth is the ideology of the cancer cell" - Edward P. Abbey
On 28.02.20 at 21:15, degski via Boost wrote:

> On Fri, 28 Feb 2020 at 17:48, Alexander Grund via Boost <boost@lists.boost.org> wrote:
>> There exists no negative size.
> This IS exactly why you should use signed int's

So to represent a number that is always unsigned you should use a signed type?

> ... so overflow is easy to detect.

That is a different use case: an operation, not a representation. Nothing stops you from converting to signed, do what you want including your "check for negative"-based overflow detection, and finally converting back to unsigned when passing it to an API expecting an unsigned number (number, not type, although the type is unsigned too, see above).

> By the time you (I don't mean You of course, sorry for my bad English) have wrapped almost all the way (a little bit less than 2^64) around the std::size_t integer line

Didn't you argue in the mail before that there will never be anything of size 2^32 and hence even not anything like 2^64? How could you overflow that then?

> Finding bugs related to this is hard; using int's you'll know right away.

How? Only if you underflow. On unsigned you'll get a very large number if you go below zero, on signed you get a negative number. Both can be detected. But you talked about overflow. For unsigned you'll get a small number (that is wrong obviously but you COULD check) but for signed you get UB and hence can't even check for that.

>> If you get an unsigned value there is no need to check for below zero; if you get a signed value you might. It is the same reason there is `not_null<T>` in the GSL.
> But you would need to check if it wrapped

No. If you call `obj.size()` you get an unsigned value that is a valid unsigned value. It cannot wrap when returning (conditions apply). If obj.size() returns a signed value you'll have to check for <0 before using the value unless the API somehow promises to not return negative values. Encoding this in the type is the natural thing to do.

> and if you start adding and subtracting these things

See above. For a representation (and hence API) unsigned makes sense. For using arbitrary operations it may not. Use the types that fit your use case. Adding is safe for unsigned, as you argued: the type is wide enough for all uses as a size of something. Subtraction might not be, but you can check first (`if(a <= size) return size - a; else throw`).

> (the int's should be sized to requirement of course, like the OP has implemented, I applaud this; now it only needs to also be possible to make that signed).

You mean different widths? You'll still have to check if you can "downcast/narrowcast" it before doing so.

>> The whole discussion just shows that there is a problem with operations mixing signed and unsigned types in C++ in general.
> Yes, solution: signed, slightly (a power of 2, :-) ) smaller, but that is not relevant for actual 'problems' in this (our) world.

See above: use signed for operations if you want to detect underflow.

>> What we probably wanted was something like `size_t = not_negative<int>`, but well...
> I don't understand.

Maybe after the above? You want a guarantee to have a non-negative number. "unsigned" is that, but it suffers from underflow going undetected. A `not_null<T>`-like "wrapper" which otherwise behaves as T but guarantees the non-negativity would make the type suitable for representing an unsigned number in a signed type suitable for operations. Obviously if you subtract something from a `not_negative<int>` it will become a plain "int". Once you pass it to an API expecting a `not_negative<int>` the precondition will be checked.
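A minimal sketch of the kind of `not_negative<int>` wrapper described above; this is hypothetical, no such type exists in the GSL or the standard library:

    #include <cassert>
    #include <stdexcept>

    // Hypothetical wrapper: behaves like a signed T but checks the
    // non-negativity precondition whenever one is constructed.
    template <class T>
    class not_negative
    {
        T value_;
    public:
        constexpr not_negative(T v) : value_(v)
        {
            if (v < 0)
                throw std::invalid_argument("not_negative: negative value");
        }
        constexpr operator T() const { return value_; }   // decays to plain T in expressions
    };

    // Subtracting yields a plain int, which may legitimately be negative;
    // passing a negative result back into a not_negative<int> parameter would throw.
    int remaining(not_negative<int> size, int used)
    {
        int diff = size - used;
        assert(diff >= 0);   // caller's contract, as discussed above
        return diff;
    }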
On Mon, 2 Mar 2020 at 03:16, Alexander Grund via Boost < boost@lists.boost.org> wrote:
Nothing stops you from converting to signed, do what you want including your "check for negative"-based overflow detection and finally converting back to unsigned when passing it to an API expecting an unsigned number (number not type, although the type is unsigned too, see above)
Could you please try and explain why you think that signed is not a good type for a size (other than stating that "size cannot be negative"); what I am saying is that "valid-size cannot be negative"? You could also make the size a complex number, that would be an analogue. The imaginary component will have to be zero, but otherwise it would work just fine. The fact that that set is larger than the problem domain is IMO orthogonal to that.
Didn't you argue in the mail before that there will never be anything of size 2^32 and hence even not anything like 2^64? How could you overflow that then?
If you are manipulating (subtracting, adding differences) pointers, I thought I wrote that. The pointers might be pointers into Virtual Memory and can have any value [1, 2^56].
Finding bugs related to this is hard, using int's you'll know right away. How? Only if you underflow. On unsigned you'll get a very large number if you go below zero, on signed you get a negative number. Both can be detected. But you talked about overflow. For unsigned you'll get a small number (that is wrong obviously but you COULD check) but for signed you get UB and hence can't even check for that.
If you get an unsigned value there is no need to check for below zero; if you get a signed value you might. It is the same reason there is `not_null<T>` in the GSL.
But you would need to check if it wrapped

No. If you call `obj.size()` you get an unsigned value that is a valid unsigned value. It cannot wrap when returning (conditions apply).

Tautology: "... an unsigned value that is a valid unsigned value"; they always are, whether it's the right number is another question.

If obj.size() returns a signed value you'll have to check for <0 before using the value unless the API somehow promises to not return negative values. Encoding this in the type is the natural thing to do.
No, it's not; unsigned types are good for bit-manipulation only, nothing else. Unsigned types don't follow the rules of mathematics, they are fundamentally flawed by nature. The fact that int's are limited in range is not a flaw, but an implementation detail. The mathematically correct way of doing things (on a Turing machine) is to use signed big-ints.
and if you start adding and subtracting these things

See above. For a representation (and hence API) unsigned makes sense. For using arbitrary operations it may not. Use the types that fit your use case. Adding is safe for unsigned, as you argued: the type is wide enough for all uses as a size of something. Subtraction might not be, but you can check first (`if(a <= size) return size - a; else throw`)
You've now just precluded the use of noexcept (noexcept move f.e., super-important in modern C++) and added a branch (which cannot be removed by something clever, exactly because it is unsigned; the compiler can make no assumptions and the code has to go through the math) to your code, all that because it upsets you that something that should not occur in the first place in correct code can occur iff one is writing code that now (as one observed it got negative) is known to have a bug. What is much better is to use signed int's combined with assert's. The use of unsigned is false security (actually no security) and serves nothing. In the end, you still need to write correct code (so the signed int's WON'T BE negative there where they shouldn't be), but this practice makes your code less flexible, more verbose (the unavoidable casts add to that) and probably slower than using signed. All that because of this 'natural' way of looking at sizes.
I don't understand.
Maybe after the above? You want a guarantee to have a non-negative number. "unsigned" is that but it suffers from underflow going undetected. A `not_null<T>` like "wrapper" which otherwise behaves as T but guarantees the non-negativity would make the type suitable for representing an unsigned number in a signed type suitable for operations. Obviously if you subtract something from a `not_negative<int>` it will become a plain "int". Once you pass it to an API expecting a `not_negative<int>` the precondition will be checked.
Got it now, yeah that would be great, but for now that would be run-time, no? And I guess, due to the halting problem, it can never be compile-time, unless it's a limited problem. degski -- @systemdeg "We value your privacy, click here!" Sod off! - degski "Anyone who believes that exponential growth can go on forever in a finite world is either a madman or an economist" - Kenneth E. Boulding "Growth for the sake of growth is the ideology of the cancer cell" - Edward P. Abbey
Could you please try and explain why you think that signed is not a good type for a size (other than stating that "size cannot be negative"); what I am saying is that "valid-size cannot be negative"? You could also make the size a complex number, that would be an analogue. The imaginary component will have to be zero, but otherwise it would work just fine. The fact that that set is larger than the problem domain is IMO orthogonal to that.
Because if you use a signed type you have no compile-time guarantee that the value is non-negative (using "type" and "value" to differentiate those two). Same as with a pointer: it can be NULL. If you want an interface that guarantees at compile time that the value passed over an API is never NULL, you use e.g. a reference that can never be NULL (or not_null<T>).
Didn't you argue in the mail before that there will never be anything of size 2^32 and hence even not anything like 2^64? How could you overflow that then?
If you are manipulating (subtracting, adding differences) pointers, I thought I wrote that. The pointers might be pointers into Virtual Memory and can have any value [1, 2^56].
Not sure I understand that. Can we agree that a 64-bit unsigned type is big enough to store any size of any container and hence no overflow is possible?
> Finding bugs related to this is hard; using int's you'll know right away.

How? Only if you underflow. On unsigned you'll get a very large number if you go below zero, on signed you get a negative number. Both can be detected. But you talked about overflow. For unsigned you'll get a small number (that is wrong obviously but you COULD check) but for signed you get UB and hence can't even check for that.

>> If you get an unsigned value there is no need to check for below zero; if you get a signed value you might. It is the same reason there is `not_null<T>` in the GSL.

> But you would need to check if it wrapped

No. If you call `obj.size()` you get an unsigned value that is a valid unsigned value. It cannot wrap when returning (conditions apply).
Tautology: "... an unsigned value that is a valid unsigned value", they always are, whether it's the right number is another question.
Ok: "you get an unsigned type that is a valid unsigned value". If the size was signed you get a signed type which may be an unsigned value. You'll have to check.
If obj.size() returns a signed value you'll have to check for <0 before using the value unless the API somehow promises to not return negative values. Encoding this in the type is the natural thing to do.
No, it's not; unsigned types are good for bit-manipulation only, nothing else. Unsigned types don't follow the rules of mathematics, they are fundamentally flawed by nature. The fact that int's are limited in range is not a flaw, but an implementation detail. The mathematically correct way of doing things (on a Turing machine) is to use signed big-ints.
I disagree. And as mentioned, you can do things like `int difference = int(obj.size()) - 1` anytime you want to do operations that are not fully defined on unsigned types (as in: may result in values outside the range), same as you can't do `int foo = sqrt(integer)` because you may get an imaginary number (if sqrt could do that, but I think you get the gist).
Adding is safe for unsigned, as you argued: the type is wide enough for all uses as a size of something. Subtraction might not be, but you can check first (`if(a <= size) return size - a; else throw`)

You've now just precluded the use of noexcept (noexcept move f.e., super-important in modern C++) and added a branch (which cannot be removed by something clever, exactly because it is unsigned; the compiler can make no assumptions and the code has to go through the math) to your code. What is much better is to use signed int's combined with assert's.

How is that any different from `assert(a <= size); return size - a;`?

all that because it upsets you that something that should not occur in the first place in correct code can occur iff one is writing code that now (as one observed it got negative) is known to have a bug.

Again: How is that different to using a signed type for "size"? You have exactly the same potential for bugs. You always have to make sure you stay in your valid domain, and a negative size is outside of that valid domain. Hence you have to check somewhere or use control flow to make sure this doesn't happen. So no difference in signed vs unsigned size regarding that.

The use of unsigned is false security (actually no security) and serves nothing. In the end, you still need to write correct code (so the signed int's WON'T BE negative there where they shouldn't be), but this practice makes your code less flexible, more verbose (the unavoidable casts add to that) and probably slower than using signed. All that because of this 'natural' way of looking at sizes.

It serves as a contract on the API level: "This value is unsigned. Period." If the type was signed you'd need something else to enforce that the value is non-negative. So yes, you still need to write correct code, and passing a negative value to an API expecting an unsigned value is in any case a bug.
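For comparison, the two styles being argued over look roughly like this (an illustrative sketch, not code from any library):

    #include <cassert>
    #include <cstddef>
    #include <stdexcept>

    // Checked version: detects a violated precondition at run time,
    // so it cannot be noexcept.
    std::size_t distance_checked(std::size_t size, std::size_t a)
    {
        if (a > size)
            throw std::out_of_range("a exceeds size");
        return size - a;
    }

    // Contract version: the caller guarantees a <= size; the check
    // disappears in release builds and the function can be noexcept.
    std::size_t distance_assumed(std::size_t size, std::size_t a) noexcept
    {
        assert(a <= size);
        return size - a;
    }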
You want a guarantee to have a non-negative number. "unsigned" is that but it suffers from underflow going undetected. A `not_null<T>` like "wrapper" which otherwise behaves as T but guarantees the non-negativity would make the type suitable for representing an unsigned number in a signed type suitable for operations. Obviously if you subtract something from a `not_negative<int>` it will become a plain "int". Once you pass it to an API expecting a `not_negative<int>` the precondition will be checked.
Got it now, yeah that would be great, but for now that would be run-time, no? And I guess, due to the halting problem, it can never be compile-time, unless it's a limited problem.
Surely at runtime. How else could you guarantee that your value isn't negative after you subtract something from it? It can be compile-time if you only add something to it and ignore overflow, but you already do that when using signed values anyway.
But all this arguing doesn't solve much: what piece of code would actually benefit from having a signed size? And not only the part where you request the size and use it, but also the part where you give that size back to the object, so you'll need to ensure an unsigned value. And yes, `for(int i=0; i ...`
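The truncated `for(int i=0; i ...` fragment above is presumably the classic signed-index loop over an unsigned size; a small illustrative example of the mismatch and two common ways to write it:

    #include <cstddef>
    #include <iostream>
    #include <vector>

    int main()
    {
        std::vector<int> v{1, 2, 3};

        // Signed index compared against an unsigned size(): -Wsign-compare territory.
        for (int i = 0; i < static_cast<int>(v.size()); ++i)   // explicit cast silences the warning
            std::cout << v[i] << '\n';

        // Keeping everything unsigned avoids the mixed comparison entirely.
        for (std::size_t i = 0; i < v.size(); ++i)
            std::cout << v[i] << '\n';
    }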
On 3/03/2020 03:52, degski wrote:
Could you please try and explain why you think that signed is not a good type for a size (other than stating that "size cannot be negative"); what I am saying is that "valid-size cannot be negative"? You could also make the size a complex number, that would be an analogue. The imaginary component will have to be zero, but otherwise it would work just fine. The fact that that set is larger than the problem domain is IMO orthogonal to that.
Using a signed type for a size introduces more potential for bugs than using an unsigned type. In addition to what Alexander said about the intent (negative values are never valid sizes, by definition), there are a few other reasons why signed sizes are a bad idea:

1. If you do range checking, you now have to do "index < 0 || index >= size()" instead of only "index >= size()". This is both more work and easily forgotten, which can introduce bugs.

2. If you actually do end up wrapping the type range somehow, in unsigned values this is well defined, while in signed values this is UB. Compilers react increasingly poorly to UB, in many weird ways, so it's a bad idea to increase the probability of it occurring.

There is exactly one reason why a signed size type is better: if you are doing subtraction of indexes for any reason, it is usually more convenient to deal with getting a -1 than getting a max size_t. However, it's easy to recognise that you've hit that case and to cast explicitly yourself to a signed type and back as needed (with appropriate sanity checking). This is also usually free, as it's simply a reinterpretation of an existing bit pattern without any actual change to the bit pattern, or just using a different assembly instruction.

Yes, there are some kinds of code (notably std::string::substr) that might be less surprising if they used signed indexing, because they tend to be involved in index subtraction and can end up doing the wrong thing if you don't externally check for improper conditions. But as usual, C++ aims for performance by default and trusts you to do any necessary sanity checks externally, or to omit checks if you think you know better.

BUT: if you really want signed types in the interface, nothing stops you from wrapping the standard type in your own type that uses signed indexing. (I actually use this technique a fair bit when I want to have vectors and arrays that are indexed by enums or by typesafe integers, or model an index range that doesn't start at 0.) And the compiler will usually inline everything for free anyway.
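A tiny illustration of the first point about range checking (illustrative only):

    #include <cstddef>

    // Unsigned index: one comparison covers both ends, because a "negative"
    // value converted to std::size_t becomes a huge number >= size.
    bool in_range_unsigned(std::size_t index, std::size_t size)
    {
        return index < size;
    }

    // Signed index: both ends must be checked explicitly, and forgetting
    // the "< 0" half is an easy mistake.
    bool in_range_signed(long long index, long long size)
    {
        return index >= 0 && index < size;
    }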
Mistakenly sent privately to Vinnie (some software must have a bug, coz the same action [on my end] causes the e-mail to sometimes go privately, while others, like replying to Alexander, go through the mailing list, dunno why).
On Wed, Feb 26, 2020 at 7:33 AM degski via Boost
I would (like to ... ) have a template-SizeType-parameter and leave the signed- or unsigned-ness in the middle (while making sure things work correctly for both types (not hard)).
I don't like this at all. It is needless complexity for no benefit. std::size_t is unsigned, get over it and move on with your life, to focus on important things. Thanks
participants (7)

- Alexander Grund
- degski
- Gavin Lambert
- Glen Fernandes
- Krystian Stasiowski
- Mateusz Loskot
- Vinnie Falco