Two issues with lexical_cast
Hi. I'm probably just another one out of a million who make these remarks, but I found nothing in the documentation about them, so I'll mention them anyway. 1. lexical_cast(Source arg) takes the source argument by value. Why not by const& ? Calling lexical_cast(some_basic_string) now makes a copy of the source argument for no reason. 2. lexical_cast ignores trailing whitespaces, but not leading whitespaces. Meaning that int a = lexical_cast<int>("3 "); // three-space will work and return 3, but int a = lexical_cast<int>(" 3"); // space-three will throw. Why? There's no rationale here. Both cases should be treated in the same manner, and IMHO, that manner should be throwing. After a look at the code, it seems that the trailing-whitespace-ignore code ('stream >> std::ws' in line 151) was specifically added, so there was supposed to be a reason for this. But as I said, I believe this is wrong, at least as the default behaviour when converting a string to something else. Maybe some policies (IgnoreLeadingWhitespace, IgnoreTrailingWhitespace) can be added to control this? And again, I suppose I'm not the first to think of it... Thanks, Yuval
Any chance of hearing from the official booster on these matters? Does he agree with me or maybe I'm talking rubbish? Should I be looking forward to the next boost version where it will be fixed, or not to bother and continue to use my implementation of lexical_cast? Any info would be greatly appreciated. Thanks, Yuval
Any chance of hearing from the official booster on these matters?
If you mean original author of lexical_cast, I wouln't count on it. Kevlin is rare guest here. As for "official" part: you are as official as anybody else.
Does that mean that I can obtain access to the boost CVS with write permissions and start making changes? I hope that the boost code is a bit better protected. If any hooligan like myself can come and make changes to fit his desires, then it sounds weird... Or in a more humoristic note: "I don't care to belong to any club that will have me as a member" (Groucho Marx of course, inaccurate quote) :-)
From: "Yuval Ronen"
Any chance of hearing from the official booster on these matters?
Gennadiy Rozental:
If you mean original author of lexical_cast, I wouln't count on it. Kevlin is rare guest here. As for "official" part: you are as official as anybody else.
Does that mean that I can obtain access to the boost CVS with write permissions and start making changes?
Yes, if there's a good reason for it, such as if you have a library that has been accepted into Boost, or is helping maintaining one of them.
I hope that the boost code is a bit better protected. If any hooligan like myself can come and make changes to fit his desires, then it sounds weird...
You first need to get permission. :) That typically also means that at least someone among admin, or other boosters, knows you well enough to think you're not a "hooligan". Also, I think anyone starting to screw up the CVS will see their write permission revoked faster than you can say "commit". :) Any files "deleted" may always be brought back, for example. I think you'll find that the Boost library is in good hands.
Or in a more humoristic note: "I don't care to belong to any club that will have me as a member" (Groucho Marx of course, inaccurate quote)
:) Regards, Terje
Yuval Ronen wrote:
Hi. I'm probably just another one out of a million who make these remarks, but I found nothing in the documentation about them, so I'll mention them anyway.
1. lexical_cast(Source arg) takes the source argument by value. Why not by const& ? Calling lexical_cast(some_basic_string) now makes a copy of the source argument for no reason.
There actually was a reason. Something about certain functions not working properly with a const&. Like the std::limits or something like that. (I ran into this problem because lexical_cast(Parent p) when given a child will actually create a parent object, so no amount of overriding virtuals changes things.)
2. lexical_cast ignores trailing whitespaces, but not leading whitespaces. Meaning that
int a = lexical_cast<int>("3 "); // three-space
will work and return 3, but
int a = lexical_cast<int>(" 3"); // space-three
Here, I agree with you, but I don't know the rationale at all.
will throw. Why? There's no rationale here. Both cases should be treated in the same manner, and IMHO, that manner should be throwing. After a look at the code, it seems that the trailing-whitespace-ignore code ('stream >> std::ws' in line 151) was specifically added, so there was supposed to be a reason for this. But as I said, I believe this is wrong, at least as the default behaviour when converting a string to something else. Maybe some policies (IgnoreLeadingWhitespace, IgnoreTrailingWhitespace) can be added to control this? And again, I suppose I'm not the first to think of it...
Thanks, Yuval
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
John =:->
1. lexical_cast(Source arg) takes the source argument by value. Why not by const& ? Calling lexical_cast(some_basic_string) now makes a copy of the source argument for no reason.
There actually was a reason. Something about certain functions not working properly with a const&. Like the std::limits or something like that. (I ran into this problem because lexical_cast(Parent p) when given a child will actually create a parent object, so no amount of overriding virtuals changes things.)
The thing about the parent being created is serious. So the problem is even more severe than I thought...
1. lexical_cast(Source arg) takes the source argument by value. Why not by const& ? Calling lexical_cast(some_basic_string) now makes a copy of the source argument for no reason.
There numerous optimization/enhancements that could be done to lexical_cast. In my version I use different interpretators, type wrappers, specializations e.t.c. And still in my recent study I found that lexical_cast ~ 20 times slower then sprintf and ~200 times slower then my own handcrafted version for decimal integer parcing. Conclusion: do NOT use lexical_cast in performance critical (even performance aware) parts of your code. The only place where I use it is during initialization. Accordingly no need to bother about reverence vs. value semantic.
2. lexical_cast ignores trailing whitespaces, but not leading whitespaces. Meaning that
My recomendation is: trim the value on both sodes before cast. And do not rely on lexical cast to detect your whitespaces.
Thanks, Yuval
HTH, Gennadiy
1. lexical_cast(Source arg) takes the source argument by value. Why not by const& ? Calling lexical_cast(some_basic_string) now makes a copy of the source argument for no reason.
There numerous optimization/enhancements that could be done to lexical_cast. In my version I use different interpretators, type wrappers, specializations e.t.c. And still in my recent study I found that lexical_cast ~ 20 times slower then sprintf and ~200 times slower then my own handcrafted version for decimal integer parcing. Conclusion: do NOT use lexical_cast in performance critical (even performance aware) parts of your code. The only place where I use it is during initialization. Accordingly no need to bother about reverence vs. value semantic.
2. lexical_cast ignores trailing whitespaces, but not leading whitespaces. Meaning that
My recomendation is: trim the value on both sodes before cast. And do not rely on lexical cast to detect your whitespaces.
Yep, that's a possibility that will work (and will add another string copy to the mix)...
For 1, I'm not sure.
For 2, It seems more like a problem with stringstream - I think
lexical_cast does:
stringstream ss;
ss << input;
ss >> output;
return output;
On Sat, 15 Jan 2005 18:47:57 +0200, Yuval Ronen
Hi. I'm probably just another one out of a million who make these remarks, but I found nothing in the documentation about them, so I'll mention them anyway.
1. lexical_cast(Source arg) takes the source argument by value. Why not by const& ? Calling lexical_cast(some_basic_string) now makes a copy of the source argument for no reason.
2. lexical_cast ignores trailing whitespaces, but not leading whitespaces. Meaning that
int a = lexical_cast<int>("3 "); // three-space
will work and return 3, but
int a = lexical_cast<int>(" 3"); // space-three
will throw. Why? There's no rationale here. Both cases should be treated in the same manner, and IMHO, that manner should be throwing. After a look at the code, it seems that the trailing-whitespace-ignore code ('stream >> std::ws' in line 151) was specifically added, so there was supposed to be a reason for this. But as I said, I believe this is wrong, at least as the default behaviour when converting a string to something else. Maybe some policies (IgnoreLeadingWhitespace, IgnoreTrailingWhitespace) can be added to control this? And again, I suppose I'm not the first to think of it...
Thanks, Yuval
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
-- Cory Nelson http://www.int64.org
For 2, It seems more like a problem with stringstream - I think lexical_cast does:
stringstream ss; ss << input; ss >> output; return output;
Not quite. In lexical_cast there's a code to specifically *not* ignore
leading whitespace - look for for the line
'stream.unsetf(std::ios::skipws)'. It also contains a code to
specifically *ignore* trailing whitespace (but not other characters) -
look for 'stream >> std::ws'. So this was done intentionally, even
though mistakenly in my opinion.
My version of 'lexical_cast
From: "Yuval Ronen"
Hi.
I'm probably just another one out of a million who make these remarks, but I found nothing in the documentation about them, so I'll mention them anyway.
Very good. :) Actually, the second issue hasn't to my knowledge been brought up, before, and while the first one has, it's a good question that deserves reconsideration. I've brought this thread to Kevlin Henney's attention, as I've tended to take care of lexical_cast related questions on the list (including forwarding things needing his consideration, such as any changes to lexical_cast, to him). I'm replying here to let you know that your report is being taken care of.
1. lexical_cast(Source arg) takes the source argument by value. Why not by const& ? Calling lexical_cast(some_basic_string) now makes a copy of the source argument for no reason.
I heard from Kevlin that the reason was that there originally were some issues with binding to string literals. Changing it to use const reference in my local copy still gives a lot of error messages, having to do with std::numeric_limits being used in the implementation. If we find a way to fix this (yet letting it still work on compilers with no class template partial specialisation, like MSVC 6), without breaking existing programs, then apparently, this change should be ok. If anyone makes a patch for this, it would be welcome.
2. lexical_cast ignores trailing whitespaces, but not leading whitespaces. Meaning that
int a = lexical_cast<int>("3 "); // three-space
will work and return 3, but
int a = lexical_cast<int>(" 3"); // space-three
will throw. Why? There's no rationale here. Both cases should be treated in the same manner, and IMHO, that manner should be throwing.
In Kevlin's reply to me, he agreed with this. This has been fixed, and will likely be committed soon, and a note posted about the update. Thanks for your feedback, and sorry for the late response. Regards, Terje
Terje Slettebø wrote:
I've brought this thread to Kevlin Henney's attention, as I've tended to take care of lexical_cast related questions on the list (including forwarding things needing his consideration, such as any changes to lexical_cast, to him). I'm replying here to let you know that your report is being taken care of.
Many, many thanks!
1. lexical_cast(Source arg) takes the source argument by value. Why not by const& ? Calling lexical_cast(some_basic_string) now makes a copy of the source argument for no reason.
I heard from Kevlin that the reason was that there originally were some issues with binding to string literals. Changing it to use const reference in my local copy still gives a lot of error messages, having to do with std::numeric_limits being used in the implementation. If we find a way to fix this (yet letting it still work on compilers with no class template partial specialisation, like MSVC 6), without breaking existing programs, then apparently, this change should be ok. If anyone makes a patch for this, it would be welcome.
Hmmm, interesting. I'll give it a look. Maybe I'll be lucky enough to think of something...
2. lexical_cast ignores trailing whitespaces, but not leading whitespaces. Meaning that
int a = lexical_cast<int>("3 "); // three-space
will work and return 3, but
int a = lexical_cast<int>(" 3"); // space-three
will throw. Why? There's no rationale here. Both cases should be treated in the same manner, and IMHO, that manner should be throwing.
In Kevlin's reply to me, he agreed with this. This has been fixed, and will likely be committed soon, and a note posted about the update.
Great! Thanks again.
From: "Yuval Ronen"
Terje Slettebø wrote:
I've brought this thread to Kevlin Henney's attention, as I've tended to take care of lexical_cast related questions on the list (including forwarding things needing his consideration, such as any changes to lexical_cast, to him). I'm replying here to let you know that your report is being taken care of.
Many, many thanks!
1. lexical_cast(Source arg) takes the source argument by value. Why not by const& ? Calling lexical_cast(some_basic_string) now makes a copy of the source argument for no reason.
I heard from Kevlin that the reason was that there originally were some issues with binding to string literals. Changing it to use const reference in my local copy still gives a lot of error messages, having to do with std::numeric_limits being used in the implementation. If we find a way to fix this (yet letting it still work on compilers with no class template partial specialisation, like MSVC 6), without breaking existing
then apparently, this change should be ok. If anyone makes a patch for
Thanks, yourself. :) Feedback is always welcome (regardless who's library it is); it's an opportunity to improve it. programs, this,
it would be welcome.
Hmmm, interesting. I'll give it a look. Maybe I'll be lucky enough to think of something...
I've been looking at this, as well, and after trying one approach (doing a
compile-time switch to select between the ordinary, and a "dummy"
std::numeric_limits (if the source type is an array)), I bumped into another
problem: the specialisations for wchar_t, which assumes "const wchar_t *",
not "const wchar_t[...]". This might again be fixed, using partial
specialisation, to work with array, but another approach could be to "go
back to square one" and do the array-to-pointer decay "manually":
template
From: "Terje Slettebø"
From: "Yuval Ronen"
1. lexical_cast(Source arg) takes the source argument by value. Why not by const& ? Calling lexical_cast(some_basic_string) now makes a copy of the source argument for no reason.
template
Target lexical_cast(const Source &arg) { // Array-to-pointer decay typedef typename mpl::if_< is_array<Source>, add_pointer
::type>::type, Source >::type NewSource; detail::lexical_stream
interpreter; Target result; if(!(interpreter << arg && interpreter >> result)) throw_exception(bad_lexical_cast(typeid(NewSource), typeid(Target))); return result; }
The additional includes are:
#include
#include #include #include #include
Come to think of it, it can be done simpler and clearer, and avoiding any
extra includes, like this:
namespace detail
{
template<class T>
struct array_to_pointer_decay
{
typedef T type;
};
template
Come to think of it, it can be done simpler and clearer, and avoiding any extra includes, like this:
namespace detail { template<class T> struct array_to_pointer_decay { typedef T type; };
template
struct array_to_pointer_decay { typedef const T * type; }; } template
Target lexical_cast(const Source &arg) { typedef typename detail::array_to_pointer_decay<Source>::type NewSource; detail::lexical_stream
interpreter; Target result; if(!(interpreter << arg && interpreter >> result)) throw_exception(bad_lexical_cast(typeid(NewSource), typeid(Target))); return result; }
I haven't tried it, but I have a feeling that it will work for string literals, but not for c-string variables, so another specialization might be needed. But it shouldn't be too hard to add, so no real problem here. Or maybe c-string variables aren't supposed to be supported anyway?
This requires support for partial specialisation, but so does remove_bounds used in the first suggestion, so it doesn't change the compiler requirements. Besides, the current version of lexical_cast in CVS doesn't work with MSVC 6, anyway. Of course, if this was very important, I guess it would be possible to fall back to pass by value for MSVC 6, and possibly fix other problems with it. I also see that MSVC 6 is no longer part of the Boost regression test.
Do you mean that boost has finally dropped support for MSVC 6? Congratulations, it's about time... :-) Anyway, the above solution looks excellent to me.
From: "Yuval Ronen"
Come to think of it, it can be done simpler and clearer, and avoiding any extra includes, like this:
namespace detail { template<class T> struct array_to_pointer_decay { typedef T type; };
template
struct array_to_pointer_decay { typedef const T * type; }; } template
Target lexical_cast(const Source &arg) { typedef typename detail::array_to_pointer_decay<Source>::type NewSource; detail::lexical_stream
interpreter; Target result; if(!(interpreter << arg && interpreter >> result)) throw_exception(bad_lexical_cast(typeid(NewSource), typeid(Target))); return result; }
I haven't tried it, but I have a feeling that it will work for string literals, but not for c-string variables
With "c-string variables", do you mean pointer to char? Or something like MFC's CString? Anyway, the first is supported as before. The above just ensures that the type passed to the rest of lexical_cast is the same type it used to be, when it used pass by value. So arrays will decay to pointers, and everything else will pass through unaltered. I ran the above through the unit test, and it passes all, on the three compilers I tested it with. I've since talked with Kevlin, and this was also what he had in mind, to make pass by const reference work.
This requires support for partial specialisation, but so does remove_bounds used in the first suggestion, so it doesn't change the compiler requirements. Besides, the current version of lexical_cast in CVS doesn't work with MSVC 6, anyway. Of course, if this was very important, I guess it would be possible to fall back to pass by value for MSVC 6, and possibly fix other problems with it. I also see that MSVC 6 is no longer part of the Boost regression test.
Do you mean that boost has finally dropped support for MSVC 6?
Well, eh, not really. :) I asked about this on the developer list, and got
this reply:
--- Start quote ---
From: "Jeff Garland"
Because of this, Kevlin Henney and I was wondering whether or not MSVC 6 support for this component (and components in Boost in general) is no longer demanded, or in general, what the required conformance level is?
We've discussed this a number of times, but there is no 'boost-wide' policy. Some libraries have slowly started dropping VC6 support -- spirit comes to mind. Some authors (as I recall the Boost.Python folks) have a strong user base still using VC6 and don't want to cut it off. With date-time I've been taking the approach of attempting to maintain existing functionality, but new functionality that breaks VC6 is essentially simply marked. Even maintaining existing functions takes significant effort because of the VC6 fragilities. For example, someone recently reported that VC6 with STLPort and date-time creates an ICE. Personally, I think the time has come to cut the cord on VC6 testing and compatibility. In my mind Boost as a whole is being held back by the time Boost developers waste hacking apart their implementations and answering questions for this 7 year old, seriously non-compliant compiler. Perhaps the fact that we move on will spur some people to upgrade compilers, improving the life of a few c++ developers along the way (I suppose this also might increase sales in Redmond -- of course they already mention Boost on the VC7 packaging...). And the folks stuck with VC6 can always use older versions of Boost or test out new versions against VC6 for themselves, submit patches, or pay someone to do this work. BTW, I have an interest in your decision b/c date-time depends on lexical_cast. So far whatever is breaking the regression tests doesn't seem to be affecting date-time... Jeff --- End quote --- However, since lexical_cast is such a simple component, and that - like it or not - it seems that quite a few people are still using MSVC 6 at work (possibly due to the codebase not handling a newer compiler...), it would be a shame not to make it available for them, as well. Therefore, we'll use the code above (which requires partial specialisation), and supply preprocessor-switched workaround-code for compilers lacking PTS, using full specialisation.
Congratulations, it's about time... :-)
I wish... :) One problem is that it's only a couple of years since MSVC 7.1 came, and even though MSVC 6 is old, 7.1 was the first version with PTS. So we might have to live with 6.0 a little longer. At least for existing libraries.
Anyway, the above solution looks excellent to me.
Thanks. :) Regards, Terje
I haven't tried it, but I have a feeling that it will work for string literals, but not for c-string variables
With "c-string variables", do you mean pointer to char? Or something like MFC's CString?
pointer to char
Anyway, the first is supported as before. The above just ensures that the type passed to the rest of lexical_cast is the same type it used to be, when it used pass by value. So arrays will decay to pointers, and everything else will pass through unaltered. I ran the above through the unit test, and it passes all, on the three compilers I tested it with. I've since talked with Kevlin, and this was also what he had in mind, to make pass by const reference work.
Ok. As I said, this comment was made without trying it, so it's quite possible (and even probable) that I was mistaking. I'll be more careful next time (spank on the tush for me...)
participants (5)
-
Cory Nelson
-
Gennadiy Rozental
-
John A Meinel
-
Terje Slettebø
-
Yuval Ronen