[nowide] request for clarification
Hi Everyone,
I admit I am not quite familiar with the problem, but I understand that as one of the features, nowide offers a replacement for std::fstream that can be constructed with its string types. At the same time we have boost::filesystem that offers its own replacement for std::fstream that can be constructed with filesystem::path. Now, if I want to use `filesystem::path`s in my program (to be able to tell just any string from a filesystem path), can I still use the benefits of `nowide` library?
Also, in the docs for nowide::ifstream, we read, "Same as std::basic_ifstream<char> but accepts UTF-8 strings under Windows." What about other systems? What does it accept on Linux? ascii?
In documentation for `nowide::args`, we read, "args is a class that fixes standard main() function arguments and changes them to UTF-8 under Microsoft Windows." Does it write to the input strings in-place? is it even legal in C++?
It "fixes", which implies that otherwise the args are "broken". How are args in function main() broken? (other than not being UTF-8)?
Regards, &rzej;
On Fri, Jun 16, 2017 at 6:12 PM, Andrzej Krzemienski via Boost wrote:
Hi Everyone, I admit I am not quite familiar with the problem, but I understand that as one of the features, nowide offers a replacement for std::fstream that can be constructed with its string types. At the same time we have boost::filesystem that offers its own replacement for std::fstream that can be constructed with filesystem::path. Now, if I want to use `filesystem::path`s in my program (to be able to tell just any string from a filesystem path), can I still use the benefits of `nowide` library?
Yes, of course. There is an integration between nowide and filesystem that makes filesystem treat the narrow API as UTF-8. Also note that nowide::fstream works on MinGW as well, whereas filesystem's fstream calls std::fstream, and only the MSVC version has open(wchar_t const *).
Also, in the docs for nowide::ifstream, we read, "Same as std::basic_ifstream<char> but accepts UTF-8 strings under Windows." What about other systems? What does it accept on Linux? ascii?
Under Linux it accepts "char *" in whatever encoding it is considered to be in; the string is passed through unchanged. See: http://cppcms.com/files/nowide/html/index.html#qna
In documentation for `nowide::args`, we read, "args is a class that fixes standard main() function arguments and changes them to UTF-8 under Microsoft Windows." Does it write to the input strings in-place? is it even legal in C++?
It replaces the values of argc and argv, pointing them to another location without modifying the original strings.
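The idea of swapping argc and argv without touching the original strings can be sketched in portable C++. This is only an illustration, not nowide's actual implementation: the class name `scoped_args` and the way the replacement strings are supplied are assumptions made for the example. It keeps its own copies of the new arguments, redirects the caller's argc and argv to them, and puts the old values back in its destructor.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch (not the nowide API): redirect argc/argv to strings
// owned by this object, leaving the original argument buffers untouched.
class scoped_args
{
public:
    scoped_args(int& argc, char**& argv, std::vector<std::string> replacement)
        : argc_ref_(argc), argv_ref_(argv),
          old_argc_(argc), old_argv_(argv),
          storage_(std::move(replacement))
    {
        // Build a null-terminated pointer array over the owned strings.
        for (std::string& s : storage_)
            pointers_.push_back(&s[0]);
        pointers_.push_back(nullptr);

        argc_ref_ = static_cast<int>(storage_.size());
        argv_ref_ = pointers_.data();
    }

    // Restore the caller's original argc/argv when the object goes away.
    ~scoped_args()
    {
        argc_ref_ = old_argc_;
        argv_ref_ = old_argv_;
    }

    scoped_args(scoped_args const&) = delete;
    scoped_args& operator=(scoped_args const&) = delete;

private:
    int& argc_ref_;
    char**& argv_ref_;
    int old_argc_;
    char** old_argv_;
    std::vector<std::string> storage_;  // owns the replacement text
    std::vector<char*> pointers_;       // what argv points to while alive
};
```

Because only the pointer and the count are reassigned, nothing is written into the buffers that main() originally received, which is why the in-place modification question does not arise.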
It "fixes", which implies that otherwise the args are "broken". How are args in function main() broken? (other than not being UTF-8)?
That main(argc, argv) receives parameters converted from the native UTF-16 internal API to the current locale's code page, which generally cannot represent all the required characters (since Windows does not support UTF-8 as a native locale).
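To make the loss concrete, here is a minimal sketch of a lossy narrowing step. It is an assumption-laden stand-in: Latin-1 plays the role of "the current code page", the helper name `to_latin1_lossy` is hypothetical, and the real conversion is performed by Windows and the C runtime, not by user code like this.

```cpp
#include <string>

// Hypothetical helper: narrow UTF-16 text to Latin-1, substituting '?' for
// every code unit that Latin-1 cannot represent. Surrogate pairs are not
// decoded; each half simply becomes its own '?'.
std::string to_latin1_lossy(std::u16string const& in)
{
    std::string out;
    for (char16_t unit : in)
        out += unit <= 0xFF ? static_cast<char>(unit) : '?';
    return out;
}
```

With this sketch, every Cyrillic letter falls outside Latin-1 and collapses to a question mark, and nothing in the narrow result allows the original text to be reconstructed.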
Regards, &rzej;
Best, Artyom
2017-06-16 22:10 GMT+02:00 Artyom Beilis via Boost
It replaces the values of argc and argv, pointing them to another location without modifying the original strings.
Ok. It makes sense :)
It "fixes", which implies that otherwise the args are "broken". How are args in function main() broken? (other than not being UTF-8)?
That main(argc, argv) receives parameters converted from the native UTF-16 internal API to the current locale's code page, which generally cannot represent all the required characters (since Windows does not support UTF-8 as a native locale).
But given that what main() receives is already broken (Windows already could not handle a name containing letters from two code pages), how can you recover from this loss of information? Regards, &rzej;
Take a look at the code :-) I use WinAPI to retrieve the original UTF-16 args. I don't rely on the original strings. Artyom
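Once the original UTF-16 arguments are available, turning them into UTF-8 is a mechanical transcoding step. The following is a minimal, standard-library-only sketch of that step, not the code nowide uses; the function names `append_utf8` and `utf16_to_utf8` are made up for the example, and replacing unpaired surrogates with U+FFFD is one possible policy among several.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one Unicode code point to `out`, encoded as UTF-8.
void append_utf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Transcode UTF-16 to UTF-8. Valid surrogate pairs are combined into one
// code point; unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(std::u16string const& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: needs a low surrogate right after it.
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate without a preceding high one
        }
        append_utf8(out, cp);
    }
    return out;
}
```

On Windows the UTF-16 input would typically come from GetCommandLineW() and CommandLineToArgvW(), which bypass the code-page conversion that the narrow argv has already gone through.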
On 17 June 2017 at 12:17 PM, "Frédéric Bron"
Take a look at the code :-) I use WinAPI to retrieve the original UTF-16 args. I don't rely on the original strings.
This is interesting. You can retrieve more than what was given! I think you should document this, not only in the code itself. Frédéric
Actually, it is documented: cppcms.com/files/nowide/html/classboost_1_1nowide_1_1args.html
Artyom
2017-06-17 10:37 GMT+02:00 Artyom Beilis via Boost
Take a look at the code :-)
I use WinAPI to retrieve the original UTF-16 args. I don't rely on the original strings.
This is impressive, and simple. Thanks, &rzej;
It "fixes", which implies that otherwise the args are "broken". How are args in function main() broken? (other than not being UTF-8)?
That main(argc, argv) receives parameters converted from the native UTF-16 internal API to the current locale's code page, which generally cannot represent all the required characters (since Windows does not support UTF-8 as a native locale).
This is incorrect. You have been able to set the Windows console to UTF-8 for many years. Just issue `chcp 65001`; your console is now in UTF-8 and UTF-8 strings will be presented in argv.
Indeed it is possible to set UTF-8 consoles globally as the default, but lots of stuff hard-assumes Latin1 input and gets very upset if it sees UTF-8. In particular, MSVCRT, though maybe the VS2015 rewrite of MSVCRT has fixed that. I do remember .NET programs ran great with UTF-8 input though, as do NT kernel programs. The blocker is MSVCRT.
Niall
--
ned Productions Limited Consulting
http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
Niall Douglas wrote:
That main(argc, argv) receives parameters converted from the native UTF-16 internal API to the current locale's code page, which generally cannot represent all the required characters (since Windows does not support UTF-8 as a native locale).
This is incorrect. You have been able to set the Windows console to UTF-8 for many years. Just issue `chcp 65001`; your console is now in UTF-8 and UTF-8 strings will be presented in argv.
You can set the console to UTF-8 and it will display UTF-8 correctly, but
will UTF-8 strings come in (the narrow) argv? I think not.
#include <iostream>
int main( int argc, char const* argv[] )
{
    std::cout << argv[1] << std::endl;
}
C:\Users\Peter Dimov>chcp 65001
Active code page: 65001
C:\Projects\testbed2017>debug\testbed2017.exe проба
?????
Whereas:
#include
participants (5)
- Andrzej Krzemienski
- Artyom Beilis
- Frédéric Bron
- Niall Douglas
- Peter Dimov