On Fri, Jun 16, 2017 at 8:00 AM, Frédéric Bron wrote:
Please note: Under POSIX platforms no conversions are performed and no UTF-8 validation is done as this is incorrect:
I do not quite understand the rationale behind not converting to UTF-8 on POSIX platforms. I naively thought I got UTF-8 in argv because my system is configured in UTF-8, but I discovered that this is not necessarily the case. In the example you highlight, I do not see the difference from the Windows case. You could convert to UTF-8 in argv and back to the local encoding in nowide::remove. I understand it is not efficient if you do not really use the content of the filename, but if you have to write, say, an XML report in UTF-8, you would have to convert anyway.
Today, what is the portable way to convert argv to UTF-8? i.e. without having to #ifdef _WIN32...?
Frédéric
Hello Frederic,

There are several reasons for this. One of them is the original purpose of the library: to use the same type of strings internally without creating broken software on Windows. Only Windows uses a native wide API instead of the narrow API that is native to C++, so only the Windows case requires encoding conversion.

However, there is another, deeper issue. Unlike Windows, where the native wide API has a well-defined UTF-16 encoding, Unix-like OSes have no single well-defined encoding. The encoding is defined by the current locale, which can be set globally, per user, or per process, and can even be changed trivially within the same process at runtime.

There are also several sources of locale/encoding information:

- Environment variables: LANG/LC_CTYPE - UTF-8 on the vast majority of modern Unix-like platforms, but frequently undefined or set to the "C" locale without encoding information. This is what the OS defines for the process.
- C locale: setlocale API - the "C" locale by default, per the standard, unless explicitly set otherwise.
- C++ locale: std::locale::global() API - the "C" locale by default, per the standard, unless explicitly set otherwise.

All of them can be changed at runtime, they aren't synchronized, and they can be set to whatever encoding the user wants.

Additionally, setting std::locale::global to anything other than the "C" locale can lead to some really nasty things, like failing to create valid CSV files because "," gets added to numbers.

So the safest and most correct way to handle it is to pass narrow strings as is, without any conversion.

Regards,
Artyom Beilis
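
The CSV hazard mentioned in the reply can be sketched roughly as follows. This is only an illustration, not code from Nowide: comma_grouping, csv_row and grouping_locale are hypothetical names, and the custom facet stands in for a system locale that groups digits with ",", so that the example does not depend on which locales are installed.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet standing in for a non-"C" locale:
// groups digits in threes and separates the groups with ','.
struct comma_grouping : std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
};

// Format one CSV row "a,b" using whatever locale is passed in.
std::string csv_row(int a, int b, const std::locale& loc) {
    std::ostringstream out;
    out.imbue(loc);          // numbers are formatted through this locale
    out << a << ',' << b;    // the field separator itself is a plain char
    return out.str();
}

// Locale that behaves like "C" except for the digit grouping above.
std::locale grouping_locale() {
    // The locale takes ownership of the facet.
    return std::locale(std::locale::classic(), new comma_grouping);
}
```

Under these assumptions, csv_row(1234567, 42, std::locale::classic()) gives "1234567,42", while the same call with grouping_locale() gives "1,234,567,42", which a CSV reader would see as four fields instead of two. A stream picks up the global C++ locale when it is constructed, so the same effect can appear without any explicit imbue() once std::locale::global() has been changed.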