Bjørn Roald wrote:
I think encoding is going to be a challenge.
On POSIX I think you are right that one can assume the character encoding is defined by the system, and that it may be a multibyte or a single-byte encoding, whatever is defined in the locale.
On POSIX, the system doesn't care about encodings. You get from getenv exactly the byte string you passed to setenv.
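To make the round-trip point concrete, here is a minimal sketch, assuming a POSIX system where setenv is available. The helper name env_round_trips is made up for this example. It stores a value, reads it back, and reports whether the bytes came back unchanged.

```cpp
#include <cassert>
#include <cstdlib>   // std::getenv
#include <stdlib.h>  // setenv (POSIX, not part of standard C++)
#include <string>

// Hypothetical helper, for illustration only: set `name` to `value`,
// then check that getenv hands back exactly the same bytes.
bool env_round_trips(const char* name, const std::string& value) {
    // Environment values are NUL-terminated, so an embedded NUL cannot be stored.
    assert(value.find('\0') == std::string::npos);

    if (setenv(name, value.c_str(), 1) != 0) {  // 1 = overwrite an existing value
        return false;
    }
    const char* got = std::getenv(name);
    return got != nullptr && value == got;      // byte-wise comparison, no decoding
}
```

Calling it with "caf\xE9" (Latin-1, not valid UTF-8) and with "caf\xC3\xA9" (UTF-8) should succeed in both cases, since nothing in between interprets the bytes.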
File paths in Windows are stored in double-byte character strings encoded as UCS-2, which is the fixed-width 2-byte predecessor of UTF-16.
No, file paths on Windows are UTF-16. I'm not quite sure how SetEnvironmentVariableA and SetEnvironmentVariableW interact, though; I don't see it documented. The typical behavior for an A/W pair is for the A function to be implemented in terms of the W one, using the current system code page to convert the strings. The C runtime getenv/_putenv functions actually maintain two separate copies of the environment, one narrow and one wide.

https://msdn.microsoft.com/en-us/library/tehxacec.aspx

The problem, therefore, is that it's not quite possible to provide a portable interface. On POSIX, programs have to use the char* functions, because they don't encode or decode and therefore guarantee a perfect round-trip. Using wchar_t* may fail if the contents of the environment do not correspond to the encoding that the library uses.

On Windows, programs have to use the wchar_t* versions, for the same reason. Using char* may give you a mangled result if the environment contains a file name that cannot be represented in the current encoding. (If the library uses the C runtime getenv/_putenv functions, those will likely guarantee a perfect round-trip, but this will not solve the problem of a preexisting wide environment that is not representable.)

Many people - me included - have adopted a programming model in which char[] strings are assumed to be UTF-8 on Windows, and the char[] API calls the wide Windows API internally, converting between UTF-16 and UTF-8 as appropriate. Since the OS X POSIX API is UTF-8 based and most Linux systems are transitioning or have already transitioned to UTF-8 as the default, using UTF-8 and char[] results in reasonably portable programs.

This, however, doesn't appeal to people who prefer to use another encoding, and it makes the char[] API not correspond to the Windows char[] API (the A functions), as those use the "ANSI code page", which can't be UTF-8.

Boost.Filesystem sidesteps the problem by letting you choose whatever encoding you wish.
I don't particularly like this approach.
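For the UTF-8 char[] model described above, the conversion layer could look roughly like the sketch below. The function names are mine, not from any library, and std::u16string stands in for the wchar_t strings of the W functions so that the code compiles on any platform. On Windows a wrapper would pass the result to, or take its input from, the wide API. The sketch assumes well-formed input and does no validation beyond a couple of asserts.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch only: UTF-16 -> UTF-8, assuming correctly paired surrogates.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {          // high surrogate: combine with the next unit
            assert(i + 1 < in.size());
            char32_t lo = in[++i];
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        }
        if (cp < 0x80) {                             // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                     // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {                   // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                     // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Sketch only: UTF-8 -> UTF-16, assuming well-formed UTF-8.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        // Number of continuation bytes implied by the lead byte.
        int extra = b < 0x80 ? 0 : b < 0xE0 ? 1 : b < 0xF0 ? 2 : 3;
        assert(i + extra < in.size());
        // Payload bits of the lead byte: 0x1F, 0x0F or 0x07 for 2-, 3-, 4-byte forms.
        char32_t cp = extra == 0 ? b : (b & (0x3F >> extra));
        for (int k = 1; k <= extra; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        i += extra + 1;
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);
        } else {                                     // encode as a surrogate pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

A real implementation would also have to decide what to do with unpaired surrogates, which Windows file names and environment values are allowed to contain; that is exactly the kind of content that cannot make a perfect round-trip through strict UTF-8.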