Re: [boost] Environment Variables Library?

23 May 2015

Bjørn Roald wrote:
...
> I think encoding is going to be a challenge.
> On Posix I think you are right that one can assume the character encoding
> is defined by the system and that may be multi-byte or single-byte
> character strings, whatever is defined in the locale.
On POSIX, the system doesn't care about encodings. You get from getenv 
exactly the byte string you passed to setenv.
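
To make the round-trip claim concrete, here is a minimal sketch (POSIX only;
the helper name and the variable name used with it are made up for
illustration). It stores a byte string with setenv and checks that getenv
returns the same bytes, whether or not they are valid in any particular
encoding.

```cpp
#include <cstdlib>   // std::getenv
#include <cstring>   // std::strcmp
#include <stdlib.h>  // setenv (POSIX, not ISO C++)

// Hypothetical helper: returns true if the bytes stored with setenv
// come back unchanged from getenv.
bool env_round_trips(const char* name, const char* value)
{
    if (setenv(name, value, 1) != 0)  // 1 = overwrite an existing value
        return false;
    const char* got = std::getenv(name);
    return got != nullptr && std::strcmp(got, value) == 0;
}
```

"caf\xE9" (Latin-1) and "caf\xC3\xA9" (UTF-8) should both round-trip, since
no decoding step is involved.
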
...
> File paths in Windows are stored in double byte character strings encoded
> as UCS-2 which is the fixed-width 2-byte predecessor of UTF-16.
No, file paths on Windows are UTF-16.
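
The practical difference is that UTF-16 is variable width: a code point
outside the Basic Multilingual Plane takes two 16-bit code units (a surrogate
pair), which UCS-2 cannot express. A small sketch, with illustrative function
names that are not from any library:

```cpp
#include <cstddef>
#include <string>

bool is_high_surrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
bool is_low_surrogate(char16_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

// Counts code points in a UTF-16 string, treating a well-formed
// surrogate pair as a single code point.
std::size_t count_code_points(const std::u16string& s)
{
    std::size_t n = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (is_high_surrogate(s[i]) && i + 1 < s.size()
            && is_low_surrogate(s[i + 1]))
            ++i;  // skip the second half of the pair
        ++n;
    }
    return n;
}
```

For u"a\U0001F600", size() is 3 code units while count_code_points gives 2.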

I'm not quite sure how SetEnvironmentVariableA and SetEnvironmentVariableW 
interact, though; I don't see it documented. The typical behavior for an A/W 
pair is for the A function to be implemented in terms of the W one, using 
the current system code page for converting the strings.

The C runtime getenv/_putenv functions actually maintain two separate copies 
of the environment, one narrow, one wide.

https://msdn.microsoft.com/en-us/library/tehxacec.aspx

The problem therefore is that it's not quite possible to provide a portable 
interface.

On POSIX, programs have to use the char* functions, because they don't 
encode/decode and therefore guarantee a perfect round-trip. Using wchar_t* 
may fail if the contents of the environment do not correspond to the 
encoding that the library uses.
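
As an illustration of how the wchar_t* route can fail, assume (purely for the
sake of the example) that the library decodes the environment as UTF-8. A
value written by a program that uses Latin-1 is then not decodable, although
the char* functions return it intact. The validator below is a sketch, not
any library's actual decoder:

```cpp
#include <cstddef>
#include <string>

// Returns true if s is well-formed UTF-8. Rejects truncated sequences,
// overlong forms, surrogates and values above U+10FFFF.
bool is_valid_utf8(const std::string& s)
{
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        char32_t cp = 0, min = 0;
        if (c < 0x80) { ++i; continue; }                  // ASCII
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min = 0x10000; }
        else return false;                                // invalid lead byte
        if (len > n - i) return false;                    // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;        // not a continuation
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}
```

Here is_valid_utf8("caf\xC3\xA9") holds, but is_valid_utf8("caf\xE9") does
not, so a UTF-8 based wchar_t* getter would have to fail or substitute
characters for the second value.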

On Windows, programs have to use the wchar_t* versions, for the same reason. 
Using char* may give you a mangled result if the environment contains a 
file name that cannot be represented in the current encoding.

(If the library uses the C runtime getenv/_putenv functions, those will 
likely guarantee a perfect round-trip, but this will not solve the problem 
with a preexisting wide environment that is not representable.)

Many people - me included - have adopted a programming model in which char[] 
strings are assumed to be UTF-8 on Windows, and the char[] API calls the 
wide Windows API internally, then converts between UTF-16 and UTF-8 as 
appropriate. Since the OS X POSIX API is UTF-8 based and most Linux systems 
are transitioning or have already transitioned to UTF-8 as default, using 
UTF-8 and char[] results in reasonably portable programs.
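
A rough sketch of that model for reading a variable follows. The name
getenv_utf8 and the helpers widen/narrow are hypothetical; the Win32 calls
(MultiByteToWideChar, WideCharToMultiByte, GetEnvironmentVariableW) are the
real ones, but error handling is kept minimal. On other platforms the sketch
simply passes the getenv bytes through, on the assumption that they are
already UTF-8.

```cpp
#include <cstdlib>
#include <string>

#ifdef _WIN32
#include <windows.h>

// UTF-8 -> UTF-16 using the Win32 conversion function.
static std::wstring widen(const std::string& s)
{
    if (s.empty()) return std::wstring();
    const int len = static_cast<int>(s.size());
    const int n = MultiByteToWideChar(CP_UTF8, 0, s.data(), len, nullptr, 0);
    std::wstring w(n, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, s.data(), len, &w[0], n);
    return w;
}

// UTF-16 -> UTF-8 using the Win32 conversion function.
static std::string narrow(const std::wstring& w)
{
    if (w.empty()) return std::string();
    const int len = static_cast<int>(w.size());
    const int n = WideCharToMultiByte(CP_UTF8, 0, w.data(), len,
                                      nullptr, 0, nullptr, nullptr);
    std::string s(n, '\0');
    WideCharToMultiByte(CP_UTF8, 0, w.data(), len, &s[0], n, nullptr, nullptr);
    return s;
}
#endif

// Hypothetical char-based getter: the name and the result are UTF-8 on
// every platform. Returns false if the variable is not set.
bool getenv_utf8(const std::string& name, std::string& out)
{
#ifdef _WIN32
    const std::wstring wname = widen(name);
    // With a zero-sized buffer the call reports the required size,
    // including the terminating null, or 0 if the variable is missing.
    const DWORD need = GetEnvironmentVariableW(wname.c_str(), nullptr, 0);
    if (need == 0) return false;
    std::wstring buf(need, L'\0');
    const DWORD got = GetEnvironmentVariableW(wname.c_str(), &buf[0], need);
    if (got >= need) return false;  // value changed between the two calls
    buf.resize(got);
    out = narrow(buf);
    return true;
#else
    const char* v = std::getenv(name.c_str());
    if (v == nullptr) return false;
    out = v;  // assumed to be UTF-8 already; bytes are passed through as-is
    return true;
#endif
}
```

The caller sees one char-based interface everywhere; only the Windows branch
performs a conversion.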

This however doesn't appeal to people who prefer to use another encoding, 
and makes the char[] API not correspond to the Windows char[] API (the A 
functions) as those use the "ANSI code page" which can't be UTF-8.

Boost.Filesystem sidesteps the problem by letting you choose whatever 
encoding you wish. I don't particularly like this approach.

Peter Dimov