On 23 May 2015 02:18, Michael Ainsworth wrote:
On 22 May 2015, at 8:21 pm, Klaim - Joël Lamotte wrote:
By the way, what would be the encoding of the strings returned by or passed to the Environment library?
Given that std::getenv returns a char*, I think the library should work with std::string, although we did discuss supporting std::wstring using templates. Whether std::string is encoded in ASCII or UTF-8 would be an OS-specific thing, I imagine.
Someone with more experience with character encodings might want to weigh in here.
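
To make the std::string idea above concrete, here is a minimal sketch of a char*-to-std::string wrapper around std::getenv. The name get_env_or and its fallback parameter are hypothetical, chosen only for illustration and not part of any proposed interface. The bytes are copied unchanged, so the encoding is whatever the OS supplied.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper, not an existing library API: read an environment
// variable into a std::string, or return `fallback` if it is not set.
std::string get_env_or(const std::string& name, const std::string& fallback) {
    // Environment variable names cannot contain '='.
    assert(name.find('=') == std::string::npos);

    const char* value = std::getenv(name.c_str());
    if (value == nullptr) {
        return fallback;  // variable is not set
    }
    // Copy the bytes as-is; no encoding conversion is attempted here.
    return std::string(value);
}
```

Returning a fallback keeps the sketch free of newer library types; a real interface would probably want a way to distinguish "unset" from "set to the empty string".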
[Michael, I took the liberty of rearranging your response a bit as you are top-posting, see http://www.boost.org/community/policy.html]

Disclaimer: I am no character encoding expert, so please verify my claims here.

I think encoding is going to be a challenge. On POSIX I think you are right that one can assume the character encoding is defined by the system, and that it may be a multi-byte or a single-byte encoding, whatever is defined in the locale. As the POSIX getenv and setenv functions are simply char* based with no statements on encoding, it is possible to let the system determine the encoding. UTF-8 will likely be used for Unicode support, as other options make little sense.

On Windows, however, there are variants of the Windows API for environment variables:

    BOOL WINAPI SetEnvironmentVariable(
      _In_     LPCTSTR lpName,
      _In_opt_ LPCTSTR lpValue
    );

Unicode and ANSI names: SetEnvironmentVariableW (Unicode) and SetEnvironmentVariableA (ANSI)

The regular SetEnvironmentVariable uses LPCTSTR, and according to https://msdn.microsoft.com/en-us/library/windows/desktop/aa383751%28v=vs.85%... LPCTSTR is an LPCWSTR if UNICODE is defined, an LPCSTR otherwise.

    #ifdef UNICODE
    typedef LPCWSTR LPCTSTR;
    #else
    typedef LPCSTR LPCTSTR;
    #endif

File paths in Windows are stored as 16-bit wide character strings, originally encoded as UCS-2, the fixed-width 2-byte predecessor of UTF-16 (Windows 2000 and later use UTF-16). Other string data may not be wide character strings, and ASCII and ANSI strings will certainly exist in C++ code. Nevertheless, it seems the conversions should happen when the API is setting or getting the variables. I am not sure how the Unicode and ANSI variants of the API functions interact with the actual storage of the variables in the environment block, but it makes sense that program code needs to use them to convert when a conversion is needed. A standard C++ library needs to facilitate these conversions as well.
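
As a rough illustration of where such a conversion could sit, the sketch below sets a variable from UTF-8 input. It assumes the library would standardise on UTF-8 std::string at its interface, which is only one of the options discussed. The names widen_utf8 and set_env_utf8 are hypothetical. On Windows it calls SetEnvironmentVariableW explicitly so the behaviour does not depend on the UNICODE macro; on POSIX it passes the bytes straight to setenv.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

#ifdef _WIN32
#include <windows.h>

// Hypothetical helper: convert UTF-8 to the wide form the W API expects.
// Returns false if the input is not valid UTF-8.
static bool widen_utf8(const std::string& utf8, std::wstring& out) {
    out.clear();
    if (utf8.empty()) return true;
    const int in_len = static_cast<int>(utf8.size());
    // First call: ask how many wide characters are needed.
    const int out_len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                            utf8.data(), in_len, nullptr, 0);
    if (out_len <= 0) return false;
    out.resize(static_cast<std::size_t>(out_len));
    // Second call: perform the conversion into the sized buffer.
    return MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                               utf8.data(), in_len, &out[0], out_len) == out_len;
}
#endif

// Hypothetical helper: set a variable from UTF-8 input; true on success.
bool set_env_utf8(const std::string& name, const std::string& value) {
    assert(!name.empty() && name.find('=') == std::string::npos);
#ifdef _WIN32
    std::wstring wide_name, wide_value;
    if (!widen_utf8(name, wide_name) || !widen_utf8(value, wide_value)) {
        return false;  // refuse to store something we could not convert
    }
    return SetEnvironmentVariableW(wide_name.c_str(), wide_value.c_str()) != 0;
#else
    // POSIX: setenv is char* based, so the bytes pass through unchanged.
    return setenv(name.c_str(), value.c_str(), 1) == 0;
#endif
}
```

Reading a variable back would need the reverse conversion (for example GetEnvironmentVariableW followed by WideCharToMultiByte on Windows), which is left out here to keep the sketch short.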
I am not sure how that is best done, but I can imagine the Boost.Filesystem library has considered options for a very similar problem.

As the UNICODE macro determines whether your Windows program has single- or double-byte characters in its environment block, with ANSI or Unicode (UCS-2) value encoding respectively, a conversion may be needed when creating child processes. The CreateProcess function seems to support that, see the section on the lpEnvironment argument here: https://msdn.microsoft.com/en-us/library/windows/desktop/ms682425%28v=vs.85%...

It is annoying that Microsoft ended up using UCS-2. Other operating systems waited a bit longer to decide how to support Unicode, I think, and thus had a better option available with UTF-8. But the situation is what it is and we have to deal with it.

--
Bjørn