On 16/08/2022 19:44, Peter Dimov wrote:
> Gavin Lambert wrote:
>> Using wchar_t on Windows is actually the least painful option. (And you don't have to worry about locales and imbuements etc. if you never try to convert to not-wchar_t.)
> That's only if your program never runs on anything else. For portable code, using char and UTF-8 is the least painful option. We have an entire library in Boost for this purpose, whose documentation does a reasonable job explaining that.
Currently, yes. In theory, though, you could adopt a TCHAR-like approach where you use wchar_t on Windows and char/char8_t elsewhere, selected at compile time. That would avoid all conversions and just use the native character type of the OS, which would be better. (A rough sketch of what I mean is at the end of this message.)

(Windows has its own version of the invalid-characters problem -- it's legal to have mismatched surrogates in filenames, which work fine as long as you keep everything in wchar_t/UCS-2 and never convert it, but break if you convert to UTF-8 and back. It's probably less common than non-UTF-8 filenames on non-Windows systems, though. There's a second sketch of that below as well.)

The downside is that every single bit of code either has to use this TCHAR-like type (which in turn means you need to be able to recompile everything), or -- better -- has to provide overloads for all possible underlying types, with the same name, so that the calling code is spelled the same either way; some usages may also need macros or char_traits tricks. That tends to lead to either code duplication or over-templating, neither of which is good.

Ideally, the standard library would have defined such a platform-specific type alias (notably, an alias rather than a distinct type, so that existing overloads still work), which would have made it easier to build up libraries around it, or at least encouraged writing both overloads. Or the language would define some kind of compile-time variant that permits separate-translation-unit implementations of overloads that share the "same" implementation, without header-only templates. Sadly that hasn't happened yet.
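
For concreteness, here's a rough sketch of what the TCHAR-like approach could look like. The names (native_char, native_string, show_path) are invented for illustration and aren't from any existing library; in real code the non-native overload would convert and forward to the native one rather than just print something.

    #include <cstdio>
    #include <string>

    #if defined(_WIN32)
    using native_char = wchar_t;       // the native filename unit on Windows
    #else
    using native_char = char;          // UTF-8 elsewhere (or char8_t, if you widen the overload set)
    #endif

    using native_string = std::basic_string<native_char>;

    // Same-name overloads, one per underlying type, so the calling code is
    // spelled identically on every platform; only the non-native overload
    // would ever need to convert.
    void show_path(const std::wstring& p)
        { std::printf("wide path, %zu units\n", p.size()); }
    void show_path(const std::string& p)
        { std::printf("narrow path, %zu units\n", p.size()); }

    int main()
    {
    #if defined(_WIN32)
        native_string p = L"C:\\temp\\example.txt";
    #else
        native_string p = "/tmp/example.txt";
    #endif
        show_path(p);   // resolves to the native overload at compile time
    }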
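
And a sketch of the mismatched-surrogate point. A lone surrogate is a perfectly legal 16-bit unit in an NTFS filename, but it isn't a valid Unicode scalar value, so a strict UTF-16-to-UTF-8 encoder has to reject or replace it, and the round trip back to the wide form can't reproduce the original name. The helper below (again, a made-up name) just detects the condition:

    #include <cstdio>
    #include <string>

    // True if the UTF-16 sequence contains a high surrogate not followed by
    // a low surrogate, or a low surrogate with no preceding high surrogate.
    bool has_lone_surrogate(const std::u16string& s)
    {
        for (std::size_t i = 0; i < s.size(); ++i) {
            char16_t c = s[i];
            if (c >= 0xD800 && c <= 0xDBFF) {
                if (i + 1 >= s.size() || s[i + 1] < 0xDC00 || s[i + 1] > 0xDFFF)
                    return true;        // unpaired high surrogate
                ++i;                    // valid pair, skip the low half
            } else if (c >= 0xDC00 && c <= 0xDFFF) {
                return true;            // unpaired low surrogate
            }
        }
        return false;
    }

    int main()
    {
        std::u16string ok  = u"file.txt";
        std::u16string bad = ok;
        bad.push_back(char16_t(0xD800)); // fine as an NTFS name, invalid as UTF-16 text

        std::printf("ok : lone surrogate? %d\n", has_lone_surrogate(ok));
        std::printf("bad: lone surrogate? %d\n", has_lone_surrogate(bad));
    }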