John Meinel wrote:
David Abrahams wrote:
Sure; by the same token we could also use utf-8 and encode your Unicode in narrow strings.
Actually, I was wondering why this isn't used. The "big" advantage of UTF-16 was that it mapped one char to one code point, but that was broken by the newer Unicode spec. So why not stick with utf-8? I know that most Linux file systems will support utf-8 (if your terminal supports it, you see the nice characters; otherwise you see really bad "ASCII" ones).
I know there is a gnome library with a Glib::ustring that I believe internally uses a utf-8 string.
However, isn't utf-8 fully compatible with std::string? Provided that you understand some "characters" take more than one char? But that only matters when you are trying to interpret what the string means, which is done by the OS, or by something that is rendering it on the screen.
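To illustrate the point that some "characters" take more than one char, here is a minimal sketch. The helper name utf8_code_points is made up for this example, and it assumes the string already holds valid utf-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count code points in a UTF-8 encoded std::string.
// Continuation bytes have the bit pattern 10xxxxxx, so every byte that does
// not match that pattern starts a new code point. Assumes valid UTF-8 input.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For a string such as "na\xC3\xAF" "ve" (the word "naive" with a diaeresis on the i), size() reports six chars while the helper reports five code points. Copying and storing the string never needs to know the difference; only interpretation does.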
(I am not an expert.) Unfortunately, utf8 and similar do not work correctly in C++ for many common cases. For example, the thousands separator in a C++ locale is mandated by the standard to be a single character (std::numpunct<char>::thousands_sep() returns one char), but in some locales the utf8 sequence for the preferred separator is more than one char. utf8 is great for simply storing and copying strings, but it will fail quickly if you try to do any character-level manipulation on it without outside help.
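A rough sketch of the thousands-separator problem follows. The facet nbsp_punct and the function format_with_nbsp are invented for illustration, and the choice of a no-break space (U+00A0, encoded in utf8 as the two bytes C2 A0) as the preferred separator is an assumption standing in for whatever a given locale wants.

```cpp
#include <locale>
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>

// The standard interface returns exactly one char for the separator.
static_assert(
    std::is_same<
        decltype(std::declval<const std::numpunct<char>&>().thousands_sep()),
        char>::value,
    "thousands_sep() yields a single char");

// Hypothetical facet that wants U+00A0 ("\xC2\xA0" in UTF-8) as separator.
// Only one byte fits through the interface, so the second byte is lost.
struct nbsp_punct : std::numpunct<char> {
    char do_thousands_sep() const override { return '\xC2'; }
    std::string do_grouping() const override { return "\3"; }
};

// Format a number with the facet above installed in the stream's locale.
std::string format_with_nbsp(long value) {
    std::ostringstream out;
    out.imbue(std::locale(std::locale::classic(), new nbsp_punct));
    out << value;
    return out.str();
}
```

With this facet, format_with_nbsp(1234567) produces the digits separated by a lone C2 byte, which is not valid utf8. The narrow-char locale interface has no way to emit the full two-byte sequence.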
I suppose you still have to convert whenever you call one of the OpenFileW commands. And probably that is what all this is about. Someone feels that everything should be handled in the "native" format (which on Win32 is some sort of wchar_t, and on other platforms is char (though a UTF-8 char)).
My personal vote is to have the library convert to whatever internal representation is considered "preferred", and then have the convenience functions for converting to whatever the user wants. (native_file_wstring).
I agree. I think the interface should have both narrow and wide versions, provided as normal functions without templates or other character polymorphism. On operating systems that only use char, we can do the same conversion that std::wcout presently does on these systems. On operating systems such as Win32 that have the unique ability to take both narrow and wide operands natively, no conversion will be necessary. I don't think this will do the wrong thing in any reasonable case.

Aaron W. LaFramboise
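As a sketch of what the narrow-plus-wide interface could look like on a char-only system: the names narrow and native_path below are hypothetical and not part of any proposed library. The conversion goes through the locale's std::codecvt facet, on the assumption that this is the kind of conversion a wide stream performs when writing.

```cpp
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: narrow a wide string using the codecvt facet of the
// given locale (the global locale by default).
std::string narrow(const std::wstring& in,
                   const std::locale& loc = std::locale()) {
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_t;
    const facet_t& cvt = std::use_facet<facet_t>(loc);
    std::mbstate_t state = std::mbstate_t();
    // Reserve the worst-case number of chars, plus one so the buffer is
    // never empty.
    std::string out(in.size() * cvt.max_length() + 1, '\0');
    const wchar_t* from_next = nullptr;
    char* to_next = nullptr;
    std::codecvt_base::result r = cvt.out(
        state, in.data(), in.data() + in.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);
    if (r != std::codecvt_base::ok) {
        throw std::runtime_error("cannot narrow string in this locale");
    }
    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}

// Hypothetical interface for a char-only OS: two plain overloads, no
// templates. The narrow overload passes through; the wide one converts.
std::string native_path(const std::string& p) { return p; }
std::string native_path(const std::wstring& p) { return narrow(p); }
```

On a system like Win32, the wide overload would instead pass its argument straight to the wide OS call, so neither overload would need a conversion.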