Re: [Boost-users] Re: Feature request for boost::filesystem

2 Jul 2004

John Meinel wrote:
...
David Abrahams wrote:
...
Sure; by the same token we could also use utf-8 and encode your
Unicode in narrow strings.

Actually, I was wondering why this isn't used? The "big" advantage for
UTF-16 was that it followed the one char -> one code point rule. But then
that was broken with the new Unicode spec. So why not stick with utf-8? I
know that most Linux file systems will support utf-8 (if your terminal
supports it, then you see the nice characters, otherwise you see really
bad "ASCII" ones).
I know there is a GNOME library with a Glib::ustring that I believe
internally uses a utf-8 string.
However, isn't utf-8 fully compatible with std::string? Provided that 
you understand some "characters" take more than one char? But that only 
matters when you are trying to interpret what the string means, which is 
done by the OS, or by something that is rendering it on the screen.
(I am not an expert.)

Unfortunately, utf8 and similar do not work correctly in C++ for many
common cases.  For example, the thousands separator in a C++ locale is
mandated by the standard to be a single character, but in some locales,
the utf8 sequence that represents the preferred separator is more than
one char.
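
A minimal sketch of that constraint, assuming a C++11 compiler. The point is
only that std::numpunct<char>::thousands_sep() hands back one char, so a facet
can supply an ASCII separator but has no way to return a multi-byte utf8
sequence. The names apostrophe_grouping, format_with_grouping, nbsp_utf8 and
demo_grouping are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>
#include <type_traits>
#include <utility>

// The separator accessor returns exactly one char_type. For numpunct<char>
// that is a single char, so at most one byte of a utf8 sequence.
static_assert(
    std::is_same<
        decltype(std::declval<const std::numpunct<char>&>().thousands_sep()),
        char>::value,
    "numpunct<char>::thousands_sep() returns a single char");

// Hypothetical facet: groups digits in threes with an apostrophe.
struct apostrophe_grouping : std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return '\''; }
    std::string do_grouping() const override { return "\3"; }
};

// Hypothetical helper: format n through a stream imbued with the facet above.
std::string format_with_grouping(long n) {
    std::ostringstream out;
    // The locale takes ownership of the facet and deletes it when done.
    out.imbue(std::locale(std::locale::classic(), new apostrophe_grouping));
    out << n;
    return out.str();
}

// U+00A0 NO-BREAK SPACE, used as a grouping separator in some locales, is
// two bytes in utf8, so do_thousands_sep() above could not return it.
const std::string nbsp_utf8 = "\xC2\xA0";

// Small usage check for the sketch.
void demo_grouping() {
    assert(format_with_grouping(1234567) == "1'234'567");
    assert(nbsp_utf8.size() == 2);
}
```

A wide facet (std::numpunct<wchar_t>) sidesteps the problem only where one
wchar_t can hold the separator, which is part of why the narrow/wide question
keeps coming up.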

utf8 is great for simply storing and copying strings, but it will fail
quickly if you try to do any character-level direct manipulation on it
without outside help.
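
To make the "outside help" concrete, here is a small sketch, assuming
well-formed utf8 input and a C++11 compiler. It shows that std::string counts
bytes rather than characters, and that a plain substr() can cut a multi-byte
sequence in half unless something checks for continuation bytes. The helper
names count_code_points, truncate_utf8 and demo_utf8 are made up for this
example and are not part of any library discussed here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count code points by skipping utf8 continuation bytes (bit pattern 10xxxxxx).
// Assumes well-formed input; no validation is attempted.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t n = 0;
    for (unsigned char c : utf8) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Keep at most max_bytes bytes without splitting a multi-byte sequence.
std::string truncate_utf8(const std::string& utf8, std::size_t max_bytes) {
    if (utf8.size() <= max_bytes) return utf8;
    std::size_t end = max_bytes;
    // If the first dropped byte is a continuation byte, the cut falls inside
    // a sequence, so back up to the lead byte of that sequence.
    while (end > 0 &&
           (static_cast<unsigned char>(utf8[end]) & 0xC0) == 0x80) {
        --end;
    }
    return utf8.substr(0, end);
}

// Small usage check: "naive" spelled with U+00EF (bytes 0xC3 0xAF).
void demo_utf8() {
    const std::string word = std::string("na") + "\xC3\xAF" + "ve";
    assert(word.size() == 6);               // bytes
    assert(count_code_points(word) == 5);   // characters
    assert(word.substr(0, 3) == "na\xC3");  // naive cut leaves a dangling lead byte
    assert(truncate_utf8(word, 3) == "na"); // boundary-aware cut
}
```

Storing and copying the string needs none of this, which matches the point
above: the trouble starts only once code tries to treat each char as a
character.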
...
I suppose you still have to convert whenever you call one of the 
OpenFileW commands. And probably that is what all this is about. Someone 
feels that everything should be handled in the "native" format (which on 
Win32 is some sort of wchar_t, and on other platforms is char (though a 
UTF-8 char)).
My personal vote is to have the library convert to whatever internal 
representation is considered "preferred", and then have the convenience 
functions for converting to whatever the user wants (native_file_wstring).

I agree.  I think the interface should have both narrow and wide
versions, provided as normal functions without templates or other
character polymorphism.  On operating systems that only use char, we can
do the same conversion that std::wcout presently does on these systems.
On operating systems such as Win32 that have the unique ability to take
both narrow and wide operands natively, no conversion will be necessary.

I don't think this will do the wrong thing in any reasonable case.
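
A rough sketch of what such a pair of plain overloads could look like on a
char-only system, assuming a C++11 compiler. This is not the boost::filesystem
interface; to_narrow, name_passed_to_os and demo_overloads are hypothetical
names, and the "OS call" is a stand-in that just returns the name it would
pass on. The conversion goes through the locale's
std::codecvt<wchar_t, char, std::mbstate_t> facet, which is the facet wide
file streams use to produce narrow output.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>
#include <vector>

typedef std::codecvt<wchar_t, char, std::mbstate_t> wide_to_narrow_facet;

// Hypothetical helper: convert a wide string to the narrow encoding of loc.
std::string to_narrow(const std::wstring& wide,
                      const std::locale& loc = std::locale()) {
    if (wide.empty()) return std::string();
    const wide_to_narrow_facet& cvt = std::use_facet<wide_to_narrow_facet>(loc);
    std::mbstate_t state = std::mbstate_t();
    // max_length() bounds the number of chars one wchar_t can produce.
    std::vector<char> buf(wide.size() *
                          static_cast<std::size_t>(cvt.max_length()));
    const wchar_t* from_next = nullptr;
    char* to_next = nullptr;
    const std::codecvt_base::result r =
        cvt.out(state, wide.data(), wide.data() + wide.size(), from_next,
                buf.data(), buf.data() + buf.size(), to_next);
    if (r != std::codecvt_base::ok) {
        throw std::runtime_error("name not representable in narrow encoding");
    }
    return std::string(buf.data(), to_next);
}

// Hypothetical narrow entry point. A real library would call the OS here;
// this stand-in only returns the name it would hand over.
std::string name_passed_to_os(const std::string& path) { return path; }

// Hypothetical wide entry point for a char-only OS: convert, then forward.
// On a system with native wide calls this overload would not convert at all.
std::string name_passed_to_os(const std::wstring& path) {
    return name_passed_to_os(to_narrow(path));
}

// Small usage check with a plain ASCII name in the classic "C" locale.
void demo_overloads() {
    assert(to_narrow(L"readme.txt", std::locale::classic()) == "readme.txt");
    assert(name_passed_to_os(std::string("readme.txt")) == "readme.txt");
    assert(name_passed_to_os(std::wstring(L"readme.txt")) == "readme.txt");
}
```

Both entry points are ordinary functions, so callers pick one simply by the
string type they already have. Names that the narrow encoding cannot
represent make the conversion throw in this sketch; how a real library
should report that is left open here.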

Aaron W. LaFramboise