On 10/8/15 7:54 AM, Artyom Beilis wrote:
----- Original Message -----
[BEGIN: Long description regarding <codecvt> ]
...
So... Boost community - please do yourself a favor: don't use <codecvt> unless you really understand what you are doing.
Well, I use <codecvt> and boost::utf8_codecvt and I definitely don't know what I'm doing. That (and the fact that I don't have any extra time) is the reason for using a library in the first place. The whole locale/facet/codecvt saga is long and very difficult to fathom. To make things worse, it has a tortured history of library writers not getting it right. If one looks at the utf_codecvt facet there are lots of workarounds for older compilers and libraries. So it's high time this be rationalized. I think the concept has merit and would do well with a good library and educational documentation to match.
[END: Long description regarding <codecvt> ]
If you want to properly convert utf-8 files to native wide characters, for example for boost::filesystem, boost::serialization or std::fstream, you need a facet that converts to utf-16 or utf-32 according to what wchar_t holds, and <codecvt> does not provide one (without platform-specific tricks).
I see that, but we could easily select which codecvt facet to use depending on the size of wchar_t on the specific platform. I dislike libraries which do "too much" in order to "just" work. A codecvt library should be a) a tool kit to create codecvt facets, b) some generated examples which will cover what most users need, c) a bunch of tutorial information about how codecvt can be used - especially outside of stream i/o, d) anything else which is useful. Note I'm aware that this is a huge task to do right - I certainly wouldn't blame anyone for not taking it on.
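A minimal sketch of the selection suggested above, assuming the standard <codecvt> facets are acceptable for the job (which is exactly what is disputed in this thread). The names utf8_wide_facet and widen are hypothetical, chosen for illustration only. Note that <codecvt> and std::wstring_convert are deprecated since C++17, so some compilers will warn.

```cpp
#include <cassert>
#include <codecvt>      // std::codecvt_utf8, std::codecvt_utf8_utf16
#include <locale>       // std::wstring_convert
#include <string>
#include <type_traits>  // std::conditional

// Hypothetical alias: choose the facet from the width of wchar_t.
//  - 2-byte wchar_t (e.g. Windows): utf-8 <-> utf-16
//  - otherwise (e.g. 4-byte wchar_t on POSIX): utf-8 <-> UCS-4/utf-32
using utf8_wide_facet = std::conditional<
    sizeof(wchar_t) == 2,
    std::codecvt_utf8_utf16<wchar_t>,
    std::codecvt_utf8<wchar_t> >::type;

// Hypothetical helper: decode a utf-8 byte string into a native wide string.
// from_bytes throws std::range_error on malformed input.
inline std::wstring widen(const std::string& utf8) {
    std::wstring_convert<utf8_wide_facet, wchar_t> converter;
    return converter.from_bytes(utf8);
}
```

With this, a code point outside the BMP such as U+1F600 comes back as two wchar_t units where wchar_t is 16 bits wide and as one unit where it is 32 bits wide. The choice is made at compile time, so callers never name a platform.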
So I'm not going to implement C++11 <codecvt> because IMHO it is broken by design in the first place.
Hmm - I'd have to think more about this. If <codecvt> is ill-conceived - I'm sure one could propose an alternative.
Boost.Locale provides one, but currently it is a deeply internal and complex part of the library.
Hmmm - very interesting. Maybe it's a question of factoring out this part and repackaging it in a more digestible form. That would be interesting.
The code I have written for Boost.Nowide, or the one I suggest putting into the header-only part of Boost.Locale, is a codecvt that converts between utf-8 and utf-16/32 according to the size of the character:
boost::(nowide|or locale)::utf8_facet - utf-8 to utf-16 (windows) or utf-32 (posix)
boost::(nowide|or locale)::utf8_facet - utf-8 to utf-16 on any platform
boost::(nowide|or locale)::utf8_facet - utf-8 to utf-32 on any platform
That's it. It isn't <codecvt> because C++11 <codecvt> does not actually do the job needed.
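To illustrate what "according to size of character" means, here is a rough sketch of the conversion rule as a free function. It is not the Boost.Nowide or Boost.Locale code and it is not a std::codecvt facet; the name utf8_to_wide and its error handling are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode utf-8 and emit utf-16 when CharT is 2 bytes
// wide, utf-32 when it is 4 bytes wide.
template <typename CharT>
std::basic_string<CharT> utf8_to_wide(const std::string& in) {
    static_assert(sizeof(CharT) == 2 || sizeof(CharT) == 4,
                  "CharT must hold utf-16 or utf-32 code units");
    std::basic_string<CharT> out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t len;
        if (lead < 0x80)                       { cp = lead;        len = 1; }
        else if (lead >= 0xC2 && lead <= 0xDF) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0)        { cp = lead & 0x0F; len = 3; }
        else if (lead >= 0xF0 && lead <= 0xF4) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid utf-8 lead byte");
        if (i + len > in.size())
            throw std::invalid_argument("truncated utf-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid utf-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong forms, surrogates and values past U+10FFFF.
        if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
            cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid utf-8 code point");
        i += len;
        if (sizeof(CharT) == 4 || cp < 0x10000) {
            out.push_back(static_cast<CharT>(cp));  // one code unit suffices
        } else {
            cp -= 0x10000;                          // utf-16 surrogate pair
            out.push_back(static_cast<CharT>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<CharT>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Instantiating it with wchar_t gives the platform-dependent behaviour of the first entry in the list, while char16_t and char32_t give the fixed utf-16 and utf-32 behaviour of the other two.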
I'll have to take your word for it.
Artyom Beilis