Re: [boost] [review] Review of Nowide (Unicode) starts today

12 Jun 2017


On Mon, Jun 12, 2017 at 12:20 PM, Groke, Paul via Boost <boost@lists.boost.org> wrote:
...
> I know modified UTF-8 is (can be) invalid UTF-8, that's why I asked. I
> think it could make sense to support it anyway though. Round tripping
> (strictly invalid, but possible) file names on Windows, easier
> interoperability with stuff like JNI, ...

Don't you mean WTF-8 then? AFAIK "Modified UTF-8" is UTF-8 that encodes the
null character with an overlong sequence, and thus is incompatible with
standard UTF-8, unlike WTF-8 which is a compatible extension.
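
To make the difference concrete, here is a minimal sketch of the two encodings. It is not Nowide code, and the names encode_generalized, encode_wtf8 and encode_modified_utf8 are hypothetical. It assumes the usual description of Modified UTF-8 (as used by JNI), which besides the overlong null also writes characters above U+FFFF as two 3-byte encoded surrogates.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of Nowide: encodes any value up to 0x10FFFF
// with the generic UTF-8 bit layout, without rejecting surrogates.
std::string encode_generalized(char32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// WTF-8: byte-identical to UTF-8 for every Unicode scalar value; only a
// lone surrogate gets a 3-byte form that strict UTF-8 forbids.
std::string encode_wtf8(char32_t cp) { return encode_generalized(cp); }

// Modified UTF-8 (assumed JNI flavour): U+0000 becomes the overlong pair
// C0 80, and code points above U+FFFF become two 3-byte encoded surrogates.
std::string encode_modified_utf8(char32_t cp) {
    if (cp == 0) return std::string("\xC0\x80");
    if (cp >= 0x10000) {
        char32_t v = cp - 0x10000;
        return encode_generalized(0xD800 + (v >> 10)) +
               encode_generalized(0xDC00 + (v & 0x3FF));
    }
    return encode_generalized(cp);
}
```

With this sketch, U+1F600 comes out as the standard 4 bytes under encode_wtf8 but as 6 bytes under encode_modified_utf8, and U+0000 is a single zero byte in the former and C0 80 in the latter. Only the lone surrogate case (e.g. 0xD800 giving ED A0 80) takes WTF-8 outside strict UTF-8.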
...
> OTOH it would add overhead for systems with native UTF-8 APIs, because
> Nowide would at least have to check every string for "modified UTF-8
> encoded" surrogate pairs and convert the string if necessary. Which of
> course is a good argument for not supporting modified UTF-8, because then
> Nowide could just pass the strings through unmodified on those systems.

Implementing WTF-8 removes a check in UTF-8 → UTF-16 conversion, and
doesn't change anything in the reverse direction when the input is valid
UTF-16. I suspect it isn't slower.
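
A rough sketch of what "removes a check" could look like, for a single 3-byte sequence. This is not Nowide's decoder; decode3 and its strict_utf8 flag are hypothetical names. It also deliberately ignores the WTF-8 rule that a surrogate pair must not appear as two 3-byte sequences, which a fully conforming WTF-8 decoder would have to enforce separately.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch, not Nowide's implementation: decodes one 3-byte
// sequence (lead byte E0..EF) into a single UTF-16 code unit.
// Returns false if the sequence is rejected.
bool decode3(const unsigned char* p, bool strict_utf8, std::uint16_t& out) {
    assert((p[0] & 0xF0) == 0xE0);  // caller has already classified the lead byte
    // Both trailing bytes must be continuation bytes (10xxxxxx).
    if ((p[1] & 0xC0) != 0x80 || (p[2] & 0xC0) != 0x80) return false;
    std::uint32_t cp = ((p[0] & 0x0Fu) << 12) |
                       ((p[1] & 0x3Fu) << 6) |
                       (p[2] & 0x3Fu);
    // Overlong forms are rejected in both modes.
    if (cp < 0x800) return false;
    // The one check strict UTF-8 needs and this lenient WTF-8 mode drops.
    if (strict_utf8 && cp >= 0xD800 && cp <= 0xDFFF) return false;
    out = static_cast<std::uint16_t>(cp);
    return true;
}
```

In strict mode the bytes ED A0 80 are rejected; with strict_utf8 set to false they decode to the code unit 0xD800, which is what lets an unpaired surrogate in a Windows file name round-trip. Ordinary sequences such as E2 82 AC (U+20AC) take the same path in both modes.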

-- 
Yakov Galka
http://stannum.co.il/