On Mon, Jun 12, 2017 at 12:20 PM, Groke, Paul via Boost <boost@lists.boost.org> wrote:
> I know modified UTF-8 is (can be) invalid UTF-8, that's why I asked. I think it could make sense to support it anyway though. Round tripping (strictly invalid, but possible) file names on Windows, easier interoperability with stuff like JNI, ...
Don't you mean WTF-8 then? AFAIK "Modified UTF-8" is UTF-8 that encodes the null character with an overlong sequence, and thus is incompatible with standard UTF-8, unlike WTF-8 which is a compatible extension.
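To make the incompatibility concrete, here is a minimal sketch of the two encodings of U+0000. The helper names are hypothetical and exist only for illustration; they are not part of Nowide or any other library.

```cpp
#include <string>

// Hypothetical helpers, for illustration only.

// Standard UTF-8 encodes U+0000 as the single byte 0x00.
inline std::string nul_standard_utf8()
{
    return std::string(1, '\0');
}

// Modified UTF-8 (the form JNI uses) encodes U+0000 as the two-byte
// overlong sequence C0 80, so the string never contains an embedded 0x00.
inline std::string nul_modified_utf8()
{
    return std::string("\xC0\x80", 2);
}

// A strict UTF-8 validator has to reject the lead bytes C0 and C1,
// because any two-byte sequence starting with them is overlong.
inline bool is_overlong_two_byte_lead(unsigned char b)
{
    return b == 0xC0 || b == 0xC1;
}
```

A strict decoder that sees the output of `nul_modified_utf8()` hits the `C0` lead byte and must report an error, which is why such input cannot simply be passed through as standard UTF-8.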
> OTOH it would add overhead for systems with native UTF-8 APIs, because Nowide would at least have to check every string for "modified UTF-8 encoded" surrogate pairs and convert the string if necessary. Which of course is a good argument for not supporting modified UTF-8, because then Nowide could just pass the strings through unmodified on those systems.
Implementing WTF-8 removes a check in the UTF-8 → UTF-16 conversion, and doesn't change anything in the reverse direction when the input is valid UTF-16. I suspect it isn't slower.

-- 
Yakov Galka
http://stannum.co.il/
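As a rough illustration of the "removed check" mentioned above, here is a minimal sketch of a UTF-8 → UTF-16 decoder in which the surrogate test is the only difference between the strict and the WTF-8 modes. The function name `decode_to_utf16` and the `allow_surrogates` flag are assumptions made for this example; this is not Nowide's actual code. The sketch is also lenient: a strict WTF-8 decoder would additionally reject a high/low surrogate pair encoded as two three-byte sequences, which this one accepts.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical sketch, not Nowide's implementation.
// Decodes UTF-8 to UTF-16; with allow_surrogates == true it also accepts
// three-byte encodings of lone surrogates (WTF-8 style).
// Returns false on malformed input.
inline bool decode_to_utf16(const std::string& in, std::u16string& out,
                            bool allow_surrogates)
{
    out.clear();
    const std::size_t n = in.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t len;
        if (b0 < 0x80)                     { cp = b0;        len = 1; }
        else if (b0 >= 0xC2 && b0 <= 0xDF) { cp = b0 & 0x1F; len = 2; }
        else if (b0 >= 0xE0 && b0 <= 0xEF) { cp = b0 & 0x0F; len = 3; }
        else if (b0 >= 0xF0 && b0 <= 0xF4) { cp = b0 & 0x07; len = 4; }
        else return false; // stray continuation byte, C0/C1, or lead > F4

        if (n - i < len) return false; // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }

        // Reject overlong three- and four-byte forms and out-of-range values.
        if (len == 3 && cp < 0x800) return false;
        if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return false;

        // The one check that strict UTF-8 needs and WTF-8 drops.
        if (!allow_surrogates && cp >= 0xD800 && cp <= 0xDFFF) return false;

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return true;
}
```

With this sketch, the byte sequence `ED A0 80` (a lone U+D800) is rejected when `allow_surrogates` is false and decoded to the single code unit 0xD800 when it is true. Well-formed UTF-8 input takes the same path in both modes, minus one comparison in the WTF-8 case.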