On Mon, Jun 12, 2017 at 12:55 PM, Groke, Paul via Boost <boost@lists.boost.org> wrote:
> Supporting modified UTF-8 or WTF-8 adds overhead on systems where the
> native OS API accepts UTF-8, but only strictly valid UTF-8. When some
> UTF-8 enabled function of the library is called on such a system, it
> would have to check for WTF-8 encoded surrogates and convert them to
> "true" UTF-8 before passing the string to the OS API. Because you would
> expect and want the "normal" UTF-8 encoding for a string to refer to the
> same file as the WTF-8 encoding of the same string.
That's the point: if the string is representable in UTF-8 at all, then its WTF-8 representation already is valid UTF-8, so no conversion is needed and nothing has to be checked. This is the sense in which WTF-8 is 'more compatible' with UTF-8 than Modified UTF-8 is: Modified UTF-8 encodes U+0000 and supplementary code points differently from standard UTF-8, so even well-formed strings would need re-encoding. By analogy, you don't need special checks to pass a UTF-8 string to an ASCII-only API when the string contains only ASCII characters, because UTF-8 is a strict superset of ASCII and such strings are byte-identical in both encodings.

--
Yakov Galka
http://stannum.co.il/
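A short sketch (mine, not from the thread) makes the byte-level claim concrete. Python's `surrogatepass` error handler behaves like a WTF-8-style encoder, and the `modified_utf8` helper below is a hypothetical reimplementation of Java-style Modified UTF-8 (overlong NUL, CESU-8 surrogate pairs for supplementary code points):

```python
# Hedged sketch: why a string that is representable in UTF-8 needs no
# conversion when treated as WTF-8, while Modified UTF-8 would.
# 'modified_utf8' is a hypothetical helper, not a real library function.

def modified_utf8(s: str) -> bytes:
    """Encode like Java's Modified UTF-8: U+0000 as the overlong pair
    C0 80, and supplementary code points as CESU-8 surrogate pairs."""
    out = bytearray()
    for ch in s:
        cp = ord(ch)
        if cp == 0:
            out += b"\xc0\x80"                      # overlong NUL
        elif cp > 0xFFFF:
            cp -= 0x10000                           # split into UTF-16 surrogates
            for half in (0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)):
                out += bytes([0xE0 | (half >> 12),
                              0x80 | ((half >> 6) & 0x3F),
                              0x80 | (half & 0x3F)])
        else:
            out += ch.encode("utf-8")
    return bytes(out)

s = "na\u00efve \U0001F4A9"   # well-formed text with a supplementary char

# 'surrogatepass' acts as a WTF-8-style encoder: for a string with no
# unpaired surrogates, the bytes are identical to plain UTF-8.
assert s.encode("utf-8", "surrogatepass") == s.encode("utf-8")

# Modified UTF-8 differs even for this well-formed string, so a system
# speaking strict UTF-8 would have to re-encode it first.
assert modified_utf8(s) != s.encode("utf-8")
```

So a WTF-8 string only ever differs from UTF-8 when it contains unpaired surrogates, i.e. exactly when the string is not representable in UTF-8 in the first place.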