On Mon, Jun 12, 2017 at 12:55 PM, Groke, Paul via Boost <boost@lists.boost.org> wrote:
> Supporting modified UTF-8 or WTF-8 adds overhead on systems where the
> native OS API accepts UTF-8, but only strictly valid UTF-8. When some
> UTF-8 enabled function of the library is called on such a system, it
> would have to check for WTF-8 encoded surrogates and convert them to
> "true" UTF-8 before passing the string to the OS API. Because you would
> expect and want the "normal" UTF-8 encoding for a string to refer to the
> same file as the WTF-8 encoding of the same string.
That's the point: if the string is representable in UTF-8 at all, then its WTF-8 representation already is valid UTF-8, so no conversion is needed and nothing has to be checked. This is the sense in which WTF-8 is 'more compatible' with UTF-8 than Modified UTF-8 is: Modified UTF-8 encodes U+0000 and supplementary code points differently from standard UTF-8, so even well-formed strings would need re-encoding. By analogy, you don't need special checks to pass a UTF-8 string to an ASCII-only API when the string contains only ASCII characters, because UTF-8 is a strict superset of ASCII and such strings are byte-identical in both encodings.

--
Yakov Galka
http://stannum.co.il/
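A short sketch (mine, not from the thread) makes the byte-level claim concrete. Python's `surrogatepass` error handler behaves like a WTF-8-style encoder, and the `modified_utf8` helper below is a hypothetical reimplementation of Java-style Modified UTF-8 (overlong NUL, CESU-8 surrogate pairs for supplementary code points):

```python
# Hedged sketch: why a string that is representable in UTF-8 needs no
# conversion when treated as WTF-8, while Modified UTF-8 would.
# 'modified_utf8' is a hypothetical helper, not a real library function.

def modified_utf8(s: str) -> bytes:
    """Encode like Java's Modified UTF-8: U+0000 as the overlong pair
    C0 80, and supplementary code points as CESU-8 surrogate pairs."""
    out = bytearray()
    for ch in s:
        cp = ord(ch)
        if cp == 0:
            out += b"\xc0\x80"                      # overlong NUL
        elif cp > 0xFFFF:
            cp -= 0x10000                           # split into UTF-16 surrogates
            for half in (0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)):
                out += bytes([0xE0 | (half >> 12),
                              0x80 | ((half >> 6) & 0x3F),
                              0x80 | (half & 0x3F)])
        else:
            out += ch.encode("utf-8")
    return bytes(out)

s = "na\u00efve \U0001F4A9"   # well-formed text with a supplementary char

# 'surrogatepass' acts as a WTF-8-style encoder: for a string with no
# unpaired surrogates, the bytes are identical to plain UTF-8.
assert s.encode("utf-8", "surrogatepass") == s.encode("utf-8")

# Modified UTF-8 differs even for this well-formed string, so a system
# speaking strict UTF-8 would have to re-encode it first.
assert modified_utf8(s) != s.encode("utf-8")
```

So a WTF-8 string only ever differs from UTF-8 when it contains unpaired surrogates, i.e. exactly when the string is not representable in UTF-8 in the first place.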