Re: [boost] [review] Review of Nowide (Unicode) starts today

13 Jun 2017

      On 12 June 2017 at 17:57, Peter Dimov via Boost <boost@lists.boost.org>
wrote:
...
degski wrote:
Question: "Shouldn't the passing of invalid UTF-8/16 sequences be defined
as UB?"
Of course not. Why would one need to use the library then? It defeats the
whole purpose of it.
...
From WP (read up on it now): "RFC 3629 states "Implementations of the
decoding algorithm MUST protect against decoding invalid sequences." *The
Unicode Standard* requires decoders to "...treat any ill-formed code unit
sequence as an error condition. This guarantees that it will neither
interpret nor emit an ill-formed code unit sequence.""
So not UB then, but an invalid sequence should not pass through either.
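
To make the "error condition" reading concrete, here is a minimal C++17 sketch of a decoder that rejects ill-formed UTF-8 instead of passing it on. The name `decode_utf8_strict` and its signature are hypothetical, chosen for illustration; this is not Nowide's API, only one way the quoted requirement could be met.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (not part of Nowide): decodes one code point starting
// at s[pos]. On success it advances pos and returns the code point; on any
// ill-formed sequence it returns std::nullopt and leaves pos unchanged.
inline std::optional<char32_t> decode_utf8_strict(std::string_view s, std::size_t& pos)
{
    if (pos >= s.size()) return std::nullopt;
    const auto byte = [&](std::size_t i) { return static_cast<unsigned char>(s[i]); };

    const unsigned char lead = byte(pos);
    std::size_t len = 0;
    char32_t cp = 0;
    char32_t min_cp = 0; // smallest code point that legitimately needs len bytes
    if (lead < 0x80)                { len = 1; cp = lead;        min_cp = 0x0; }
    else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; min_cp = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min_cp = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min_cp = 0x10000; }
    else return std::nullopt; // stray continuation byte or invalid lead byte

    if (s.size() - pos < len) return std::nullopt; // truncated sequence
    for (std::size_t i = 1; i < len; ++i) {
        const unsigned char c = byte(pos + i);
        if ((c & 0xC0) != 0x80) return std::nullopt; // not a continuation byte
        cp = (cp << 6) | (c & 0x3F);
    }
    if (cp < min_cp) return std::nullopt;                   // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return std::nullopt;  // UTF-16 surrogate
    if (cp > 0x10FFFF) return std::nullopt;                 // beyond Unicode range

    pos += len;
    return cp;
}
```

Returning an empty optional keeps the behaviour fully defined while leaving the policy (throw, substitute U+FFFD, refuse the file name) to the caller.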

Are we talking FAT32 or NTFS? What Windows versions are affected? I also
think, as some posters below (and in another thread) state, that Windows
should not be treated differently. A new Boost library should not
accommodate Windows' bad/sloppy historical quirks. The library *can* require
that the system and its users adhere to the standard.

Then WP on Overlong encodings: "The standard specifies that the correct
encoding of a code point use only the minimum number of bytes required to
hold the significant bits of the code point. Longer encodings are called
*overlong* and are not valid UTF-8 representations of the code point. This
rule maintains a one-to-one correspondence between code points and their
valid encodings, so that there is a unique valid encoding for each code
point."

The key being: "... are not valid UTF-8 representations ...", i.e. we're
back to the case above.
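
As an illustration of the "minimum number of bytes" rule, the following sketch computes the shortest valid UTF-8 length for a code point and flags any longer encoding as overlong. The helper names `utf8_min_length` and `is_overlong` are made up for this example. The classic case is '/' (U+002F) written as the two bytes 0xC0 0xAF instead of the single byte 0x2F.

```cpp
#include <cstddef>

// Hypothetical helper: minimum number of UTF-8 bytes needed for cp,
// or 0 if cp lies beyond U+10FFFF. Surrogates are not checked here.
constexpr std::size_t utf8_min_length(char32_t cp)
{
    return cp < 0x80      ? 1
         : cp < 0x800     ? 2
         : cp < 0x10000   ? 3
         : cp <= 0x10FFFF ? 4
                          : 0;
}

// True if cp was encoded with more bytes than the minimum, i.e. the
// encoding is overlong and therefore not valid UTF-8.
constexpr bool is_overlong(char32_t cp, std::size_t bytes_used)
{
    return utf8_min_length(cp) != 0 && bytes_used > utf8_min_length(cp);
}

static_assert(utf8_min_length(U'/') == 1, "ASCII needs one byte");
static_assert(is_overlong(U'/', 2), "0xC0 0xAF is an overlong '/'");
static_assert(!is_overlong(0x20AC, 3), "U+20AC needs exactly three bytes");
```

Because every code point has exactly one minimum length, a decoder that applies this check accepts exactly one byte sequence per code point, which is the one-to-one correspondence the quoted passage describes.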

degski

WP: https://en.wikipedia.org/wiki/UTF-8
-- 
"*Their so-called religion acts merely like an opiate: stimulating, numbing,
soothing pain out of weakness.*" - Novalis 1798