On 12 June 2017 at 17:57, Peter Dimov via Boost wrote:
> degski wrote:
>> Question: "Shouldn't the passing of invalid UTF-8/16 sequences be defined as UB?"
>
> Of course not. Why would one need to use the library then? It defeats the whole purpose of it.
From WP (read up on it now): "RFC 3629 states 'Implementations of the decoding algorithm MUST protect against decoding invalid sequences.' *The Unicode Standard* requires decoders to '...treat any ill-formed code unit sequence as an error condition. This guarantees that it will neither interpret nor emit an ill-formed code unit sequence.'"
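For illustration, here is a minimal C++ sketch of what "treat any ill-formed code unit sequence as an error condition" can look like for UTF-8. The function name `is_well_formed_utf8` is hypothetical and is not the API of the library under discussion. It only reports whether the input is well-formed, which includes rejecting overlong forms, encoded surrogates, and values above U+10FFFF.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (an assumption for illustration, not a Boost API):
// returns true only if `s` consists entirely of well-formed UTF-8 sequences.
bool is_well_formed_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;       // total bytes in this sequence
        std::uint32_t cp = 0;      // code point accumulated so far
        std::uint32_t min_cp = 0;  // smallest code point allowed for this length

        if (b0 < 0x80)                { len = 1; cp = b0;        min_cp = 0x0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min_cp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (n - i < len) return false;  // sequence truncated at end of input

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < min_cp) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range

        i += len;
    }
    return true;
}
```

With this sketch, `"\xC0\xAF"` (an overlong two-byte form of `/`) is rejected, while `"\xC3\xA9"` (U+00E9) is accepted. What a caller does on rejection (throw, return an error code, substitute U+FFFD) is a separate policy decision that the sketch deliberately leaves open.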
So not UB then, but it should not pass either. Are we talking FAT32 or NTFS? What Windows versions are affected? I also think, as some posters below (and in another thread) state, that Windows should not be treated differently. A new Boost library should not accommodate the bad/sloppy historic quirks of Windows. The library *can* make its use conditional on the system and its users adhering to the standard.

Then WP on overlong encodings: "The standard specifies that the correct encoding of a code point use only the minimum number of bytes required to hold the significant bits of the code point. Longer encodings are called *overlong* and are not valid UTF-8 representations of the code point. This rule maintains a one-to-one correspondence between code points and their valid encodings, so that there is a unique valid encoding for each code point."

The key being: "... are not valid UTF-8 representations ...", i.e. we're back to the case above.

degski

WP: https://en.wikipedia.org/wiki/UTF-8

--
"*Your so-called religion works merely like an opiate: stimulating, numbing, soothing pain out of weakness.*" - Novalis 1798