Re: [boost] [review] Review of Nowide (Unicode) starts today

12 Jun 2017


On Mon, Jun 12, 2017 at 9:51 PM, Peter Dimov via Boost <boost@lists.boost.org> wrote:
...
> ... whereas under Windows if invalid UTF-8 is allowed many different byte
> sequences may map to the same file name.

This is a false presumption. Nobody here proposes allowing absolutely ANY
byte sequence, only using WTF-8 as a means of guaranteeing a round-trip. And
as far as WTF-8 goes, there is a unique representation for every sequence of
16-bit code units.
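
To make the uniqueness claim concrete, here is a minimal sketch of the mapping, assuming the usual WTF-8 rules: a lead surrogate followed by a trail surrogate is combined and written as a 4-byte sequence, and any other code unit, including a lone surrogate, is written in plain UTF-8 style. The function names are hypothetical and are not part of Nowide.

```python
def wtf8_encode(units):
    """Encode a sequence of 16-bit code units as WTF-8 bytes (illustrative sketch)."""
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        # A lead surrogate immediately followed by a trail surrogate forms one
        # supplementary code point; this is the only allowed encoding of the pair.
        if 0xD800 <= u <= 0xDBFF and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            cp = u  # BMP scalar value or an unpaired surrogate
            i += 1
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
        else:
            out += bytes([0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])
    return bytes(out)


def wtf8_decode(data):
    """Decode well-formed WTF-8 bytes back to 16-bit code units."""
    units = []
    # Python's "surrogatepass" handler accepts the 3-byte surrogate encodings.
    for ch in data.decode("utf-8", "surrogatepass"):
        cp = ord(ch)
        if cp >= 0x10000:
            cp -= 0x10000
            units += [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]
        else:
            units.append(cp)
    return units


name = [0x61, 0xD800, 0x62]          # 'a', lone lead surrogate, 'b'
encoded = wtf8_encode(name)          # b'a\xed\xa0\x80b'
round_trip = wtf8_decode(encoded)    # same code units as name
```

Because the encoder never has a choice between two byte sequences for the same input, two different byte strings cannot come from the same file name, as long as only well-formed WTF-8 is accepted.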
...
> With all that said, I don't quite see the concern with WTF-8. What's the
> attack we're defending from by disallowing it?

There are some concerns with WTF-8. Specifically, if you concatenate two
WTF-8 strings where one ends in an unpaired lead surrogate and the other
begins with an unpaired trail surrogate, the result is an invalid WTF-8
string, because the two now form a pair that must be encoded as a single
4-byte sequence. Filenames are usually parsed and concatenated on ASCII
separators, so I don't see a problem in the typical use case. As for the
non-typical use cases, I would argue that they are beyond the responsibility
of this library.
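
The concatenation problem can be sketched as follows. This is only an illustration, assuming both inputs are already well-formed WTF-8; `encode_unit` and `wtf8_concat` are hypothetical helpers, not functions from any library.

```python
def encode_unit(unit):
    """Encode one 16-bit code unit; a lone surrogate becomes a 3-byte sequence."""
    return chr(unit).encode("utf-8", "surrogatepass")


def wtf8_concat(a, b):
    """Concatenate two well-formed WTF-8 strings, repairing a split surrogate pair."""
    if len(a) >= 3 and len(b) >= 3:
        tail, head = a[-3:], b[:3]
        # Lead surrogates D800..DBFF encode as ED A0..AF xx,
        # trail surrogates DC00..DFFF encode as ED B0..BF xx.
        is_lead = tail[0] == 0xED and 0xA0 <= tail[1] <= 0xAF
        is_trail = head[0] == 0xED and 0xB0 <= head[1] <= 0xBF
        if is_lead and is_trail:
            hi = 0xD000 | ((tail[1] & 0x3F) << 6) | (tail[2] & 0x3F)
            lo = 0xD000 | ((head[1] & 0x3F) << 6) | (head[2] & 0x3F)
            cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
            return a[:-3] + chr(cp).encode("utf-8") + b[3:]
    return a + b


left = encode_unit(0xD83D)    # ends in an unpaired lead surrogate
right = encode_unit(0xDE00)   # begins with an unpaired trail surrogate

naive = left + right                  # ED A0 BD ED B8 80: not well-formed WTF-8
repaired = wtf8_concat(left, right)   # F0 9F 98 80: the pair as one code point

# Joining on an ASCII separator never brings the two surrogates together,
# so plain byte concatenation is already well-formed in that case.
path = left + b"\\" + right
```

The last line is the typical filename case: with a separator in between, the naive and the repairing concatenation give the same bytes.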

-- 
Yakov Galka
http://stannum.co.il/