On Mon, Jun 12, 2017 at 9:51 PM, Peter Dimov via Boost <boost@lists.boost.org> wrote:
> ... whereas under Windows if invalid UTF-8 is allowed many different byte sequences may map to the same file name.
This is a false presumption. Nobody here proposes allowing absolutely ANY byte sequence, only using WTF-8 as a means of guaranteeing a round trip. And as far as WTF-8 goes, there is a unique representation for every sequence of 16-bit code units.
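To make the round-trip claim concrete, here is a minimal sketch of a WTF-8 encoder and decoder. The names to_wtf8 and from_wtf8 are made up for illustration and are not part of Boost or of any proposed library; the decoder assumes its input is already well-formed WTF-8 and does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helpers for illustration only; not a Boost API.

inline bool is_lead(std::uint32_t u)  { return u >= 0xD800 && u <= 0xDBFF; }
inline bool is_trail(std::uint32_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Append one code point using the UTF-8 bit layout. Lone surrogates are
// allowed here, which is what distinguishes WTF-8 from strict UTF-8.
inline void append_code_point(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Encode an arbitrary sequence of 16-bit code units. Proper surrogate
// pairs become one 4-byte sequence; unpaired surrogates become 3 bytes.
inline std::string to_wtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (is_lead(cp) && i + 1 < in.size() && is_trail(in[i + 1])) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        append_code_point(out, cp);
    }
    return out;
}

// Decode back to 16-bit code units. Assumes well-formed WTF-8 input.
inline std::u16string from_wtf8(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        std::uint32_t cp;
        std::size_t len;
        if (b < 0x80)      { cp = b;        len = 1; }
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }
        else               { cp = b & 0x07; len = 4; }
        for (std::size_t k = 1; k < len && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp >= 0x10000) {
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<char16_t>(cp);
        }
    }
    return out;
}
```

Because the encoder always merges a lead surrogate followed by a trail surrogate into a single four-byte sequence, each code unit sequence has exactly one encoding under this scheme, and decoding it gives back the original code units.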
> With all that said, I don't quite see the concern with WTF-8. What's the attack we're defending from by disallowing it?
There are some concerns with WTF-8. Specifically, if you concatenate two WTF-8 strings where the first ends in an unpaired lead surrogate and the second begins with an unpaired trail surrogate, the byte-wise result is not a valid WTF-8 string, because the two now form a surrogate pair that must be encoded as a single four-byte sequence. Filenames are usually parsed and concatenated on ASCII separators, so I don't see a problem in the typical use case. As for the non-typical use cases, I would argue that they are beyond the responsibility of this library.

-- 
Yakov Galka
http://stannum.co.il/
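For anyone who does hit the non-typical case, the concatenation caveat can be handled with a small fix-up at the join. This is only a sketch: wtf8_concat is a made-up name, not something any Boost library provides, and it assumes both inputs are already well-formed WTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper for illustration only; not a Boost API.
// Joins two well-formed WTF-8 strings so that the result is well-formed too.
inline std::string wtf8_concat(const std::string& a, const std::string& b) {
    auto byte = [](const std::string& s, std::size_t i) {
        return static_cast<unsigned char>(s[i]);
    };
    const std::size_t n = a.size();

    // A lone lead surrogate (U+D800..U+DBFF) is encoded as ED A0..AF 80..BF.
    const bool lead_at_end = n >= 3 && byte(a, n - 3) == 0xED &&
                             (byte(a, n - 2) & 0xF0) == 0xA0;
    // A lone trail surrogate (U+DC00..U+DFFF) is encoded as ED B0..BF 80..BF.
    const bool trail_at_start = b.size() >= 3 && byte(b, 0) == 0xED &&
                                (byte(b, 1) & 0xF0) == 0xB0;

    // The common case, e.g. joining on an ASCII separator: plain append.
    if (!lead_at_end || !trail_at_start) return a + b;

    // Recover both surrogates and combine them into one code point.
    const std::uint32_t hi = 0xD000u | ((byte(a, n - 2) & 0x3Fu) << 6) |
                             (byte(a, n - 1) & 0x3Fu);
    const std::uint32_t lo = 0xD000u | ((byte(b, 1) & 0x3Fu) << 6) |
                             (byte(b, 2) & 0x3Fu);
    const std::uint32_t cp = 0x10000u + ((hi - 0xD800u) << 10) + (lo - 0xDC00u);

    // Replace the two 3-byte sequences with a single 4-byte sequence.
    std::string out = a.substr(0, n - 3);
    out += static_cast<char>(0xF0 | (cp >> 18));
    out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
    out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
    out += static_cast<char>(0x80 | (cp & 0x3F));
    out += b.substr(3);
    return out;
}
```

With a = "x" followed by ED A0 BD (U+D83D) and b = ED B8 80 (U+DE00) followed by "y", a plain a + b keeps the six surrogate bytes, which is not valid WTF-8, while wtf8_concat(a, b) produces "x", then F0 9F 98 80 (U+1F600), then "y".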