Re: [boost] [review] Review of Nowide (Unicode) starts today

12 Jun 2017

On Mon, Jun 12, 2017 at 6:05 PM, Vadim Zeitlin via Boost <boost@lists.boost.org> wrote:
...
On Mon, 12 Jun 2017 17:58:32 +0300 Artyom Beilis via Boost <boost@lists.boost.org> wrote:
AB> By definition: you can't handle file names that can't be represented
AB> in UTF-8, as no valid UTF-8 representation exists for them.
This is a nice principle to have in theory, but very unfortunate in
practice because at least under Unix systems such file names do occur in
the wild (maybe less often now than 10 years ago, when UTF-8 was less
ubiquitous, but it's still hard to believe that the problem has completely
disappeared). And there are ways to solve it, e.g. I think glib represents
such file names using special characters from a PUA and there are other
possible approaches, even if, admittedly, none of them is perfect.

Please note: on POSIX platforms no conversions are performed
and no UTF-8 validation is done, as doing either would be incorrect:

http://cppcms.com/files/nowide/html/index.html#qna

The only problematic case is when the Windows wide API returns or
creates invalid UTF-16, which can happen only when unpaired
surrogate code units are generated, and those have no valid UTF-8
representation.
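
To make the Windows case concrete, here is a minimal sketch of a check
for ill-formed UTF-16, i.e. a surrogate code unit that is not part of a
high+low pair. The function name is_well_formed_utf16 is hypothetical
and this is not code from Nowide; it only illustrates which inputs have
no valid UTF-8 counterpart.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Nowide.
// Returns true if every surrogate code unit in `s` belongs to a valid
// high+low pair, i.e. the sequence is well-formed UTF-16.
inline bool is_well_formed_utf16(const std::u16string& s)
{
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 >= s.size())
                return false;
            const char16_t next = s[i + 1];
            if (next < 0xDC00 || next > 0xDFFF)
                return false;
            ++i; // skip the low surrogate we just validated
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            // Low surrogate with no preceding high surrogate.
            return false;
        }
    }
    return true;
}
```

A name for which this returns false is exactly the kind that cannot be
converted to valid UTF-8 without either rejecting it or inventing a
non-standard encoding.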

On the other hand, deliberately creating invalid UTF-8 is a very problematic idea.

Regards,
Artyom