On Mon, Jun 12, 2017 at 6:05 PM, Vadim Zeitlin via Boost
On Mon, 12 Jun 2017 17:58:32 +0300 Artyom Beilis via Boost wrote:

AB> By definition: you can't handle file names that can't be represented
AB> in UTF-8, as no valid UTF-8 representation exists.
This is a nice principle to have in theory, but very unfortunate in practice, because at least on Unix systems such file names do occur in the wild (maybe less often now than 10 years ago, when UTF-8 was less ubiquitous, but it's still hard to believe that the problem has completely disappeared). And there are ways to solve it: e.g. I think glib represents such file names using special characters from a Private Use Area (PUA), and there are other possible approaches, even if, admittedly, none of them is perfect.
Please note: on POSIX platforms no conversions are performed and no UTF-8 validation is done, as doing so would be incorrect:

http://cppcms.com/files/nowide/html/index.html#qna

The only problematic case is when the Windows Wide API returns or creates invalid UTF-16, which can happen only when invalid surrogate pairs are generated; these have no valid UTF-8 representation.

On the other hand, deliberately creating invalid UTF-8 is a very problematic idea.

Regards,
Artyom
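
For illustration, here is a minimal sketch of what "invalid UTF-16" means in the Windows case discussed above: a code unit sequence is ill-formed when it contains an unpaired surrogate, and such a sequence cannot be converted to valid UTF-8. The function name `is_valid_utf16` is a hypothetical helper invented for this example; it is not part of Boost.Nowide or any Windows API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (an assumption for illustration, not a Nowide API):
// returns true if every surrogate code unit in `s` is part of a
// well-formed high/low pair, false if any surrogate is unpaired.
bool is_valid_utf16(const std::u16string& s)
{
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF) {
            // High surrogate: must be immediately followed by a low surrogate.
            if (i + 1 >= s.size())
                return false;
            const char16_t next = s[i + 1];
            if (next < 0xDC00 || next > 0xDFFF)
                return false;
            ++i; // skip the low surrogate we just consumed
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            // Low surrogate with no preceding high surrogate.
            return false;
        }
    }
    return true;
}
```

A file name for which this check fails is exactly the kind of name that has no valid UTF-8 representation, so a UTF-8 based API has to either reject it or substitute a replacement for the offending code units.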