
On 7/01/2020 14:58, Yakov Galka wrote:
>> So, while unfortunate, v3 made the correct choice. Paths have to be kept in their original encoding between original source (command line, file, or UI) and file API usage, otherwise you can get weird errors when transcoding produces a different byte sequence that appears identical when actually rendered, but doesn't match the filesystem. Transcoding is only safe when you're going to do something with the string other than using it in a file API.

> See above, malformed UTF-16 can be converted to WTF-8 (a UTF-8 superset) and back losslessly. The unprecedented introduction of a platform specific interface into the standard was, still is, and will always be, a horrible mistake.
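To make the round-trip claim concrete, here is a minimal sketch (my own illustration, not the reference WTF-8 implementation) of encoding possibly ill-formed UTF-16 into WTF-8 and back:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Encode a sequence of 16-bit code units -- possibly containing unpaired
// surrogates -- into WTF-8. Valid surrogate pairs are combined into a
// supplementary code point first; lone surrogates are encoded "as if" they
// were scalar values, which is the generalisation WTF-8 permits.
std::string wtf8_encode(const std::vector<std::uint16_t>& units) {
    std::string out;
    for (std::size_t i = 0; i < units.size(); ++i) {
        std::uint32_t cp = units[i];
        // Combine a valid surrogate pair into one supplementary code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < units.size() &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i + 1] - 0xDC00);
            ++i;
        }
        // Generalised UTF-8 encoding: lone surrogates pass through unchanged.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Decode WTF-8 back to 16-bit code units (input assumed well-formed WTF-8).
std::vector<std::uint16_t> wtf8_decode(const std::string& s) {
    std::vector<std::uint16_t> out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        std::uint32_t cp;
        std::size_t len;
        if (b < 0x80)      { cp = b;        len = 1; }
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }
        else               { cp = b & 0x07; len = 4; }
        for (std::size_t j = 1; j < len; ++j)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + j]) & 0x3F);
        i += len;
        if (cp >= 0x10000) {  // re-split supplementary code points into a pair
            cp -= 0x10000;
            out.push_back(static_cast<std::uint16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<std::uint16_t>(cp));  // incl. lone surrogates
        }
    }
    return out;
}
```

Round-tripping a lone surrogate such as 0xD800 through these two functions returns the original code-unit sequence unchanged. Note, though, that encoding the two halves of a surrogate pair separately and then concatenating the bytes produces a different (ill-formed) sequence than encoding them together, which is exactly the concatenation caveat in the WTF-8 spec.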
Given that WTF-8 is not itself supported by the C++ standard library (while the other formats are), that doesn't seem like a valid argument. You'd have to campaign for it to be added first.

The main problem, though, is that once you start allowing transcoding of any kind, it's a slippery slope to other conversions that can make lossy changes (such as applying different canonicalisation forms, or adding/removing layout codepoints such as RTL markers).

Also, if you read the WTF-8 spec, it notes that it is not legal to directly concatenate two WTF-8 strings (you either have to convert back to UTF-16 first, or apply special handling to the trailing code units of the first string), which immediately renders it a poor choice for a path storage format, and indeed a poor choice for any purpose. (I suspect many people who are using it have conveniently forgotten that part.)

Although on a related note, I think C++11/17 dropped the ball a bit on the new encoding-specific character types. They're definitely an improvement on the prior method, but it would have been better to do something like:

    struct ansi_encoding_t;
    struct utf_encoding_t;

    typedef encoded_char<ansi_encoding_t, 8>  char_t;
    typedef encoded_char<utf_encoding_t, 8>   char8_t;
    typedef encoded_char<utf_encoding_t, 16>  char16_t;

where encoded_char<E, N> has a storage size of exactly N bits (blittable, and otherwise behaves like a standard integer type) but also carries around an arbitrary encoding tag type E. This could be used to distinguish "a string encoded in UTF-8" from "a string encoded in WTF-8" or "a string encoded in EBCDIC". And supplemental libraries could define additional encodings and conversion functions, and algorithms could operate on generic strings of any encoding.
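That encoded_char<E, N> idea can be sketched as follows; everything here (the tag types, storage_for, encoded_string) is hypothetical illustration, not an existing or proposed API:

```cpp
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical encoding tags; supplemental libraries could add their own.
struct ansi_encoding_t;   // platform "ANSI" code page
struct utf_encoding_t;    // Unicode Transformation Format
struct wtf_encoding_t;    // WTF-8 style generalised encoding

// Select an unsigned integer type with exactly N bits of storage.
template <unsigned N> struct storage_for;
template <> struct storage_for<8>  { using type = std::uint8_t;  };
template <> struct storage_for<16> { using type = std::uint16_t; };
template <> struct storage_for<32> { using type = std::uint32_t; };

// A blittable code-unit type: same size and layout as a plain integer,
// but carrying the encoding tag E around in the type system.
template <typename E, unsigned N>
struct encoded_char {
    using storage = typename storage_for<N>::type;
    storage value;

    constexpr encoded_char(storage v = 0) : value(v) {}
    constexpr operator storage() const { return value; }
};

static_assert(sizeof(encoded_char<utf_encoding_t, 8>) == 1,
              "no overhead over a raw code unit");

// Strings of different encodings are now distinct types, so accidentally
// mixing them fails at compile time instead of corrupting data at run time.
template <typename E, unsigned N>
using encoded_string = std::vector<encoded_char<E, N>>;

using utf8_string = encoded_string<utf_encoding_t, 8>;
using wtf8_string = encoded_string<wtf_encoding_t, 8>;

static_assert(!std::is_same<utf8_string, wtf8_string>::value,
              "UTF-8 and WTF-8 strings never interconvert implicitly");
```

Conversion between encodings would then have to go through an explicit function such as a (hypothetical) `utf8_string to_utf8(const wtf8_string&)`, and generic algorithms could be written once over `encoded_string<E, N>` for any encoding tag E.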