On 8/01/2020 12:57, Yakov Galka wrote:
> Paths are, almost always, concatenated with ASCII separators (or other valid strings) in-between. Even when concatenating malformed strings directly, the issue isn't there if the result is passed immediately back to the "UTF-16" system.
But the conversion from WTF-8 back to UTF-16 can interpret the joining point as a different character, resulting in a different sequence. Unless I've misread something, this could occur if the first string ended in an unpaired high surrogate and the second started with an unpaired low surrogate (or rather the WTF-8 encodings thereof): the naively concatenated bytes are ill-formed WTF-8 and differ from the well-formed encoding of the joined UTF-16 string. Unlikely, perhaps, but not impossible.
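For concreteness, here's a small C++ sketch of that corner case; the byte values come straight from the WTF-8 encoding rules, while the variable names and the program itself are just mine for illustration:

    // Minimal illustration; the names are mine, the byte values follow the
    // WTF-8 encoding rules.
    #include <cassert>
    #include <cstdio>
    #include <string>

    int main() {
        // WTF-8 encodes a lone surrogate like any other code point:
        //   unpaired high surrogate U+D800 -> ED A0 80
        //   unpaired low surrogate  U+DC00 -> ED B0 80
        std::string ends_with_high("\xED\xA0\x80");
        std::string starts_with_low("\xED\xB0\x80");

        // Naive byte-level concatenation of the two WTF-8 strings.
        std::string naive = ends_with_high + starts_with_low;

        // Concatenating in UTF-16 first gives the surrogate pair
        // <D800 DC00>, i.e. U+10000, whose WTF-8 (and UTF-8) encoding is
        // the single 4-byte sequence F0 90 80 80.
        std::string via_utf16("\xF0\x90\x80\x80");

        // The byte sequences differ, and the naive result is ill-formed
        // WTF-8, so a validating WTF-8 -> UTF-16 converter may reject it
        // or handle the join point specially.
        assert(naive != via_utf16);
        std::printf("naive join: %zu bytes, UTF-16-first join: %zu bytes\n",
                    naive.size(), via_utf16.size());
    }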
>> Although on a related note, I think C++11/17 dropped the ball a bit on the new encoding-specific character types. [...]
> C++11 over-engineered it, and you keep over-engineering it even further. Can't think of a time anybody had to mix ASCII, UTF-8, WTF-8 and EBCDIC strings in one program *at compile time*.
You've just suggested cases where apps will contain both UTF-8 and WTF-8 strings, which would be useful to distinguish at compile time -- both to let overloading select the correct conversion function automatically and to get compile errors if you accidentally pass a WTF-8 string to a function that expects pure UTF-8, or vice versa. The same applies to other encoding pairs. That's why C++20 introduced char8_t: so that you wouldn't accidentally pass UTF-8 strings to functions expecting other character encodings. The idea could even be extended to other forms of two-way data encoding, such as UUEncoding or Base64. I don't think that's over-engineering; it's just basic data conversion and type safety.
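As a hedged sketch of what I mean (wtf8_string below is a hypothetical wrapper of my own, not a standard or proposed type, and the conversion bodies are placeholders), distinct types let overload resolution pick the right converter and turn the UTF-8/WTF-8 mix-up into a compile error in C++20:

    #include <string>

    // Hypothetical strong type for WTF-8 data; the name is illustrative only.
    struct wtf8_string {
        std::string bytes;  // may contain WTF-8 encodings of lone surrogates
    };

    // Overloading on the string type lets the compiler pick the conversion;
    // the bodies are placeholders standing in for real converters.
    std::u16string to_utf16(const std::u8string&) { return u"strict UTF-8"; }
    std::u16string to_utf16(const wtf8_string&)   { return u"lossless WTF-8"; }

    // An interface that requires well-formed UTF-8 only.
    void send_over_wire(const std::u8string&) {}

    int main() {
        std::u8string utf8 = u8"plain text";   // char8_t string, C++20
        wtf8_string   wtf8{"\xED\xA0\x80"};    // lone high surrogate, legal WTF-8

        to_utf16(utf8);        // picks the strict overload
        to_utf16(wtf8);        // picks the lossless overload

        send_over_wire(utf8);
        // send_over_wire(wtf8);  // compile error: wtf8_string is not
        //                        // std::u8string, so the mix-up is caught
    }

The commented-out call is exactly the kind of mistake a distinct type catches for free.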