On Tue, Dec 3, 2019 at 2:19 PM Gavin Lambert via Boost <boost@lists.boost.org> wrote:
While I agree in principle, the simple fact is that performing string transcoding on filesystem paths is a Very Bad Idea™, since both Windows and Linux treat them as opaque byte sequences -- but Windows' native encoding is UTF-16 and Linux's is (mostly) UTF-8.
Unix paths can be stored in a narrow string already, where fopen() always magically worked for any text. Windows paths can be transcoded losslessly into WTF-8 and back.
So, while unfortunate, v3 made the correct choice. Paths have to be kept in their original encoding between the original source (command line, file, or UI) and file API usage; otherwise you can get weird errors when transcoding produces a different byte sequence that looks identical when rendered but doesn't match the name stored in the filesystem. Transcoding is only safe when you're going to do something with the string other than passing it to a file API.
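To make the "appears identical when rendered" failure concrete: Unicode normalization is one way a transcoding layer can change the bytes of a name without changing how it looks. The sketch below is only an illustration; the names `cafe_nfc`, `cafe_nfd` and `same_file_name` are hypothetical, and the byte-wise comparison stands in for a filesystem that treats names as opaque bytes.

```cpp
#include <cassert>
#include <string>

// Two UTF-8 spellings of "café" that render identically:
// NFC uses the precomposed U+00E9 (bytes C3 A9),
// NFD uses 'e' followed by the combining acute accent U+0301 (bytes CC 81).
const std::string cafe_nfc = "caf\xC3\xA9";
const std::string cafe_nfd = "cafe\xCC\x81";

// Hypothetical lookup that compares names byte by byte, the way a
// filesystem treating names as opaque byte sequences would.
bool same_file_name(const std::string& a, const std::string& b) {
    return a == b;
}
```

If the name arrived as NFC from the command line but some intermediate step normalized it to NFD, a byte-exact lookup would fail even though both strings display as "café".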
See above: malformed UTF-16 can be converted to WTF-8 (a UTF-8 superset) and back losslessly. The unprecedented introduction of a platform-specific interface into the standard was, still is, and will always be, a horrible mistake.
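A minimal sketch of the lossless round trip, assuming C++11 or later. The helper names `append_code_point`, `to_wtf8` and `from_wtf8` are hypothetical, not part of Boost or the standard library. The encoder combines well-formed surrogate pairs into one code point and encodes any lone surrogate as an ordinary 3-byte sequence; the decoder assumes its input is well-formed WTF-8 (such as the encoder's output) and only checks for truncation.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Append one code point (possibly a lone surrogate) as generalized UTF-8.
inline void append_code_point(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {  // includes U+D800..U+DFFF (lone surrogates)
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Possibly ill-formed UTF-16 -> WTF-8.
inline std::string to_wtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t u = in[i];
        bool high = u >= 0xD800 && u <= 0xDBFF;
        // Only a high surrogate followed by a low surrogate forms a pair.
        if (high && i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            u = 0x10000 + ((u - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        append_code_point(out, u);
    }
    return out;
}

// WTF-8 -> UTF-16. Assumes well-formed WTF-8; only truncation is detected.
inline std::u16string from_wtf8(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (b < 0x80)      { cp = b;        len = 1; }
        else if (b < 0xE0) { cp = b & 0x1F; len = 2; }
        else if (b < 0xF0) { cp = b & 0x0F; len = 3; }
        else               { cp = b & 0x07; len = 4; }
        if (i + len > in.size()) throw std::invalid_argument("truncated WTF-8");
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += len;
        if (cp >= 0x10000) {  // split back into a surrogate pair
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        } else {              // BMP code point or lone surrogate
            out += static_cast<char16_t>(cp);
        }
    }
    return out;
}
```

For example, the ill-formed sequence `{u'a', 0xD800, u'b'}` (a lone high surrogate, which Windows accepts in file names) becomes the bytes `61 ED A0 80 62` and decodes back to the same three code units, while a proper pair such as U+1F600 becomes its ordinary 4-byte UTF-8 form.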
While copying is unfortunate, these things are rarely on a performance-critical path, and the benefits of having consistent compose/decompose operations on paths vastly outweigh that, in my opinion. Combined with the need to maintain native encoding for paths, separated algorithms don't seem particularly useful -- just less convenient to use.
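To make the "separated algorithms" alternative concrete, here is a minimal sketch (assuming C++17) of path decomposition and composition written as free functions over a string view, so the same code works on narrow and UTF-16 storage without a dedicated path class. The names `filename` and `join` are hypothetical and deliberately simplified (no root-name or absolute-path handling); they are not the Boost.Filesystem or `std::filesystem` API.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical: last component of a path, returned as a view into the input.
// The separator is a parameter because '\\' is only a separator on Windows.
template <class CharT>
std::basic_string_view<CharT> filename(std::basic_string_view<CharT> p,
                                       CharT sep = CharT('/')) {
    auto pos = p.find_last_of(sep);
    return pos == p.npos ? p : p.substr(pos + 1);
}

// Hypothetical: join two paths with exactly one separator between them.
// The result uses the caller's character type; no transcoding happens.
template <class CharT>
std::basic_string<CharT> join(std::basic_string_view<CharT> a,
                              std::basic_string_view<CharT> b,
                              CharT sep = CharT('/')) {
    std::basic_string<CharT> out(a);
    if (!out.empty() && out.back() != sep && !b.empty())
        out += sep;
    out += b;
    return out;
}
```

Usage: `filename<char>(std::string("/home/user/notes.txt"))` yields `"notes.txt"`, and `filename<char16_t>(u16path, u'\\')` does the same for a UTF-16 Windows path. Writing `join(x, y)` rather than `x.join(y)` is the free-function style; the trade-off is that the caller, not a path object, decides where the result is stored.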
The path parsing and modification functions could be storage-agnostic. Some prefer the x.join(y) syntax over join(x, y), but that's just a preference originating from the OOP crowd.

--
Yakov Galka
http://stannum.co.il/