Beman Dawes wrote:
> IMO, a critical aspect of all of those, including UTF-8 to UTF-8, is that they detect all UTF-8 errors, since ill-formed UTF-8 is used as an attack vector.
That is what I alluded to earlier with my bikeshedding comment: I personally find this policy a bit too firm for my taste. Sure, sometimes I do want to reject any invalid UTF-8 with extreme prejudice, but at other times I do not. For instance, a Windows file name can well be invalid UTF-16 (it may contain unpaired surrogates). Converted naively, it becomes invalid UTF-8, but that UTF-8 roundtrips correctly back to the original invalid UTF-16 and refers to the same file. That's why things like CESU-8 or WTF-8 exist. So I like the "method" argument of locale::conv::utf_to_utf, except that I think it doesn't offer enough control.
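
To make the file name case concrete, here is a minimal sketch of a lenient, WTF-8-style conversion in plain standard C++. The function names (append_generalized_utf8, utf16_to_wtf8, wtf8_to_utf16, has_encoded_surrogate) are made up for this illustration and are not part of Boost.Locale or any other library. The decoder assumes its input came from the encoder and does no validation, and the sketch ignores the WTF-8 rule about surrogate pairs formed by concatenation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: append one code point using the UTF-8 bit layout.
// Unlike a strict encoder, it does not reject surrogates (U+D800..U+DFFF).
void append_generalized_utf8(std::string& out, std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

inline bool is_high(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool is_low(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Possibly ill-formed UTF-16 -> WTF-8-style bytes. Proper surrogate pairs
// are combined; unpaired surrogates are encoded as they are.
std::string utf16_to_wtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (is_high(in[i]) && i + 1 < in.size() && is_low(in[i + 1])) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        append_generalized_utf8(out, cp);
    }
    return out;
}

// Inverse of utf16_to_wtf8. Assumes the input was produced by it; no
// validation is attempted here.
std::u16string wtf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i++]);
        std::uint32_t cp;
        int extra;
        if (b < 0x80)      { cp = b;        extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else               { cp = b & 0x07; extra = 3; }
        for (int k = 0; k < extra && i < in.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        if (cp >= 0x10000) {
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<char16_t>(cp);
        }
    }
    return out;
}

// What a strict policy would reject: an encoded surrogate always starts
// with 0xED followed by a byte in 0xA0..0xBF.
bool has_encoded_surrogate(const std::string& s) {
    for (std::size_t i = 0; i + 1 < s.size(); ++i) {
        unsigned char a = static_cast<unsigned char>(s[i]);
        unsigned char b = static_cast<unsigned char>(s[i + 1]);
        if (a == 0xED && b >= 0xA0 && b <= 0xBF) return true;
    }
    return false;
}
```

With a name such as 'a', 0xD800, 'b', utf16_to_wtf8 produces bytes that a strict UTF-8 validator must reject, yet wtf8_to_utf16 returns exactly the original code units. That is the kind of choice I would like the caller to be able to make, rather than having one policy baked in.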