Yakov Galka wrote:
Yeah, so? I say that the library can provide a Windows -> std::string -> Windows roundtrip just as it does with any other platform. If FreeBSD -> std::string conversion can return invalid UTF-8, then so does Windows -> std::string conversion.
The security concern here is that under FreeBSD the file name is what it is, and different byte sequences refer to different names, whereas under Windows, if invalid UTF-8 is allowed, many different byte sequences may map to the same file name. This does not necessarily preclude handling unpaired surrogates, though.

In practice the main problem is probably with overlong encodings of '.', '/' and '\'. Last time this came up I argued that if you rely on finding '.' as the literal 8-bit '.', your input validation is wrong anyway, but requiring strictly valid UTF-8 is a reasonable first line of defense.

And realistically, if you want to validate the input correctly, you have to #ifdef for each OS anyway, which kind of makes the library redundant. So in the specific use case where you _do_ use the library to avoid #ifdef'ing, it does make sense for it to protect you from invalid UTF-8 on Windows.

With all that said, I don't quite see the concern with WTF-8. What's the attack we're defending against by disallowing it?
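
To make the overlong-encoding point concrete, here is a rough sketch of what "strictly valid UTF-8" would mean as a check. The name is_strict_utf8 is made up for illustration and is not part of the library under discussion. It rejects overlong forms, surrogate code points and anything above U+10FFFF, so it is stricter than WTF-8, which would let unpaired surrogates through.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, for illustration only.
// Returns true only for well-formed UTF-8: no overlong forms, no surrogate
// code points (U+D800..U+DFFF), nothing above U+10FFFF.
inline bool is_strict_utf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;      // length of the sequence implied by the lead byte
        std::uint32_t cp = 0;     // decoded code point
        std::uint32_t min_cp = 0; // smallest code point that needs this length

        if (b0 < 0x80)                { len = 1; cp = b0;        min_cp = 0x0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min_cp = 0x10000; }
        else return false; // stray continuation byte or invalid lead byte

        if (s.size() - i < len) return false; // truncated sequence

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }

        if (cp < min_cp) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate code point
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += len;
    }
    return true;
}
```

With this check, the two-byte sequences C0 AE, C0 AF and C1 9C (overlong forms of '.', '/' and '\') are rejected, while the plain ASCII bytes 2E, 2F and 5C are accepted. A lenient decoder would map both forms to the same character, which is how a filter looking for the literal byte can be bypassed.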