Re: [boost] [review] Review of Nowide (Unicode)
Dear all,
Here is my review of the library.
- What is your evaluation of the design?
I welcome the UTF-8 approach on Windows. However, the fact that Nowide does not handle invalid UTF-16 transparently is an issue that has to be addressed.
- What is your evaluation of the implementation?
Going through the code I can't spot any implementation issues.
- What is your evaluation of the documentation?
It's clear. No issues found.
- What is your evaluation of the potential usefulness of the library?
I already use this approach through a different implementation. Even though I wouldn't bring Boost just for that single library, for projects that already use Boost it might be a valuable addition. I do think that Nowide offers a superior solution to encoding issues compared to Boost.Filesystem.
- Did you try to use the library? With what compiler? Did you have any problems?
I did try it years ago when it was first submitted to Boost. I didn't try the recent version, so NO.
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
30 minutes reading the code + mailing list discussions.
- Are you knowledgeable about the problem domain?
I rate myself as an expert. I'm a co-author of utf8everywhere.org. DISCLAIMER: I have met Artyom a couple of times IRL.
- Do you think the library should be accepted as a Boost library? Be sure to say this explicitly so that your other comments don't obscure your overall opinion.
Conditional acceptance, subject to transparent invalid UTF-16 resolution.
-- Yakov Galka http://stannum.co.il/
I welcome the UTF-8 approach on Windows. However, the fact that Nowide does not handle invalid UTF-16 transparently is an issue that has to be addressed.
[snip]
- What is your evaluation of the potential usefulness of the library?
I already use this approach through a different implementation. Even though I wouldn't bring Boost just for that single library, for projects that already use Boost it might be a valuable addition.
Actually, there is a standalone version that does not depend on Boost (see the docs).
- Do you think the library should be accepted as a Boost library? Be sure to say this explicitly so that your other comments don't obscure your overall opinion.
Conditional acceptance, subject to transparent invalid UTF-16 resolution.
-- Yakov Galka http://stannum.co.il/
After long discussions on this list, the following updated policy will be applied to Nowide:
1. Conversion will always produce **valid UTF-8/UTF-16** regardless of the validity of the source, unlike the current behavior, which returns an error or sets an error status.
2. Instead of failing the conversion and returning an error, invalid characters will be replaced with U+FFFD (REPLACEMENT CHARACTER), similar to the behavior of WinAPI.
So you will not get the path
Invalid UTF-16 <- Quasi UTF-8 -> Invalid UTF-16
but you will be able to complete the path as:
Invalid UTF-16 <- Valid UTF-8 with substitutions -> Valid UTF-16
I hope this meets your needs.
Artyom
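The substitution policy described above can be sketched roughly as follows. This is only an illustration of the idea, not Nowide's actual code: the names `append_utf8`, `utf16_to_utf8_lossy` and `demo_substitution` are hypothetical, and the sketch assumes that each unpaired surrogate code unit is replaced by exactly one U+FFFD.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: append one code point to a UTF-8 string.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical converter (not Nowide's API): never fails, always yields
// valid UTF-8. Unpaired surrogates become U+FFFD.
std::string utf16_to_utf8_lossy(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            if (i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
                // Valid pair: combine into one supplementary code point.
                cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;                           // unpaired high surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;                               // unpaired low surrogate
        }
        append_utf8(out, cp);
    }
    return out;
}

void demo_substitution() {
    // 'a', lone high surrogate, 'b': the surrogate is replaced by U+FFFD.
    const std::u16string broken = {u'a', char16_t(0xD800), u'b'};
    assert(utf16_to_utf8_lossy(broken) == "a\xEF\xBF\xBD" "b");
    // A valid surrogate pair (U+1F600) is converted normally.
    const std::u16string valid = {char16_t(0xD83D), char16_t(0xDE00)};
    assert(utf16_to_utf8_lossy(valid) == "\xF0\x9F\x98\x80");
}
```

The point of the sketch is that the function has no error path: whatever the input, the output is well-formed UTF-8, at the cost of losing the original invalid code units.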
On 21/06/2017 08:13, Artyom Beilis wrote:
1. Conversion will always produce **valid UTF-8/UTF-16** regardless of the validity of the source, unlike the current behavior, which returns an error or sets an error status.
2. Instead of failing the conversion and returning an error, invalid characters will be replaced with U+FFFD (REPLACEMENT CHARACTER), similar to the behavior of WinAPI.
So you will not get the path Invalid UTF-16 <- Quasi UTF-8 -> Invalid UTF-16, but you will be able to complete the path as: Invalid UTF-16 <- Valid UTF-8 with substitutions -> Valid UTF-16
Isn't the problem case the one where you get an arbitrary block of bytes (UTF-8-ish in POSIX and UTF-16-ish in Windows) as a filename from some other API (e.g. readdir), convert it to real UTF-8 for internal use (e.g. manipulation, display), and then go back to the OS to actually use that filename, only to get an unexpected "file not found" because it didn't round-trip?
I don't know if there is a good solution for this other than never converting any paths and always working in the native encoding of the OS, though.
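The round-trip concern can be made concrete with a small sketch. It models the net effect of "UTF-16 -> UTF-8 with substitutions -> UTF-16" directly on UTF-16 code units; the function names are hypothetical and this is an assumption about how such a policy behaves, not Nowide's code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

bool is_high_surrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
bool is_low_surrogate(char16_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

// Hypothetical model of a lossy round trip: valid pairs are kept,
// every unpaired surrogate comes back as U+FFFD.
std::u16string round_trip_with_substitution(const std::u16string& name) {
    std::u16string out;
    for (std::size_t i = 0; i < name.size(); ++i) {
        if (is_high_surrogate(name[i]) && i + 1 < name.size() &&
            is_low_surrogate(name[i + 1])) {
            out += name[i];
            out += name[i + 1];
            ++i;
        } else if (is_high_surrogate(name[i]) || is_low_surrogate(name[i])) {
            out += char16_t(0xFFFD);
        } else {
            out += name[i];
        }
    }
    return out;
}

void demo_round_trip() {
    // Two distinct (invalid) names as an OS directory listing might return them.
    const std::u16string a = {u'f', char16_t(0xD800)};
    const std::u16string b = {u'f', char16_t(0xDC00)};
    assert(a != b);
    // After the round trip neither name matches what is on disk...
    assert(round_trip_with_substitution(a) != a);
    // ...and the two names have become indistinguishable.
    assert(round_trip_with_substitution(a) == round_trip_with_substitution(b));
}
```

Opening the file by the round-tripped name would then look up a different name from the one stored by the file system, which is the "file not found" case described above.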
On 21.06.2017 3:42, Gavin Lambert via Boost wrote:
Isn't the problem case where you get an arbitrary-block-of-bytes (UTF-8-ish in POSIX and UTF-16-ish in Windows) filename from some other API (e.g. readdir), convert to really-UTF-8 for internal use (e.g. manipulation, display), and then go back to the OS to try to actually use that filename and get an unexpected "file not found" because it didn't round-trip?
Please note that even WinAPI is unable to handle some malformed filenames correctly (the ones with \0 in the middle, for example). So I believe Nowide shouldn't try to work with them. I think only chkdsk/fsck should deal with files that have malformed names. For other tools, it is OK to fail on invalid UTF-16.
-- Best regards, Sergey Cheban
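The embedded \0 case mentioned above is easy to illustrate in portable C++. Win32 file functions such as CreateFileW take NUL-terminated strings, so anything after the first \0 is invisible to them; the helper name below is hypothetical and simply models that view of the string.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: what an API that takes a NUL-terminated wide
// string would see when handed name.c_str().
std::wstring as_seen_by_c_api(const std::wstring& name) {
    return std::wstring(name.c_str());
}

void demo_embedded_nul() {
    // A five-character name with \0 in the middle.
    const std::wstring name(L"ab\0cd", 5);
    assert(name.size() == 5);
    // The part after the \0 is silently dropped.
    assert(as_seen_by_c_api(name) == L"ab");
}
```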
participants (4)
- Artyom Beilis
- Gavin Lambert
- Sergey Cheban
- Yakov Galka