On Mon, Jun 19, 2017 at 8:19 PM, Zach Laine via Boost
On Sun, Jun 11, 2017 at 11:20 PM, Frédéric Bron via Boost < boost@lists.boost.org> wrote:
- What is your evaluation of the design?
1) I'd really much rather have an iterator-based interface for the narrow/wide conversions. There's an existing set of iterators in Boost.Regex already, and I've recently written one here:
https://github.com/tzlaine/text/blob/master/boost/text/utf8.hpp
As well in Boost.Locale - Boost.Nowide uses UTF conversions from header only part of Boost.Locale.
The reliance on a new custom string type is a substantial mistake, IMO (boost::nowide::basic_stackstring). [snip] (possibly cribbing the one of the two implementations above) would negate the need for this new string type -- I could use the existing std::string, MyString, QString, a char buffer, or whatever. Also, I'd greatly prefer that the new interfaces be defined in terms of string_view instead of string/basic_stackstring (there's also a string_view implementation already Boost.Utility). string_view is simply far more usable, since it binds effortlessly to either a char const * or a string.
stackstring is barely on-stack buffer optimization - it isn't general string and never intended to be so. It isn't in details because it can actually be useful outside library scope.
Providing an iterator interface
There is no iterator interface in communication with OS so no sense to add it there. If you want character by character interface it exists in Boost.Locale
2) I don't really understand what happens when a user passes a valid Windows filename that is *invalid* UTF-16 to a program using Nowide. Is the invalid UTF-16 filename just broken in the process of trying to convert it to UTF-8? This is partially a documentation problem, but until I understand how this is intended to work, I'm also counting it as a design issue.
You are right it wasn't clear in the docs. My policy was return a error in case of invalid encoding but after comments I received I'll rather replace invalid sequences with U-FFFD Replacement Character since it was what WinAPI usually does. https://github.com/artyom-beilis/nowide/issues/16 It is something easy to fix.
- What is your evaluation of the documentation?
I think the documentation needs a bit of work. The non-reference portion is quite thin, and drilling down into the reference did not answer at least one question I had (the one above, about invalid UTF-16):
Actually documentation points out that in case of invalid character encoding a error is returned and how (in most of places - but there are some missies) However I agree that it can be improved and a special section regarding bad encoding conversion policy should and will be added.
It tells me nothing about what happens when invalid UTF-16 is encountered. Is there an exception? Is 0xfffd inserted? If the latter, am I just stuck? I should not have to read any source code to figure this out, but it looks like I have to.
Valid point.
And most importantly: - Do you think the library should be accepted as a Boost library? Be sure to say this explicitly so that your other comments don't obscure your overall opinion.
I do not think the library should be accepted in its current form. It seems not to handle malformed UTF-16, which is a requirement for processing Windows file names (as I understand it -- please correct this if I'm wrong).
I strongly disagree about it. Same as converting invalid ANSI encoding shouldn't lead to invalid UTF-16 or the other way around. Since there is no way to present invalid UTF-16 using valid UTF-8 same as there is no way to do it in ANSI<->UTF-16 there is no non-ambiguous way to provide such a support.
Zach
Thank You for your review. Artyom