Question to all:
Why should we try to handle wrong UTF-16 (or wrong UTF-8)?
1. such files should not exist
2. if they exist, why?
- if it is because it is an old file, can the user just rename it properly?
- if it is because it was produced by a program, why should this
program keep working without being fixed? Isn't that the surest way to
end up with wrong filenames forever?
I do not understand why we cannot just issue an error.
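For concreteness, "issuing an error" could start with a check like the one below. This is only a sketch: first_ill_formed is a hypothetical helper, not something Nowide provides, and it assumes the filename is already available as a std::u16string. It reports the position of the first unpaired surrogate, which is what makes a sequence of 16-bit units invalid UTF-16.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Nowide): returns the index of the first
// unpaired surrogate in `s`, or std::u16string::npos if `s` is well-formed
// UTF-16.
std::size_t first_ill_formed(const std::u16string& s)
{
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 < s.size() && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
                ++i; // skip the low half of a valid pair
                continue;
            }
            return i; // high surrogate with no low surrogate after it
        }
        if (c >= 0xDC00 && c <= 0xDFFF)
            return i; // low surrogate with no preceding high surrogate
    }
    return std::u16string::npos;
}
```

A caller could then reject the name with an error message that points at the offending position.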
Thanks for your explanations,
Frédéric
2017-06-19 19:19 GMT+02:00 Zach Laine via Boost
On Sun, Jun 11, 2017 at 11:20 PM, Frédéric Bron via Boost <boost@lists.boost.org> wrote:
Hi Everyone,
The formal review of Artyom Beilis' Nowide library starts today and will last until Wed. 21st of June. [snip] Please post your comments and review to the boost mailing list (preferably), or privately to the Review Manager (to me ;-). Here are some questions you might want to answer in your review:
- What is your evaluation of the design?
1) I'd much rather have an iterator-based interface for the narrow/wide conversions. There's an existing set of iterators in Boost.Regex already, and I've recently written one here:
https://github.com/tzlaine/text/blob/master/boost/text/utf8.hpp
The reliance on a new custom string type (boost::nowide::basic_stackstring) is a substantial mistake, IMO. Providing an iterator interface (possibly cribbing from one of the two implementations above) would negate the need for this new string type -- I could use the existing std::string, MyString, QString, a char buffer, or whatever. Also, I'd greatly prefer that the new interfaces be defined in terms of string_view instead of string/basic_stackstring (there's also a string_view implementation already in Boost.Utility). string_view is simply far more usable, since it binds effortlessly to either a char const * or a string.
2) I don't really understand what happens when a user passes a valid Windows filename that is *invalid* UTF-16 to a program using Nowide. Is the invalid UTF-16 filename just broken in the process of trying to convert it to UTF-8? This is partially a documentation problem, but until I understand how this is intended to work, I'm also counting it as a design issue.
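To illustrate the string_view half of point 1, here is a minimal sketch. It assumes C++17's std::string_view rather than the Boost.Utility one, and count_code_points is a hypothetical function, not a Nowide API. The point is only that a single signature accepts a char const *, a std::string, or a slice of a buffer without copying.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical function (not a Nowide API): counts code points in UTF-8 text
// by counting every byte that is not a continuation byte (10xxxxxx).
// Assumes the input is well-formed UTF-8.
std::size_t count_code_points(std::string_view utf8)
{
    std::size_t n = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80)
            ++n;
    }
    return n;
}
```

Calls such as count_code_points("abc"), count_code_points(some_std_string), and count_code_points(std::string_view(buf, len)) all bind to the same parameter, so no custom string type is needed at the interface.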
- What is your evaluation of the implementation?
I did not look.
- What is your evaluation of the documentation?
I think the documentation needs a bit of work. The non-reference portion is quite thin, and drilling down into the reference did not answer at least one question I had (the one above, about invalid UTF-16).
Looking at some example code in the "Using the Library" section, I saw this:

" To make this program handle Unicode properly, we do the following changes:

#include
#include
#include

int main(int argc,char **argv)
{
    boost::nowide::args a(argc,argv); // Fix arguments - make them UTF-8
"

Ok, so I clicked "boost::nowide::args", hoping for an answer. The detailed description for args says:
" args is a class that fixes standard main() function arguments and changes them to UTF-8 under Microsoft Windows.
The class uses GetCommandLineW(), CommandLineToArgvW() and GetEnvironmentStringsW() in order to obtain the information. It does not relates to actual values of argc,argv and env under Windows.
It restores the original values in its destructor "
It tells me nothing about what happens when invalid UTF-16 is encountered. Is there an exception? Is 0xfffd inserted? If the latter, am I just stuck? I should not have to read any source code to figure this out, but it looks like I have to.
This criticism can be applied to most of the documentation. My preference is that the semantics of the library's primary functionality be explained in tutorials or other non-reference formats. The current state of the docs doesn't even explain things in the references. This must be fixed before this library can be accepted.
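To make the invalid-UTF-16 question concrete, the two behaviours asked about (an exception, or U+FFFD inserted) could look like the sketch below. This is not Nowide's actual implementation: narrow_sketch, append_utf8 and the on_error policy names are hypothetical, and the input is assumed to be a std::u16string.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

enum class on_error { replace, fail }; // hypothetical policy names

// Appends the UTF-8 encoding of code point `cp` to `out`.
void append_utf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical UTF-16 -> UTF-8 conversion, not Nowide's implementation.
std::string narrow_sketch(const std::u16string& in, on_error policy)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Valid surrogate pair: combine into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            // Unpaired surrogate: the input is not valid UTF-16.
            if (policy == on_error::fail)
                throw std::invalid_argument("unpaired surrogate in UTF-16 input");
            cp = 0xFFFD; // U+FFFD REPLACEMENT CHARACTER
        }
        append_utf8(out, cp);
    }
    return out;
}
```

With the replace policy, two filenames that differ only in their unpaired surrogates produce the same UTF-8 string, so the result cannot be converted back to the original name. That is the "am I just stuck?" case, and it is why the documentation needs to say which behaviour the library has.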
- What is your evaluation of the potential usefulness of the library?
I think this library is attempting to address a real and important issue. I just can't figure out if it's a complete solution, because how invalid UTF-16 is treated remains a question.
- Did you try to use the library? With what compiler? Did you have any
problems?
I did not.
- How much effort did you put into your evaluation? A glance? A quick
reading? In-depth study?
A quick reading, plus a bit of discussion on the list.
- Are you knowledgeable about the problem domain?
I understand the UTF-8 issues reasonably well, but am ignorant of the Windows-specific issues.
And most importantly:
- Do you think the library should be accepted as a Boost library? Be sure to say this explicitly so that your other comments don't obscure your overall opinion.
I do not think the library should be accepted in its current form. It seems not to handle malformed UTF-16, which is a requirement for processing Windows file names (as I understand it -- please correct this if I'm wrong). Independent of this, I don't find the docs to be sufficient.
Zach