Re: [boost] [review] Review of Nowide (Unicode) starts today

19 Jun 2017

      Question to all:

Why should we try to handle wrong UTF-16 (or wrong UTF-8)?
1. such files should not exist
2. if they exist, why?
  - if it is because it is an old file, can the user just rename it properly?
  - if it is because it was produced by a program, why should this
program continue to work without fixing? Isn't it the best way that we
get wrong filenames forever?

I do not understand why we cannot just issue an error.

Thanks for explanations,

Frédéric

2017-06-19 19:19 GMT+02:00 Zach Laine via Boost <boost@lists.boost.org>:
...
On Sun, Jun 11, 2017 at 11:20 PM, Frédéric Bron via Boost <
boost@lists.boost.org> wrote:
...
Hi Everyone,
The formal review of Artyom Beilis' Nowide library starts today and
will last until Wed. 21st of June.
[snip]
Please post your comments and review to the boost mailing list
(preferably), or privately to the Review Manager (to me ;-). Here are
some questions you might want to answer in your review:
...
- What is your evaluation of the design?
1) I'd really much rather have an iterator-based interface for the
narrow/wide conversions.  There's an existing set of iterators in
Boost.Regex already, and I've recently written one here:
https://github.com/tzlaine/text/blob/master/boost/text/utf8.hpp
The reliance on a new custom string type is a substantial mistake, IMO
(boost::nowide::basic_stackstring).  Providing an iterator interface
(possibly cribbing the one of the two implementations above) would negate
the need for this new string type -- I could use the existing std::string,
MyString, QString, a char buffer, or whatever.  Also, I'd greatly prefer
that the new interfaces be defined in terms of string_view instead of
string/basic_stackstring (there's also a string_view implementation already
Boost.Utility).  string_view is simply far more usable, since it binds
effortlessly to either a char const * or a string.
2) I don't really understand what happens when a user passes a valid
Windows filename that is *invalid* UTF-16 to a program using Nowide.  Is
the invalid UTF-16 filename just broken in the process of trying to convert
it to UTF-8?  This is partially a documentation problem, but until I
understand how this is intended to work, I'm also counting it as a design
issue.
...
- What is your evaluation of the implementation?
I did not look.
...
- What is your evaluation of the documentation?
I think the documentation needs a bit of work.  The non-reference portion
is quite thin, and drilling down into the reference did not answer at least
one question I had (the one above, about invalid UTF-16):
Looking at some example code in the "Using the Library" section, I saw this:
"
To make this program handle Unicode properly, we do the following changes:
#include <boost/nowide/args.hpp>
#include <boost/nowide/fstream.hpp>
#include <boost/nowide/iostream.hpp>
int main(int argc,char **argv)
{
boost::nowide::args a(argc,argv); // Fix arguments - make them UTF-8
"
Ok, so I clicked "boost::nowide::args", hoping for an answer.  The detailed
description for args says:
"
args is a class that fixes standard main() function arguments and changes
them to UTF-8 under Microsoft Windows.
The class uses GetCommandLineW(), CommandLineToArgvW() and
GetEnvironmentStringsW() in order to obtain the information. It does not
relates to actual values of argc,argv and env under Windows.
It restores the original values in its destructor
"
It tells me nothing about what happens when invalid UTF-16 is encountered.
Is there an exception?  Is 0xfffd inserted?  If the latter, am I just
stuck?  I should not have to read any source code to figure this out, but
it looks like I have to.
This criticism can be applied to most of the documentation.  My preference
is that the semantics of primary functionality of the library should be
explained in tutorials or other non-reference formats.  The current state
of the docs doesn't even explain things in the references.  This must be
fixed before this library can be accepted.
...
- What is your evaluation of the potential usefulness of the library?
I think this library is attempting to address a real and important issue.
I just can't figure out if it's a complete solution, because how invalid
UTF-16 is treated remains a question.
...
- Did you try to use the library? With what compiler? Did you have any
...
problems?
I did not.
- How much effort did you put into your evaluation? A glance? A quick
...
reading? In-depth study?
A quick reading, plus a bit of discussion on the list.
...
- Are you knowledgeable about the problem domain?
I understand the UTF-8 issues reasonably well, but am ignorant of the
Windows-specific issues.
...
And most importantly:
- Do you think the library should be accepted as a Boost library? Be
sure to say this explicitly so that your other comments don't obscure
your overall opinion.
I do not think the library should be accepted in its current form.  It
seems not to handle malformed UTF-16, which is a requirement for processing
Windows file names (as I understand it -- please correct this if I'm
wrong).  Independent of this, I don't find the docs to be sufficient.
Zach
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
-- 
Frédéric Bron

-----------------------------------------------------------
Frédéric Bron (frederic.bron@m4x.org)
Villa des 4 chemins, Centre Hospitalier, BP 208
38506 Voiron Cedex
tél. fixe : +33 4 76 67 17 27, tél. port.: +33 6 67 02 77 35