Re: [boost] [review] Review of Nowide (Unicode) starts today

19 Jun 2017

      On Mon, Jun 19, 2017 at 8:19 PM, Zach Laine via Boost
<boost@lists.boost.org> wrote:
...
On Sun, Jun 11, 2017 at 11:20 PM, Frédéric Bron via Boost <
boost@lists.boost.org> wrote:
...
- What is your evaluation of the design?
1) I'd really much rather have an iterator-based interface for the
narrow/wide conversions.  There's an existing set of iterators in
Boost.Regex already, and I've recently written one here:
https://github.com/tzlaine/text/blob/master/boost/text/utf8.hpp
As well in Boost.Locale - Boost.Nowide uses UTF conversions
from header only part of Boost.Locale.
...
The reliance on a new custom string type is a substantial mistake, IMO
(boost::nowide::basic_stackstring).
[snip]
(possibly cribbing the one of the two implementations above) would negate
the need for this new string type -- I could use the existing std::string,
MyString, QString, a char buffer, or whatever.  Also, I'd greatly prefer
that the new interfaces be defined in terms of string_view instead of
string/basic_stackstring (there's also a string_view implementation already
Boost.Utility).  string_view is simply far more usable, since it binds
effortlessly to either a char const * or a string.
stackstring is barely on-stack buffer optimization - it isn't general string
and never intended to be so.

It isn't in details because it can actually be useful outside library scope.
...
Providing an iterator interface
There is no iterator interface in communication with OS
so no sense to add it there. If you want character by character
interface it exists in Boost.Locale
...
2) I don't really understand what happens when a user passes a valid
Windows filename that is *invalid* UTF-16 to a program using Nowide.  Is
the invalid UTF-16 filename just broken in the process of trying to convert
it to UTF-8?  This is partially a documentation problem, but until I
understand how this is intended to work, I'm also counting it as a design
issue.
You are right it wasn't clear in the docs. My policy was return a error in
case of invalid encoding but after comments I received I'll rather
replace invalid sequences with U-FFFD Replacement Character
since it was what WinAPI usually does.

https://github.com/artyom-beilis/nowide/issues/16

It is something easy to fix.
...
...
- What is your evaluation of the documentation?
I think the documentation needs a bit of work.  The non-reference portion
is quite thin, and drilling down into the reference did not answer at least
one question I had (the one above, about invalid UTF-16):
Actually documentation points out that in case of invalid character
encoding a error is returned and how (in most of places - but
there are some missies)

However I agree that it can be improved
and a special section regarding bad encoding conversion policy
should and will be added.
...
It tells me nothing about what happens when invalid UTF-16 is encountered.
Is there an exception?  Is 0xfffd inserted?  If the latter, am I just
stuck?  I should not have to read any source code to figure this out, but
it looks like I have to.
Valid point.
...
...
And most importantly:
- Do you think the library should be accepted as a Boost library? Be
sure to say this explicitly so that your other comments don't obscure
your overall opinion.
I do not think the library should be accepted in its current form.  It
seems not to handle malformed UTF-16, which is a requirement for processing
Windows file names (as I understand it -- please correct this if I'm
wrong).
I strongly disagree about it. Same as converting invalid ANSI encoding
shouldn't lead to invalid UTF-16 or the other way around.

Since there is no way to present invalid UTF-16 using valid UTF-8
same as there is no way to do it in ANSI<->UTF-16 there is
no non-ambiguous way to provide such a support.
...
Zach
Thank You for your review.

Artyom

Re: [boost] [review] Review of Nowide (Unicode) starts today

Artyom Beilis