[review] Boost.Nowide is Accepted into Boost
Dear all,
I would like to thank all the people who took part in the discussions
during the review. All contributors admitted that Boost.Nowide
addresses a real issue and solves it, at least partially. Moreover we
received 8 official reviews with 6 “accept”, 1 “do not accept as is”
and 1 “I am not against it”.
So Boost.Nowide is accepted for inclusion in Boost.
Congratulation and many thanks to Artyom Beilis for this contribution.
However, 2 major issues arose in the discussions and they will have to
be fixed before inclusion:
1. handling of ill-formed Unicode input should be improved/clarified
2. the documentation needs improvements
In particular, this needs to be fixed:
* Design:
There was a lot of discussions on what to do with ill-formed UTF-16
strings on Windows, in particular coming from the file system while
converting them to narrow UTF-8 strings. Basically, 3 options:
1. allow the roundtrip ill formed UTF-16 > UTF-8 > ill formed UTF-16,
this is not possible with UTF-8 encoding, WTF-8 (not a standard) may
be an option with some issues
2. issue an error (today’s nowide option, even if the author would
like to move to 3.)
3. convert as much as possible to fully conformant UTF-8 with
character replacement U+FFFD for invalid input. This is what the NT
kernel does with functions like RtlUTF8ToUnicodeN. This would allow
cout to continue to work after invalid output instead of failing.
After reviewing what was chosen for std::filesystem in C++17, I think
3. is the right think to do:
- std::filesystem::path::u8string does not convert strings on Posix.
If a conversion is done (on Windows) the output is fully conformant
UTF-8.
- the roundtrip is not guaranteed in std::filesystem::path conversions
(implementation defined) but the roundtrip of a converted paths is
guaranteed to work which is the case with Boost.Nowide: once in fully
conformant UTF-8, narrow > wide > narrow works fine.
- the use of the replacement character U+FFFD is what is done by the
NT kernel (question: do you have to reimplement RtlUTF8ToUnicodeN and
RtlUnicodeToUTF8N, why not using them)?
It should been made clear in the doc that this choice does not
guarantees the roundtrip on Windows so that ill-formed filename may
not be opened.
* Documentation:
- the main part of the doc (the non-reference part) should be more detailed
- it should be clear to the reader that Windows and Posix platforms
are not handled the same; on Windows, we get UTF-8 strings while on
Posix platforms, the user should not expect UTF-8 encoding, just
narrow strings, even if UTF-8 is today the most common encoding
(addressed in Q/A but is probably too far away). It must be clear from
the beginning that on Posix platforms, the library does nothing
(including no checking of UTF-8 conformance), just forward to the
standard library.
- review the use of UTF-8 so that it do not let think that narrow
strings are always UTF-8 as this is not necessary the case on Posix
platform; maybe just use narrow string?
- clarify what is converted to what and when: for each function, we
need to know precisely what is the output for all possible input; in
particular, better explain for each function what the library does
with ill-formed input (either UTF-16 input or narrow input), in
particular for args and getenv; say clearly what happens (failbit,
error, exception, invalid character…)
- clarify what is done for Windows and what is (not) done for Posix
- explain better what is behind nowide::cout, cin, cerr, what is done?
- clarify what does the filesystem integration on Windows vs Posix
- check if basic_stackstring::swap needs to swap buffer_ when both
strings are on the stack
- The doc/ directory is missing a Jamfile, and there's also no
meta/libraries.json.
* Implementation:
- The global BOOST_USE_WINDOWS_H macro should probably be respected.
When windows.h is not used, the WinAPI prototypes are declared in the
global namespace. This can cause issues when windows.h is also
included before/after the Nowide headers, and the prototypes don't
match exactly (e.g. short vs. wchar_t). Also it pollutes the global
namespace. Nowide should do what other Boost libraries do, and declare
the prototypes in a private detail namespace instead.
Example in
On Fri, Jun 23, 2017 at 4:34 PM, Frédéric Bron
Dear all,
I would like to thank all the people who took part in the discussions during the review. All contributors admitted that Boost.Nowide addresses a real issue and solves it, at least partially. Moreover we received 8 official reviews with 6 “accept”, 1 “do not accept as is” and 1 “I am not against it”.
So Boost.Nowide is accepted for inclusion in Boost. Congratulation and many thanks to Artyom Beilis for this contribution.
I would like to thank all the reviewers who provided very valuable and important comments regarding the library and brought very important notes that would improve the Boost.Nowide library. I want also to express my deepest gratitude to Frédéric Bron who managed the review, collected all the data regarding the library, did a great code review on his own and finally created an excellent summary of all issues that were brought in this review. THANK YOU!
However, 2 major issues arose in the discussions and they will have to be fixed before inclusion:
1. handling of ill-formed Unicode input should be improved/clarified 2. the documentation needs improvements
In particular, this needs to be fixed: [snip]
Now its time to get my hands on the code :-) Once again thank you very much! Artyom Beilis
participants (3)
-
Artyom Beilis
-
Frédéric Bron
-
Peter Dimov