Boost.Charconv review runs January 15th - 25th, 2024
Dear all, The review of Boost.Charconv by Matt Borland runs Monday, January 15th through January 25th, 2024. Code: https://github.com/cppalliance/charconvDocs: https://master.charconv.cpp.al/Review Schedule: https://www.boost.org/community/review_schedule.html Thank you for considering this review. Please optionally provide feedback on the followinggeneral topics: - What is your evaluation of the design? - What is your evaluation of the implementation? - What is your evaluation of the documentation? - What is your evaluation of the potential usefulness of the library? - Did you try to use the library? With what compiler? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? - Are you knowledgeable about the problem domain? Christopher KormanyosBoost.Charconv Review Manager
On Mon, Jan 15, 2024 at 2:31 AM Christopher Kormanyos via Boost < boost@lists.boost.org> wrote:
The review of Boost.Charconv by Matt Borland runs Monday, January 15th through January 25th, 2024.
Any C++ Alliance Staff Engineers involved with the review, whether as author, reviewer, or review manager, need to disclose their affiliation. Thanks
Docs state "non-allocating", yet grepping for malloc in the source code yields a couple of results in the from_chars implementation for floats. Could you please explain the reason behind this? Also, from the docs for from_chars for floating points: "On std::errc::result_out_of_range we return ±0 for small values (e.g. 1.0e-99999) or ±HUGE_VAL for large values (e.g. 1.0e+99999) to match the handling of std::strtod" What's the rationale for deviating from the standard here? Ruben.
wt., 16 sty 2024 o 20:53 Ruben Perez via Boost
Docs state "non-allocating", yet grepping for malloc in the source code yields a couple of results in the from_chars implementation for floats. Could you please explain the reason behind this?
Also, from the docs for from_chars for floating points:
"On std::errc::result_out_of_range we return ±0 for small values (e.g. 1.0e-99999) or ±HUGE_VAL for large values (e.g. 1.0e+99999) to match the handling of std::strtod"
What's the rationale for deviating from the standard here?
See https://github.com/cppalliance/charconv/issues/110 Regards, &rzej;
Ruben.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Docs state "non-allocating", yet grepping for malloc in the source code yields a couple of results in the from_chars implementation for floats. Could you please explain the reason behind this?
There are 2 uses of malloc. 1 is behind BOOST_CHARCONV_DEBUG used for printing 128-bit integer strings in ryu_generic_128. The other could in an extreme edge case be hit by the user. It is used in a fallback routine in case of unsuccessful 80 or 128 bit long double conversions in from_chars. If the conversion is unsuccessful on first attempt, and the string is more than 1024 bytes then we need to malloc to avoid a buffer overrun. The rationale for malloc is that it won't throw of failure.
Also, from the docs for from_chars for floating points:
"On std::errc::result_out_of_range we return ±0 for small values (e.g. 1.0e-99999) or ±HUGE_VAL for large values (e.g. 1.0e+99999) to match the handling of std::strtod"
What's the rationale for deviating from the standard here?
There is an open issue with LWG here: https://cplusplus.github.io/LWG/lwg-active.html#3081. The standard for <charconv> does not distinguish between underflow and overflow like strtod does. Let's say you are writing a JSON library and you replace strtod with charconv for performance reasons. Charconv returns std::errc::result_out_of_range on some conversion. Now what? You would have to parse the string again yourself to figure out which of the four possible reasons you got std::errc::result_out_of_range. Charconv already had this information but could not give it to you. By implementing the resolution to the LWG issue that matches the established strtod behavior I think we are providing the correct behavior without waiting on the committee. Andrzej did bring this up as well, and a macro BOOST_CHARCONV_STD_ERANGE was added to match what the standard says today. Please let me know if you have any additional questions. Matt
There is an open issue with LWG here:https://cplusplus.github.io/LWG/lwg-active.html#3081. The standard for <charconv> does not distinguish between underflow and overflow like strtod does. Let's say you are writing a JSON library and you replace strtod with charconv for performance reasons. Charconv returns std::errc::result_out_of_range on some conversion. Now what? You would have to parse the string again yourself to figure out which of the four possible reasons you got std::errc::result_out_of_range. Charconv already had this information but could not give it to you. By implementing the resolution to the LWG issue that matches the established strtod behavior I think we are providing the correct behavior without waiting on the committee. Andrzej did bring this up as well, and a macro BOOST_CHARCONV_STD_ERANGE was added to match what the standard says today.
After reading this and the related issue (https://github.com/cppalliance/charconv/issues/110) I actually strongly oppose the existing macro BOOST_CHARCONV_STD_ERANGE. It doesn't change the behavior of the users application but that of the compiled (Boost) library. And if given a compiled library there is a) no way to know the actual behavior of `from_chars` and b) there could be(?) ODR violations when using the library statically. As it seems to be only about whether the pass-by-reference value is modified it seems to be trivial to implement a wrapper around `boost::charconv::from_chars`, that is basically https://github.com/cppalliance/charconv/pull/111/files#diff-de4e106c65731933... I would even go as far as prodiving that function in Boost::CharConv, e.g. as `from_chars_strict` which could probably even be a template. To reiterate: Having the BOOST_CHARCONV_STD_ERANGE macro (and similar ones) makes the behavior of from_chars unpredictable which to me would be a reason to reject the library. To me it would be better to have "wrong" behavior than unknown behavior. Alex
After reading this and the related issue (https://github.com/cppalliance/charconv/issues/110) I actually strongly oppose the existing macro BOOST_CHARCONV_STD_ERANGE.
It doesn't change the behavior of the users application but that of the compiled (Boost) library. And if given a compiled library there is a) no way to know the actual behavior of `from_chars` and b) there could be(?) ODR violations when using the library statically.
As it seems to be only about whether the pass-by-reference value is modified it seems to be trivial to implement a wrapper around `boost::charconv::from_chars`, that is basically https://github.com/cppalliance/charconv/pull/111/files#diff-de4e106c65731933... I would even go as far as prodiving that function in Boost::CharConv, e.g. as `from_chars_strict` which could probably even be a template.
To reiterate: Having the BOOST_CHARCONV_STD_ERANGE macro (and similar ones) makes the behavior of from_chars unpredictable which to me would be a reason to reject the library. To me it would be better to have "wrong" behavior than unknown behavior.
Alex
Originally I was not keen on adding additional functions, but you're right this macro could make things non-deterministic. I'll revert the change, and add `from_chars_strict` instead. Matt
There is an open issue with LWG here:https://cplusplus.github.io/LWG/lwg-active.html#3081. The standard for <charconv> does not distinguish between underflow and overflow like strtod does. Let's say you are writing a JSON library and you replace strtod with charconv for performance reasons. Charconv returns std::errc::result_out_of_range on some conversion. Now what? You would have to parse the string again yourself to figure out which of the four possible reasons you got std::errc::result_out_of_range. Charconv already had this information but could not give it to you. By implementing the resolution to the LWG issue that matches the established strtod behavior I think we are providing the correct behavior without waiting on the committee. Andrzej did bring this up as well, and a macro BOOST_CHARCONV_STD_ERANGE was added to match what the standard says today.
After reading this and the related issue (https://github.com/cppalliance/charconv/issues/110) I actually strongly oppose the existing macro BOOST_CHARCONV_STD_ERANGE.
It doesn't change the behavior of the users application but that of the compiled (Boost) library. And if given a compiled library there is a) no way to know the actual behavior of `from_chars` and b) there could be(?) ODR violations when using the library statically.
Now that I understand the rationale, I also advocate for keeping the current behavior and removing BOOST_CHARCONV_STD_ERANGE. Matt, I'd also ask to include the rationale that you provided in the docs. Thanks, Ruben.
Now that I understand the rationale, I also advocate for keeping the current behavior and removing BOOST_CHARCONV_STD_ERANGE.
Matt, I'd also ask to include the rationale that you provided in the docs.
Thanks, Ruben.
Will do. See: https://github.com/cppalliance/charconv/pull/120 for replacement of the macro and doc updates. Matt
There are 2 uses of malloc. 1 is behind BOOST_CHARCONV_DEBUG used for printing 128-bit integer strings in ryu_generic_128. The other could in an extreme edge case be hit by the user. It is used in a fallback routine in case of unsuccessful 80 or 128 bit long double conversions in from_chars. If the conversion is unsuccessful on first attempt, and the string is more than 1024 bytes then we need to malloc to avoid a buffer overrun. The rationale for malloc is that it won't throw of failure.
Under which conditions would this happen? Could you please provide an example? Thanks, Ruben.
There are 2 uses of malloc. 1 is behind BOOST_CHARCONV_DEBUG used for printing 128-bit integer strings in ryu_generic_128. The other could in an extreme edge case be hit by the user. It is used in a fallback routine in case of unsuccessful 80 or 128 bit long double conversions in from_chars. If the conversion is unsuccessful on first attempt, and the string is more than 1024 bytes then we need to malloc to avoid a buffer overrun. The rationale for malloc is that it won't throw of failure.
Under which conditions would this happen? Could you please provide an example?
With arbitrary precision floats, e.g. GMP's mpf_t, it is not uncommon to dump the representation to char* using mpf_get_str. Say it had 1100 digits of what would be a 128-bit subnormal number then you would hit this edge case. Matt
Under which conditions would this happen? Could you please provide an example?
With arbitrary precision floats, e.g. GMP's mpf_t, it is not uncommon to dump the representation to char* using mpf_get_str. Say it had 1100 digits of what would be a 128-bit subnormal number then you would hit this edge case.
I see. Out of curiosity, what makes the parsing algorithm fail such that you need to call strtold?
I see. Out of curiosity, what makes the parsing algorithm fail such that you need to call strtold?
It's an intractable problem. I spent a few months trying to adapt Lemire's papers on fast_float to 128-bit precision. You need 10s of MBs of static data, and generating it at runtime is a non-starter. Ultimately Gay's algorithm works, and it's what is in strtold already. I would much rather use the battle tested implementations in the STL than my own. Matt
It's an intractable problem. I spent a few months trying to adapt Lemire's papers on fast_float to 128-bit precision. You need 10s of MBs of static data, and generating it at runtime is a non-starter. Ultimately Gay's algorithm works, and it's what is in strtold already. I would much rather use the battle tested implementations in the STL than my own. But isn't strtold locale dependent while from_chars is supposed to act as-if strtod in the "C"-locale? Wouldn't that fallback violate that contract on non-English locales?
Or does that fallback only apply when the locale difference does not matter? Alex
But isn't strtold locale dependent while from_chars is supposed to act as-if strtod in the "C"-locale? Wouldn't that fallback violate that contract on non-English locales?
Or does that fallback only apply when the locale difference does not matter?
Alex
The latter since we will have already attempted conversion internally, and rejected a string that is not in the "C"-locale. Matt
On 1/17/24 14:57, Matt Borland via Boost wrote:
But isn't strtold locale dependent while from_chars is supposed to act as-if strtod in the "C"-locale? Wouldn't that fallback violate that contract on non-English locales?
Or does that fallback only apply when the locale difference does not matter?
Alex
The latter since we will have already attempted conversion internally, and rejected a string that is not in the "C"-locale.
Is strtod operating in non-C locale required to correctly parse a number that was formatted using C locale? For example, what if it expects a fractional digits separator other than dot?
Is strtod operating in non-C locale required to correctly parse a number that was formatted using C locale? For example, what if it expects a fractional digits separator other than dot?
I see your point. I'll open an issue. I believe that strtold will do the wrong thing, and frankly I don't know what strtoflt128 out of <quadmath> does but I assume it's the same. Matt
Is strtod operating in non-C locale required to correctly parse a number that was formatted using C locale? For example, what if it expects a fractional digits separator other than dot? I see your point. I'll open an issue. I believe that strtold will do the wrong thing, and frankly I don't know what strtoflt128 out of <quadmath> does but I assume it's the same.
I was just about to write that: "1.000" is valid in the "C"-locale but e.g. in a German locale is equal to 1000, not 1. And I was actually able to come up with a test (I checked where/how to trigger `std::strto*` and picked a random example):
#include <locale> #include <iostream> #include <cassert> #include
int main(){ const char buffer[] = "1.189731495357231765e+4932"; std::locale::global(std::locale("de_DE.UTF-8")); long double v = 0; auto r = boost::charconv::from_chars( buffer, buffer+sizeof(buffer), v ); assert(r); std::locale::global(std::locale::classic()); long double v2 = 0; auto r2 = boost::charconv::from_chars( buffer, buffer+sizeof(buffer), v2 ); assert(r2);
std::cerr << v << "==" << v2 << std::endl; }
This prints "1==1.18973e+4932", i.e. stops parsing after the first char. As only decimal points but not thousand separators are accepted it might be possible to replace the dot by the current locales decimal point as a workaround. Not sure you can easily get that though. Alex
On 1/15/24 2:31 AM, Christopher Kormanyos via Boost wrote:
Dear all,
The review of Boost.Charconv by Matt Borland runs Monday, January 15th through January 25th, 2024.
Code: https://github.com/cppalliance/charconvDocs: https://master.charconv.cpp.al/Review Schedule: https://www.boost.org/community/review_schedule.html
Note that there is are files in boost detail which implement utf8 code convert facets and tests thereof. These have been in there for 20 years and have served the function needed. When originally made, it was discussed that they should be separate official libraries. The "consensus" was that they shouldn't be as they weren't officially reviewed. This has been a festering carbuncle on the face of the serialization library since all that time. Would acceptance of this library replace the functionality in the "library" detail? Robert Ramey
On 1/17/24 21:24, Robert Ramey via Boost wrote:
On 1/15/24 2:31 AM, Christopher Kormanyos via Boost wrote:
Dear all,
The review of Boost.Charconv by Matt Borland runs Monday, January 15th through January 25th, 2024.
Code: https://github.com/cppalliance/charconvDocs: https://master.charconv.cpp.al/Review Schedule: https://www.boost.org/community/review_schedule.html
Note that there is are files in boost detail which implement utf8 code convert facets and tests thereof. These have been in there for 20 years and have served the function needed. When originally made, it was discussed that they should be separate official libraries. The "consensus" was that they shouldn't be as they weren't officially reviewed. This has been a festering carbuncle on the face of the serialization library since all that time. Would acceptance of this library replace the functionality in the "library" detail?
AFAIU, Boost.CharConv implements parsing and formatting of numbers while utf8_codecvt_facet, which I think you're referring, converts strings between UTF-8 and whatever encoding wchar_t represents. Those are completely separate tasks. If anything, the already accepted Boost.Locale can be thought of the public alternative to utf8_codecvt_facet.
Christopher, Just FYI - I have started to review the documentation for this library - from an information architecture perspective - and will have feedback by the 25th. - Peter On Wed, Jan 17, 2024 at 11:14 AM Andrey Semashev via Boost < boost@lists.boost.org> wrote:
On 1/17/24 21:24, Robert Ramey via Boost wrote:
On 1/15/24 2:31 AM, Christopher Kormanyos via Boost wrote:
Dear all,
The review of Boost.Charconv by Matt Borland runs Monday, January 15th through January 25th, 2024.
Code: https://github.com/cppalliance/charconvDocs: https://master.charconv.cpp.al/Review Schedule: https://www.boost.org/community/review_schedule.html
Note that there is are files in boost detail which implement utf8 code convert facets and tests thereof. These have been in there for 20 years and have served the function needed. When originally made, it was discussed that they should be separate official libraries. The "consensus" was that they shouldn't be as they weren't officially reviewed. This has been a festering carbuncle on the face of the serialization library since all that time. Would acceptance of this library replace the functionality in the "library" detail?
AFAIU, Boost.CharConv implements parsing and formatting of numbers while utf8_codecvt_facet, which I think you're referring, converts strings between UTF-8 and whatever encoding wchar_t represents. Those are completely separate tasks.
If anything, the already accepted Boost.Locale can be thought of the public alternative to utf8_codecvt_facet.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Hi all, This is my review of the proposed Boost.Charconv. DISCLAIMER: both me and the author are affiliated with the C++ Alliance. I am writing this review as a Boost author who would like to use this library in the code he maintains.
- What is your evaluation of the potential usefulness of the library?
I’m reviewing this library because I’d like to use it in Boost.MySQL. In short, I think any library that implements a text-based protocol can benefit from Boost.Charconv. I’m implementing functionality to safely generate client-side queries for Boost.MySQL. Given a user-supplied query format string and some runtime, untrusted values, I need to generate a string containing a SQL query. User-supplied values may be any MySQL built-in types, including integers and floats. While prepared statements already cover part of this, there are cases where they are not practical (e.g. queries with dynamic filters). This functionality is security-critical. Formatting a query incorrectly can easily lead to SQL injection vulnerabilities, which are extremely dangerous. Composing queries with ints and floats requires converting them to string in a predictable way. Until now, I’ve considered std::snprintf, boost::lexical_cast and std::to_chars for this task. Both std::snprintf and boost::lexical_cast are locale-dependent, which can lead to nasty results. When formatting the double 4.2, we need to output the string “4.2”. Under some locales, these functions output “4,2”, which is interpreted as a field list by MySQL (potential exploit!). std::to_chars behaves like I want, but isn’t implemented by most of the compilers Boost.MySQL supports (we require C++11). A similar problem occurs when parsing results. When not using prepared statements, MySQL sends values as strings, so ints and doubles must be parsed. From what I’ve observed, libraries like Boost.Json also face a similar problem, often resorting to implementing their own conversion functions. Outside the protocol world, the library can be useful for tasks like URL parameter parsing, as it’s common to embed numeric identifiers in URLs. To sum up, I think this library will be useful to other Boost libraries, and probably to others, too.
- What is your evaluation of the design?
The library is a drop-in replacement for std::to_chars/from_chars, so there is not a lot to say. I think the standard design works well, and this library follows it. The only point where Boost.Charconv deviates from the standard is when parsing floating point values that are too big or too small to be represented in their type. Charconv modifies the output to +- 0 or +-HUGE_VAL, where the standard says it shouldn’t. I understand this helps because it provides more information to the caller, and it may actually be changed in the standard. The original Boost.Charconv included a macro to configure this behavior, which I think is problematic being a compiled library. From https://github.com/cppalliance/charconv/pull/120, I can see this has been changed. An extra function, from_chars_strict, is included. I think this is the right design. The biggest barrier I have for adoption is that it’s a compiled library. MySQL is header-only, which means that the official CMake modules in the release don’t contain a Boost::mysql target. Adding a hard dependency on Boost.Charconv is thus a breaking change for my users, who will need to update their CMLs. I think that being a compiled library is the right thing, though. This is actually a limitation in our current tooling - there should be a Boost::mysql CMake target. From the conversations I’ve had with the b2 maintainer in the cpplang slack, this will be changing with the new modular Boost structure, so it shouldn’t be a concern. I will defer adoption until these changes are in place, though.
- What is your evaluation of the implementation?
I’ve only had a quick glance. It surprised me finding an instance of malloc/strtold, since these functions violate the “non-allocating, locale-independent” guarantees. I understand this is for an extreme edge case, and that the “locale-independent” guarantee is still provided through internal logic. I don’t know enough about these parsing algorithms to determine whether this is the right decision or not. How much effort would require rolling a hand-rolled strtold alternative for such edge cases? In any case, this can be done once the library is in Boost. The library seems to not be running any fuzzing at the minute, which I think is quite necessary given the library’s nature.
- What is your evaluation of the documentation?
I find it really difficult to navigate. Fortunately, it closely mimics the standard, so I end up just looking at cppreference. But still, my experience hasn’t been good. These are the main pain points I see: - It's a single page. Even for a small library, it makes navigation very cumbersome. - It’s not linked. If you want to find the reference for a struct or function, your only option is to Ctrl+F. - The overview for each function is written like a reference. But then you have another reference at the bottom. - The sidebar TOC has too many levels. As much as I want to get rid of quickbook, my feeling is that quickbook-based docs currently generate more usable output than asciidoc-based ones. It may be a matter of tuning templates and developing additional tools, but we’re not there yet. Please find some additional minor comments at the end of the review. - Did you try to use the library? With what compiler? Did you have any problems? I’ve integrated the library into Boost.MySQL, in a feature branch (https://github.com/boostorg/mysql/tree/feature/boost-charconv). I’ve fully implemented query formatting and updated the parsing functions, all using the proposed library. I’ve had no surprises. All CIs pass all tests, which perform considerable coverage on the parsing side. This includes different versions of the major compilers and operating systems, with different configurations - see https://github.com/boostorg/mysql/blob/feature/boost-charconv/.drone.star and https://github.com/boostorg/mysql/blob/feature/boost-charconv/.github/workfl... for details. I’ve found a problem when linking to the library using the CMake targets, where a missing flag causes linker errors - see https://github.com/cppalliance/charconv/issues/130.
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
I’ve dedicated around 15 hours of work to integrate the library in Boost.MySQL and clean up CIs.
- Are you knowledgeable about the problem domain?
I’m just a user. I understand the domain problem and the different alternatives. I don’t have any knowledge about the algorithms’ internal workings.
- Your decision.
I recommend CONDITIONALLY ACCEPT Charconv into Boost, under the condition that proper fuzzing is added to the test suite. I’ve worked with libfuzzer before and can assist the author if required. Thanks Matt for submitting the library, and Christopher for being review manager. Regards, Ruben. Minor comments on documentation: - The first example uses to_chars(... sizeof(buffer) - 1), which seems to imply a NULL-terminator somewhere. Since there is none, I think it should just be sizeof(buffer). - Docs don't specify the behavior about NULL-terminators. Either specify "adheres to standard" or clarify that no NULL-terminator is written. - IMO the front page should clearly state that this library is compliant with the charconv standard header, noting any exceptions. - from_chars overview is missing semicolons in the friends section. - Please use monospace in the discussion to refer to identifiers, like [first, last). - Basic usage: from_chars name is qualified but from_chars_result is not. - Code snippets wrap, which makes them difficult to read.
This is my review of the proposed Boost.Charconv Thank you Ruben for your detailed, clear,in-depth and prompt review of the proposedBoost.Charconv.
On Monday, January 22, 2024 at 09:58:14 PM GMT+1, Ruben Perez via Boost
- What is your evaluation of the potential usefulness of the library?
I’m reviewing this library because I’d like to use it in Boost.MySQL. In short, I think any library that implements a text-based protocol can benefit from Boost.Charconv. I’m implementing functionality to safely generate client-side queries for Boost.MySQL. Given a user-supplied query format string and some runtime, untrusted values, I need to generate a string containing a SQL query. User-supplied values may be any MySQL built-in types, including integers and floats. While prepared statements already cover part of this, there are cases where they are not practical (e.g. queries with dynamic filters). This functionality is security-critical. Formatting a query incorrectly can easily lead to SQL injection vulnerabilities, which are extremely dangerous. Composing queries with ints and floats requires converting them to string in a predictable way. Until now, I’ve considered std::snprintf, boost::lexical_cast and std::to_chars for this task. Both std::snprintf and boost::lexical_cast are locale-dependent, which can lead to nasty results. When formatting the double 4.2, we need to output the string “4.2”. Under some locales, these functions output “4,2”, which is interpreted as a field list by MySQL (potential exploit!). std::to_chars behaves like I want, but isn’t implemented by most of the compilers Boost.MySQL supports (we require C++11). A similar problem occurs when parsing results. When not using prepared statements, MySQL sends values as strings, so ints and doubles must be parsed. From what I’ve observed, libraries like Boost.Json also face a similar problem, often resorting to implementing their own conversion functions. Outside the protocol world, the library can be useful for tasks like URL parameter parsing, as it’s common to embed numeric identifiers in URLs. To sum up, I think this library will be useful to other Boost libraries, and probably to others, too.
- What is your evaluation of the design?
The library is a drop-in replacement for std::to_chars/from_chars, so there is not a lot to say. I think the standard design works well, and this library follows it. The only point where Boost.Charconv deviates from the standard is when parsing floating point values that are too big or too small to be represented in their type. Charconv modifies the output to +- 0 or +-HUGE_VAL, where the standard says it shouldn’t. I understand this helps because it provides more information to the caller, and it may actually be changed in the standard. The original Boost.Charconv included a macro to configure this behavior, which I think is problematic being a compiled library. From https://github.com/cppalliance/charconv/pull/120, I can see this has been changed. An extra function, from_chars_strict, is included. I think this is the right design. The biggest barrier I have for adoption is that it’s a compiled library. MySQL is header-only, which means that the official CMake modules in the release don’t contain a Boost::mysql target. Adding a hard dependency on Boost.Charconv is thus a breaking change for my users, who will need to update their CMLs. I think that being a compiled library is the right thing, though. This is actually a limitation in our current tooling - there should be a Boost::mysql CMake target. From the conversations I’ve had with the b2 maintainer in the cpplang slack, this will be changing with the new modular Boost structure, so it shouldn’t be a concern. I will defer adoption until these changes are in place, though.
- What is your evaluation of the implementation?
I’ve only had a quick glance. It surprised me finding an instance of malloc/strtold, since these functions violate the “non-allocating, locale-independent” guarantees. I understand this is for an extreme edge case, and that the “locale-independent” guarantee is still provided through internal logic. I don’t know enough about these parsing algorithms to determine whether this is the right decision or not. How much effort would require rolling a hand-rolled strtold alternative for such edge cases? In any case, this can be done once the library is in Boost. The library seems to not be running any fuzzing at the minute, which I think is quite necessary given the library’s nature.
- What is your evaluation of the documentation?
I find it really difficult to navigate. Fortunately, it closely mimics the standard, so I end up just looking at cppreference. But still, my experience hasn’t been good. These are the main pain points I see: - It's a single page. Even for a small library, it makes navigation very cumbersome. - It’s not linked. If you want to find the reference for a struct or function, your only option is to Ctrl+F. - The overview for each function is written like a reference. But then you have another reference at the bottom. - The sidebar TOC has too many levels. As much as I want to get rid of quickbook, my feeling is that quickbook-based docs currently generate more usable output than asciidoc-based ones. It may be a matter of tuning templates and developing additional tools, but we’re not there yet. Please find some additional minor comments at the end of the review. - Did you try to use the library? With what compiler? Did you have any problems? I’ve integrated the library into Boost.MySQL, in a feature branch (https://github.com/boostorg/mysql/tree/feature/boost-charconv). I’ve fully implemented query formatting and updated the parsing functions, all using the proposed library. I’ve had no surprises. All CIs pass all tests, which perform considerable coverage on the parsing side. This includes different versions of the major compilers and operating systems, with different configurations - see https://github.com/boostorg/mysql/blob/feature/boost-charconv/.drone.star and https://github.com/boostorg/mysql/blob/feature/boost-charconv/.github/workfl... for details. I’ve found a problem when linking to the library using the CMake targets, where a missing flag causes linker errors - see https://github.com/cppalliance/charconv/issues/130.
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
I’ve dedicated around 15 hours of work to integrate the library in Boost.MySQL and clean up CIs.
- Are you knowledgeable about the problem domain?
I’m just a user. I understand the domain problem and the different alternatives. I don’t have any knowledge about the algorithms’ internal workings.
- Your decision.
I recommend CONDITIONALLY ACCEPT Charconv into Boost, under the condition that proper fuzzing is added to the test suite. I’ve worked with libfuzzer before and can assist the author if required. Thanks Matt for submitting the library, and Christopher for being review manager. Regards, Ruben. Minor comments on documentation: - The first example uses to_chars(... sizeof(buffer) - 1), which seems to imply a NULL-terminator somewhere. Since there is none, I think it should just be sizeof(buffer). - Docs don't specify the behavior about NULL-terminators. Either specify "adheres to standard" or clarify that no NULL-terminator is written. - IMO the front page should clearly state that this library is compliant with the charconv standard header, noting any exceptions. - from_chars overview is missing semicolons in the friends section. - Please use monospace in the discussion to refer to identifiers, like [first, last). - Basic usage: from_chars name is qualified but from_chars_result is not. - Code snippets wrap, which makes them difficult to read. _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
On Monday, January 22nd, 2024 at 9:57 PM, Ruben Perez via Boost
Hi all,
This is my review of the proposed Boost.Charconv.
Reuben, Thanks for taking time to review.
I’ve only had a quick glance. It surprised me finding an instance of malloc/strtold, since these functions violate the “non-allocating, locale-independent” guarantees. I understand this is for an extreme edge case, and that the “locale-independent” guarantee is still provided through internal logic. I don’t know enough about these parsing algorithms to determine whether this is the right decision or not. How much effort would require rolling a hand-rolled strtold alternative for such edge cases? In any case, this can be done once the library is in Boost.
The locale issue from last week is fixed on develop and master. strtold and strtof128 from scratch will be non-trivial additions to the library. The malloc could easily be dropped by rejecting strings in fallback over 1024 bytes, but that seems like the worse option of the two for now.
As much as I want to get rid of quickbook, my feeling is that quickbook-based docs currently generate more usable output than asciidoc-based ones. It may be a matter of tuning templates and developing additional tools, but we’re not there yet.
I think Klemens has multi-page asciidoc libraries. I'll look at how he does that.
I recommend CONDITIONALLY ACCEPT Charconv into Boost, under the condition that proper fuzzing is added to the test suite. I’ve worked with libfuzzer before and can assist the author if required.
I will add the fuzzing, and I am sure I will ask questions as I have never used it before.
Minor comments on documentation:
- The first example uses to_chars(... sizeof(buffer) - 1), which seems to imply a NULL-terminator somewhere. Since there is none, I think it should just be sizeof(buffer). - Docs don't specify the behavior about NULL-terminators. Either specify "adheres to standard" or clarify that no NULL-terminator is written. - IMO the front page should clearly state that this library is compliant with the charconv standard header, noting any exceptions. - from_chars overview is missing semicolons in the friends section. - Please use monospace in the discussion to refer to identifiers, like [first, last). - Basic usage: from_chars name is qualified but from_chars_result is not. - Code snippets wrap, which makes them difficult to read.
See: https://github.com/cppalliance/charconv/issues/131 Matt
The review of Boost.Charconv by Matt Borland runs> Monday, January 15th through January 25th, 2024.
Dear all,
This post is intended to kindly encourage thesubmission of more reviews for the proposedBoost.Charconv.
We've had thoughtful, deep discussion,
but we really need a few more reviews to form
a solid basis for results. The review periodis formally scheduled until 25-January-2024.
Your review can be brief, and handle someof the major queries. A terse review is probablybetter than an absent one.
Thank you for considering the proposedBoost.Charconv.
Christopher
On Tuesday, January 23, 2024 at 08:43:36 AM GMT+1, Matt Borland via Boost
Hi all,
This is my review of the proposed Boost.Charconv.
Reuben, Thanks for taking time to review.
I’ve only had a quick glance. It surprised me finding an instance of malloc/strtold, since these functions violate the “non-allocating, locale-independent” guarantees. I understand this is for an extreme edge case, and that the “locale-independent” guarantee is still provided through internal logic. I don’t know enough about these parsing algorithms to determine whether this is the right decision or not. How much effort would require rolling a hand-rolled strtold alternative for such edge cases? In any case, this can be done once the library is in Boost.
The locale issue from last week is fixed on develop and master. strtold and strtof128 from scratch will be non-trivial additions to the library. The malloc could easily be dropped by rejecting strings in fallback over 1024 bytes, but that seems like the worse option of the two for now.
As much as I want to get rid of quickbook, my feeling is that quickbook-based docs currently generate more usable output than asciidoc-based ones. It may be a matter of tuning templates and developing additional tools, but we’re not there yet.
I think Klemens has multi-page asciidoc libraries. I'll look at how he does that.
I recommend CONDITIONALLY ACCEPT Charconv into Boost, under the condition that proper fuzzing is added to the test suite. I’ve worked with libfuzzer before and can assist the author if required.
I will add the fuzzing, and I am sure I will ask questions as I have never used it before.
Minor comments on documentation:
- The first example uses to_chars(... sizeof(buffer) - 1), which seems to imply a NULL-terminator somewhere. Since there is none, I think it should just be sizeof(buffer). - Docs don't specify the behavior about NULL-terminators. Either specify "adheres to standard" or clarify that no NULL-terminator is written. - IMO the front page should clearly state that this library is compliant with the charconv standard header, noting any exceptions. - from_chars overview is missing semicolons in the friends section. - Please use monospace in the discussion to refer to identifiers, like [first, last). - Basic usage: from_chars name is qualified but from_chars_result is not. - Code snippets wrap, which makes them difficult to read.
See: https://github.com/cppalliance/charconv/issues/131 Matt _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
wt., 23 sty 2024 o 14:38 Christopher Kormanyos via Boost < boost@lists.boost.org> napisał(a):
The review of Boost.Charconv by Matt Borland runs> Monday, January 15th through January 25th, 2024.
Dear all, This post is intended to kindly encourage thesubmission of more reviews for the proposedBoost.Charconv. We've had thoughtful, deep discussion, but we really need a few more reviews to form a solid basis for results. The review periodis formally scheduled until 25-January-2024.
Your review can be brief, and handle someof the major queries. A terse review is probablybetter than an absent one. Thank you for considering the proposedBoost.Charconv.
Christopher On Tuesday, January 23, 2024 at 08:43:36 AM GMT+1, Matt Borland via Boost
wrote: On Monday, January 22nd, 2024 at 9:57 PM, Ruben Perez via Boost < boost@lists.boost.org> wrote:
Hi all,
This is my review of the proposed Boost.Charconv.
Reuben,
Thanks for taking time to review.
I’ve only had a quick glance. It surprised me finding an instance of malloc/strtold, since these functions violate the “non-allocating, locale-independent” guarantees. I understand this is for an extreme edge case, and that the “locale-independent” guarantee is still provided through internal logic. I don’t know enough about these parsing algorithms to determine whether this is the right decision or not. How much effort would require rolling a hand-rolled strtold alternative for such edge cases? In any case, this can be done once the library is in Boost.
The locale issue from last week is fixed on develop and master. strtold and strtof128 from scratch will be non-trivial additions to the library. The malloc could easily be dropped by rejecting strings in fallback over 1024 bytes, but that seems like the worse option of the two for now.
As much as I want to get rid of quickbook, my feeling is that quickbook-based docs currently generate more usable output than asciidoc-based ones. It may be a matter of tuning templates and developing additional tools, but we’re not there yet.
I think Klemens has multi-page asciidoc libraries. I'll look at how he does that.
I recommend CONDITIONALLY ACCEPT Charconv into Boost, under the condition that proper fuzzing is added to the test suite. I’ve worked with libfuzzer before and can assist the author if required.
I will add the fuzzing, and I am sure I will ask questions as I have never used it before.
Minor comments on documentation:
- The first example uses to_chars(... sizeof(buffer) - 1), which seems to imply a NULL-terminator somewhere. Since there is none, I think it should just be sizeof(buffer). - Docs don't specify the behavior about NULL-terminators. Either specify "adheres to standard" or clarify that no NULL-terminator is written. - IMO the front page should clearly state that this library is compliant with the charconv standard header, noting any exceptions. - from_chars overview is missing semicolons in the friends section. - Please use monospace in the discussion to refer to identifiers, like [first, last). - Basic usage: from_chars name is qualified but from_chars_result is not. - Code snippets wrap, which makes them difficult to read.
Hi Everyone, This is not really a review, but I want to report an issue I have with this library on an ideological level. I mean the behavior with assigning zero and `HUGE_VAL` upon result_out_of_range. https://github.com/cppalliance/charconv/issues/110 When you do it, you cannot claim that this library is "an Implementation of <charconv> in C++11". Peter informs us about this issue: https://cplusplus.github.io/LWG/lwg-active.html#3081 But having an issue in the Standard doesn't automatically mean that the standard will change in the way recommended in that report. This has practical implications on the users: I cannot just substitute std::from_chars for boost::from_chars (in order to increase performance), because the semantics of my program will slightly change. boost::from_chars_strict may be a drop-in replacement for std::from_chars. boost::from_chars is not. We recently had a discussion that boost::optional is not a substitute for std::optional, because they have different interfaces; neither is boost::variant2 for std::variant: you cannot interchange them without consequences. It looks like boost::from_chars is heading for the same category. Can I request that this library flips the defaults, so that boost::from_chars is 100%-compatible with std::from_chars, and you also get boost::from_chars_plus that behaves like strtod? I hope that this message does not come across as negative. Matt, I can only imagine how much work went into writing this library. Thank you for doing this and sharing it. The speedups indicated in the benchmark section make the library worth using. I never believed anything can outperform Boost.Spirit. Regards, &rzej;
Can I request that this library flips the defaults, so that boost::from_chars is 100%-compatible with std::from_chars, and you also get boost::from_chars_plus that behaves like strtod?
Your reasoning is sound, and I have no issue with flipping them. In the docs I can highlight that the default behavior will likely change in the future based on whatever resolution LWG comes up with. Matt
On 1/23/24 20:14, Matt Borland via Boost wrote:
Can I request that this library flips the defaults, so that boost::from_chars is 100%-compatible with std::from_chars, and you also get boost::from_chars_plus that behaves like strtod?
Your reasoning is sound, and I have no issue with flipping them. In the docs I can highlight that the default behavior will likely change in the future based on whatever resolution LWG comes up with.
I think I'd prefer that the "default" interface (that is, the one that users will likely use by default) to be the one that implements the "right" behavior (i.e. the behavior people find most useful) and keeps implementing that behavior regardless of the standard committee decisions on the standard. We recently had Boost.Scope review, and in its results it was highlighted that people generally prefer better interfaces to strict conformance with the standard. Boost libraries don't have to be strict implementation of the standard, especially when we can do better.
Can I request that this library flips the defaults>> so that boost::from_chars is 100%-compatible>> with std::from_chars, and you also get boost::from_chars_plus that behaves like strtod? Your reasoning is sound, and I have no issue> with flipping them Reviewers: If this default-flipping-befavior is ofimportance, please do note this fascinationdiscussion. This is because it is seems likea somewhat significant recognition at aprogressive time point in the review. Christopher On Tuesday, January 23, 2024 at 06:25:13 PM GMT+1, Andrey Semashev via Boost
wrote:
On 1/23/24 20:14, Matt Borland via Boost wrote:
Can I request that this library flips the defaults, so that boost::from_chars is 100%-compatible with std::from_chars, and you also get boost::from_chars_plus that behaves like strtod?
Your reasoning is sound, and I have no issue with flipping them. In the docs I can highlight that the default behavior will likely change in the future based on whatever resolution LWG comes up with.
I think I'd prefer that the "default" interface (that is, the one that users will likely use by default) to be the one that implements the "right" behavior (i.e. the behavior people find most useful) and keeps implementing that behavior regardless of the standard committee decisions on the standard. We recently had Boost.Scope review, and in its results it was highlighted that people generally prefer better interfaces to strict conformance with the standard. Boost libraries don't have to be strict implementation of the standard, especially when we can do better. _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Can I request that this library flips the defaults>>> so that boost::from_chars is 100%-compatible>>> with std::from_chars, and you also get boost::from_chars_plus that behaves like strtod? Your reasoning is sound, and I have no issue>> with flipping them Reviewers: If this default-flipping-befavior is of> importance, please do note this fascination> discussion. This is because it is seems like> a somewhat significant recognition at a> progressive time point in the review. Sorry, I had a typographical error in thatexpression. Reviewers: If this default-flipping-befavior is ofmajor importance, please do note this fascinatingdiscussion. This is because it is seems likea somewhat significant recognition at aprogressive time point in the review. Thank you. Christopher On Tuesday, January 23, 2024 at 06:42:38 PM GMT+1, Christopher Kormanyos
wrote:
Can I request that this library flips the defaults>> so that boost::from_chars is 100%-compatible>> with std::from_chars, and you also get boost::from_chars_plus that behaves like strtod? Your reasoning is sound, and I have no issue> with flipping them Reviewers: If this default-flipping-befavior is ofimportance, please do note this fascinationdiscussion. This is because it is seems likea somewhat significant recognition at aprogressive time point in the review. Christopher On Tuesday, January 23, 2024 at 06:25:13 PM GMT+1, Andrey Semashev via Boost
wrote:
On 1/23/24 20:14, Matt Borland via Boost wrote:
Can I request that this library flips the defaults, so that boost::from_chars is 100%-compatible with std::from_chars, and you also get boost::from_chars_plus that behaves like strtod?
Your reasoning is sound, and I have no issue with flipping them. In the docs I can highlight that the default behavior will likely change in the future based on whatever resolution LWG comes up with.
I think I'd prefer that the "default" interface (that is, the one that users will likely use by default) to be the one that implements the "right" behavior (i.e. the behavior people find most useful) and keeps implementing that behavior regardless of the standard committee decisions on the standard. We recently had Boost.Scope review, and in its results it was highlighted that people generally prefer better interfaces to strict conformance with the standard. Boost libraries don't have to be strict implementation of the standard, especially when we can do better. _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
On 1/23/24 20:25, Andrey Semashev wrote:
On 1/23/24 20:14, Matt Borland via Boost wrote:
Can I request that this library flips the defaults, so that boost::from_chars is 100%-compatible with std::from_chars, and you also get boost::from_chars_plus that behaves like strtod?
Your reasoning is sound, and I have no issue with flipping them. In the docs I can highlight that the default behavior will likely change in the future based on whatever resolution LWG comes up with.
I think I'd prefer that the "default" interface (that is, the one that users will likely use by default) to be the one that implements the "right" behavior (i.e. the behavior people find most useful) and keeps implementing that behavior regardless of the standard committee decisions on the standard.
We recently had Boost.Scope review, and in its results it was highlighted that people generally prefer better interfaces to strict conformance with the standard. Boost libraries don't have to be strict implementation of the standard, especially when we can do better.
In fact, I will add that the behavior of the standard-complying from_chars_strict should remain as it currently is, even after the resolution of the LWG issue. This is so that users' code that relies on this behavior would remain working even after the issue is resolved. Similarly, the strtod-like from_chars should remain strtod-like regardless if the LWG resolution so that users' core doesn't break. Making one behavior the "default" only to then change it is the worst you could do.
Can I request that this library flips the defaults, so that boost::from_chars is 100%-compatible with std::from_chars, and you also get boost::from_chars_plus that behaves like strtod?
Your reasoning is sound, and I have no issue with flipping them. In the docs I can highlight that the default behavior will likely change in the future based on whatever resolution LWG comes up with.
I'm good with the flipping, too.
On 23/01/2024 13:37, Christopher Kormanyos via Boost wrote:
The review of Boost.Charconv by Matt Borland runs> Monday, January 15th through January 25th, 2024.
Dear all, This post is intended to kindly encourage thesubmission of more reviews for the proposedBoost.Charconv. We've had thoughtful, deep discussion, but we really need a few more reviews to form a solid basis for results. The review periodis formally scheduled until 25-January-2024.
I'm hoping to write a review, but I'm only just getting started with the library, I'll try and get there as soon as I can... Best, John.
I'm hoping to write a review, but I'm> only just getting started with the library, I'll try and get there as soon as I can...
Best, John. Thanks John, I've just turned up the encouragementyet again a notch higher, indicating that terse maybe better than absent ones. Yet we do not want to force unprepared results. If you do get a chance, we'd really like toget your review. Christopher On Wednesday, January 24, 2024 at 06:57:09 PM GMT+1, John Maddock via Boost
wrote:
On 23/01/2024 13:37, Christopher Kormanyos via Boost wrote:
> The review of Boost.Charconv by Matt Borland runs> Monday, January 15th through January 25th, 2024.
Dear all, This post is intended to kindly encourage thesubmission of more reviews for the proposedBoost.Charconv. We've had thoughtful, deep discussion, but we really need a few more reviews to form a solid basis for results. The review periodis formally scheduled until 25-January-2024.
I'm hoping to write a review, but I'm only just getting started with the library, I'll try and get there as soon as I can... Best, John. _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
The review of Boost.Charconv by Matt Borland runs>> Monday, January 15th through January 25th, 2024.
Dear all, This post is intended to kindly encourage the> submission of more reviews for the proposed> Boost.Charconv. Again, I kindly encourage you to review andsubmit review for the exciting proposedBoost.Charconv, which aids in domain-specificsolutions and portability.
We have now 1 good library reviewand 1 document review.
I think it would be great to have 3 reviewsby Friday, tomorrow, 25-January-2024in order to form a solid review basis.
Please if interest and timeallow, review and submit review.
Kind regards, Christopher
On Tuesday, January 23, 2024 at 02:37:42 PM GMT+1, Christopher Kormanyos
The review of Boost.Charconv by Matt Borland runs> Monday, January 15th through January 25th, 2024.
Dear all,
This post is intended to kindly encourage thesubmission of more reviews for the proposedBoost.Charconv.
We've had thoughtful, deep discussion,
but we really need a few more reviews to form
a solid basis for results. The review periodis formally scheduled until 25-January-2024.
Your review can be brief, and handle someof the major queries. A terse review is probablybetter than an absent one.
Thank you for considering the proposedBoost.Charconv.
Christopher
On Tuesday, January 23, 2024 at 08:43:36 AM GMT+1, Matt Borland via Boost
Hi all,
This is my review of the proposed Boost.Charconv.
Reuben, Thanks for taking time to review.
I’ve only had a quick glance. It surprised me finding an instance of malloc/strtold, since these functions violate the “non-allocating, locale-independent” guarantees. I understand this is for an extreme edge case, and that the “locale-independent” guarantee is still provided through internal logic. I don’t know enough about these parsing algorithms to determine whether this is the right decision or not. How much effort would require rolling a hand-rolled strtold alternative for such edge cases? In any case, this can be done once the library is in Boost.
The locale issue from last week is fixed on develop and master. strtold and strtof128 from scratch will be non-trivial additions to the library. The malloc could easily be dropped by rejecting strings in fallback over 1024 bytes, but that seems like the worse option of the two for now.
As much as I want to get rid of quickbook, my feeling is that quickbook-based docs currently generate more usable output than asciidoc-based ones. It may be a matter of tuning templates and developing additional tools, but we’re not there yet.
I think Klemens has multi-page asciidoc libraries. I'll look at how he does that.
I recommend CONDITIONALLY ACCEPT Charconv into Boost, under the condition that proper fuzzing is added to the test suite. I’ve worked with libfuzzer before and can assist the author if required.
I will add the fuzzing, and I am sure I will ask questions as I have never used it before.
Minor comments on documentation:
- The first example uses to_chars(... sizeof(buffer) - 1), which seems to imply a NULL-terminator somewhere. Since there is none, I think it should just be sizeof(buffer). - Docs don't specify the behavior about NULL-terminators. Either specify "adheres to standard" or clarify that no NULL-terminator is written. - IMO the front page should clearly state that this library is compliant with the charconv standard header, noting any exceptions. - from_chars overview is missing semicolons in the friends section. - Please use monospace in the discussion to refer to identifiers, like [first, last). - Basic usage: from_chars name is qualified but from_chars_result is not. - Code snippets wrap, which makes them difficult to read.
See: https://github.com/cppalliance/charconv/issues/131 Matt _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
On 1/24/24 20:57, Christopher Kormanyos via Boost wrote:
The review of Boost.Charconv by Matt Borland runs>> Monday, January 15th through January 25th, 2024.
Dear all, This post is intended to kindly encourage the> submission of more reviews for the proposed> Boost.Charconv. Again, I kindly encourage you to review andsubmit review for the exciting proposedBoost.Charconv, which aids in domain-specificsolutions and portability.
We have now 1 good library reviewand 1 document review. I think it would be great to have 3 reviewsby Friday, tomorrow, 25-January-2024in order to form a solid review basis.
Please if interest and timeallow, review and submit review. Kind regards, Christopher
Christopher, please avoid top-posting. This is against posting rules to this ML. https://www.boost.org/community/policy.html#quoting Also, I'm not sure if this is genuine typos, an issue with your email client, or some broken HTML to plain text translation, but your posts often miss spaces between words, which make them difficult to read. In particular, in the initial post about starting the review this broke links to library code and documentation and others. Regardless of the source of the issue, please post only as plain text to this list. Don't post as HTML.
On Tuesday, January 23, 2024 at 02:37:42 PM GMT+1, Christopher Kormanyos
wrote: The review of Boost.Charconv by Matt Borland runs> Monday, January 15th through January 25th, 2024.
Dear all, This post is intended to kindly encourage thesubmission of more reviews for the proposedBoost.Charconv. We've had thoughtful, deep discussion, but we really need a few more reviews to form a solid basis for results. The review periodis formally scheduled until 25-January-2024.
Your review can be brief, and handle someof the major queries. A terse review is probablybetter than an absent one. Thank you for considering the proposedBoost.Charconv.
Christopher On Tuesday, January 23, 2024 at 08:43:36 AM GMT+1, Matt Borland via Boost
wrote: On Monday, January 22nd, 2024 at 9:57 PM, Ruben Perez via Boost
wrote: Hi all,
This is my review of the proposed Boost.Charconv.
Reuben,
Thanks for taking time to review.
I’ve only had a quick glance. It surprised me finding an instance of malloc/strtold, since these functions violate the “non-allocating, locale-independent” guarantees. I understand this is for an extreme edge case, and that the “locale-independent” guarantee is still provided through internal logic. I don’t know enough about these parsing algorithms to determine whether this is the right decision or not. How much effort would require rolling a hand-rolled strtold alternative for such edge cases? In any case, this can be done once the library is in Boost.
The locale issue from last week is fixed on develop and master. strtold and strtof128 from scratch will be non-trivial additions to the library. The malloc could easily be dropped by rejecting strings in fallback over 1024 bytes, but that seems like the worse option of the two for now.
As much as I want to get rid of quickbook, my feeling is that quickbook-based docs currently generate more usable output than asciidoc-based ones. It may be a matter of tuning templates and developing additional tools, but we’re not there yet.
I think Klemens has multi-page asciidoc libraries. I'll look at how he does that.
I recommend CONDITIONALLY ACCEPT Charconv into Boost, under the condition that proper fuzzing is added to the test suite. I’ve worked with libfuzzer before and can assist the author if required.
I will add the fuzzing, and I am sure I will ask questions as I have never used it before.
Minor comments on documentation:
- The first example uses to_chars(... sizeof(buffer) - 1), which seems to imply a NULL-terminator somewhere. Since there is none, I think it should just be sizeof(buffer). - Docs don't specify the behavior about NULL-terminators. Either specify "adheres to standard" or clarify that no NULL-terminator is written. - IMO the front page should clearly state that this library is compliant with the charconv standard header, noting any exceptions. - from_chars overview is missing semicolons in the friends section. - Please use monospace in the discussion to refer to identifiers, like [first, last). - Basic usage: from_chars name is qualified but from_chars_result is not. - Code snippets wrap, which makes them difficult to read.
See: https://github.com/cppalliance/charconv/issues/131
Matt
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
On Wednesday, January 24, 2024 at 07:35:36 PM GMT+1, Andrey Semashev
>> The review of Boost.Charconv by Matt Borland runs>> Monday, January 15th through January 25th, 2024.
Dear all, This post is intended to kindly encourage the> submission of more reviews for the proposed> Boost.Charconv. Again, I kindly encourage you to review andsubmit review for the exciting proposedBoost.Charconv, which aids in domain-specificsolutions and portability.
We have now 1 good library reviewand 1 document review. I think it would be great to have 3 reviewsby Friday, tomorrow, 25-January-2024in order to form a solid review basis.
Please if interest and timeallow, review and submit review. Kind regards, Christopher
Christopher, please avoid top-posting. This is against posting rules to this ML. https://www.boost.org/community/policy.html#quoting Also, I'm not sure if this is genuine typos, an issue with your email client, or some broken HTML to plain text translation, but your posts often miss spaces between words, which make them difficult to read. In particular, in the initial post about starting the review this broke links to library code and documentation and others. Regardless of the source of the issue, please post only as plain text to this list. Don't post as HTML.
On Tuesday, January 23, 2024 at 02:37:42 PM GMT+1, Christopher Kormanyos
wrote: > The review of Boost.Charconv by Matt Borland runs> Monday, January 15th through January 25th, 2024. Dear all, This post is intended to kindly encourage thesubmission of more reviews for the proposedBoost.Charconv. We've had thoughtful, deep discussion, but we really need a few more reviews to form a solid basis for results. The review periodis formally scheduled until 25-January-2024.
Your review can be brief, and handle someof the major queries. A terse review is probablybetter than an absent one. Thank you for considering the proposedBoost.Charconv.
Christopher On Tuesday, January 23, 2024 at 08:43:36 AM GMT+1, Matt Borland via Boost
wrote: On Monday, January 22nd, 2024 at 9:57 PM, Ruben Perez via Boost
wrote: Hi all,
This is my review of the proposed Boost.Charconv.
Reuben,
Thanks for taking time to review.
I’ve only had a quick glance. It surprised me finding an instance of malloc/strtold, since these functions violate the “non-allocating, locale-independent” guarantees. I understand this is for an extreme edge case, and that the “locale-independent” guarantee is still provided through internal logic. I don’t know enough about these parsing algorithms to determine whether this is the right decision or not. How much effort would require rolling a hand-rolled strtold alternative for such edge cases? In any case, this can be done once the library is in Boost.
The locale issue from last week is fixed on develop and master. strtold and strtof128 from scratch will be non-trivial additions to the library. The malloc could easily be dropped by rejecting strings in fallback over 1024 bytes, but that seems like the worse option of the two for now.
As much as I want to get rid of quickbook, my feeling is that quickbook-based docs currently generate more usable output than asciidoc-based ones. It may be a matter of tuning templates and developing additional tools, but we’re not there yet.
I think Klemens has multi-page asciidoc libraries. I'll look at how he does that.
I recommend CONDITIONALLY ACCEPT Charconv into Boost, under the condition that proper fuzzing is added to the test suite. I’ve worked with libfuzzer before and can assist the author if required.
I will add the fuzzing, and I am sure I will ask questions as I have never used it before.
Minor comments on documentation:
- The first example uses to_chars(... sizeof(buffer) - 1), which seems to imply a NULL-terminator somewhere. Since there is none, I think it should just be sizeof(buffer). - Docs don't specify the behavior about NULL-terminators. Either specify "adheres to standard" or clarify that no NULL-terminator is written. - IMO the front page should clearly state that this library is compliant with the charconv standard header, noting any exceptions. - from_chars overview is missing semicolons in the friends section. - Please use monospace in the discussion to refer to identifiers, like [first, last). - Basic usage: from_chars name is qualified but from_chars_result is not. - Code snippets wrap, which makes them difficult to read.
See: https://github.com/cppalliance/charconv/issues/131
Matt
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: Boost Info Page
| | | | Boost Info Page | | |
Please if interest and timeallow, review and submit review. Kind regards, Christopher
Christopher, please avoid top-posting. This is against posting> rules to this ML Thank you for pointing this out Andrey. Sorry about that. I hope this is better? Christopher
On 1/24/24 22:00, Christopher Kormanyos wrote:
On Wednesday, January 24, 2024 at 07:35:36 PM GMT+1, Andrey Semashev
wrote: Christopher, please avoid top-posting. This is against posting rules to this ML
Thank you for pointing this out Andrey. Sorry about that. I hope this is better?
The page I linked also talks about avoiding over-quoting. Also, your post came as HTML again to me.
pon., 22 sty 2024 o 21:58 Ruben Perez via Boost
Hi all,
This is my review of the proposed Boost.Charconv.
DISCLAIMER: both me and the author are affiliated with the C++ Alliance. I am writing this review as a Boost author who would like to use this library in the code he maintains.
- What is your evaluation of the potential usefulness of the library?
I’m reviewing this library because I’d like to use it in Boost.MySQL. In short, I think any library that implements a text-based protocol can benefit from Boost.Charconv.
I’m implementing functionality to safely generate client-side queries for Boost.MySQL. Given a user-supplied query format string and some runtime, untrusted values, I need to generate a string containing a SQL query. User-supplied values may be any MySQL built-in types, including integers and floats. While prepared statements already cover part of this, there are cases where they are not practical (e.g. queries with dynamic filters).
This functionality is security-critical. Formatting a query incorrectly can easily lead to SQL injection vulnerabilities, which are extremely dangerous.
Composing queries with ints and floats requires converting them to string in a predictable way. Until now, I’ve considered std::snprintf, boost::lexical_cast and std::to_chars for this task. Both std::snprintf and boost::lexical_cast are locale-dependent, which can lead to nasty results. When formatting the double 4.2, we need to output the string “4.2”. Under some locales, these functions output “4,2”, which is interpreted as a field list by MySQL (potential exploit!). std::to_chars behaves like I want, but isn’t implemented by most of the compilers Boost.MySQL supports (we require C++11).
A similar problem occurs when parsing results. When not using prepared statements, MySQL sends values as strings, so ints and doubles must be parsed. From what I’ve observed, libraries like Boost.Json also face a similar problem, often resorting to implementing their own conversion functions.
Outside the protocol world, the library can be useful for tasks like URL parameter parsing, as it’s common to embed numeric identifiers in URLs.
To sum up, I think this library will be useful to other Boost libraries, and probably to others, too.
- What is your evaluation of the design?
The library is a drop-in replacement for std::to_chars/from_chars, so there is not a lot to say. I think the standard design works well, and this library follows it.
The only point where Boost.Charconv deviates from the standard is when parsing floating point values that are too big or too small to be represented in their type. Charconv modifies the output to +- 0 or +-HUGE_VAL, where the standard says it shouldn’t. I understand this helps because it provides more information to the caller, and it may actually be changed in the standard. The original Boost.Charconv included a macro to configure this behavior, which I think is problematic being a compiled library. From https://github.com/cppalliance/charconv/pull/120, I can see this has been changed. An extra function, from_chars_strict, is included. I think this is the right design.
The biggest barrier I have for adoption is that it’s a compiled library. MySQL is header-only, which means that the official CMake modules in the release don’t contain a Boost::mysql target. Adding a hard dependency on Boost.Charconv is thus a breaking change for my users, who will need to update their CMLs.
Would it be too hard to provide a header-only way of consuming this library? Libraries like Boost.Test allow that, but I do not know how that squares with ODR. Regards, &rzej;
I think that being a compiled library is the right thing, though. This is actually a limitation in our current tooling - there should be a Boost::mysql CMake target. From the conversations I’ve had with the b2 maintainer in the cpplang slack, this will be changing with the new modular Boost structure, so it shouldn’t be a concern. I will defer adoption until these changes are in place, though.
- What is your evaluation of the implementation?
I’ve only had a quick glance. It surprised me finding an instance of malloc/strtold, since these functions violate the “non-allocating, locale-independent” guarantees. I understand this is for an extreme edge case, and that the “locale-independent” guarantee is still provided through internal logic. I don’t know enough about these parsing algorithms to determine whether this is the right decision or not. How much effort would require rolling a hand-rolled strtold alternative for such edge cases? In any case, this can be done once the library is in Boost.
The library seems to not be running any fuzzing at the minute, which I think is quite necessary given the library’s nature.
- What is your evaluation of the documentation?
I find it really difficult to navigate. Fortunately, it closely mimics the standard, so I end up just looking at cppreference. But still, my experience hasn’t been good. These are the main pain points I see:
- It's a single page. Even for a small library, it makes navigation very cumbersome. - It’s not linked. If you want to find the reference for a struct or function, your only option is to Ctrl+F. - The overview for each function is written like a reference. But then you have another reference at the bottom. - The sidebar TOC has too many levels.
As much as I want to get rid of quickbook, my feeling is that quickbook-based docs currently generate more usable output than asciidoc-based ones. It may be a matter of tuning templates and developing additional tools, but we’re not there yet.
Please find some additional minor comments at the end of the review.
- Did you try to use the library? With what compiler? Did you have any problems?
I’ve integrated the library into Boost.MySQL, in a feature branch (https://github.com/boostorg/mysql/tree/feature/boost-charconv). I’ve fully implemented query formatting and updated the parsing functions, all using the proposed library. I’ve had no surprises. All CIs pass all tests, which perform considerable coverage on the parsing side. This includes different versions of the major compilers and operating systems, with different configurations - see https://github.com/boostorg/mysql/blob/feature/boost-charconv/.drone.star and https://github.com/boostorg/mysql/blob/feature/boost-charconv/.github/workfl... for details.
I’ve found a problem when linking to the library using the CMake targets, where a missing flag causes linker errors - see https://github.com/cppalliance/charconv/issues/130.
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
I’ve dedicated around 15 hours of work to integrate the library in Boost.MySQL and clean up CIs.
- Are you knowledgeable about the problem domain?
I’m just a user. I understand the domain problem and the different alternatives. I don’t have any knowledge about the algorithms’ internal workings.
- Your decision.
I recommend CONDITIONALLY ACCEPT Charconv into Boost, under the condition that proper fuzzing is added to the test suite. I’ve worked with libfuzzer before and can assist the author if required.
Thanks Matt for submitting the library, and Christopher for being review manager.
Regards, Ruben.
Minor comments on documentation:
- The first example uses to_chars(... sizeof(buffer) - 1), which seems to imply a NULL-terminator somewhere. Since there is none, I think it should just be sizeof(buffer). - Docs don't specify the behavior about NULL-terminators. Either specify "adheres to standard" or clarify that no NULL-terminator is written. - IMO the front page should clearly state that this library is compliant with the charconv standard header, noting any exceptions. - from_chars overview is missing semicolons in the friends section. - Please use monospace in the discussion to refer to identifiers, like [first, last). - Basic usage: from_chars name is qualified but from_chars_result is not. - Code snippets wrap, which makes them difficult to read.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Would it be too hard to provide a header-only way of consuming this library? Libraries like Boost.Test allow that, but I do not know how that squares with ODR.
Regards, &rzej;
I have not looked into it, but I assume it's possible because other libraries do it. Opened: https://github.com/cppalliance/charconv/issues/136 to track the issue. Matt
Christopher, Thanks, and I should have mentioned my affiliation (technical writer) with C++ Alliance! - Peter On Tue, Jan 23, 2024 at 11:03 PM Matt Borland via Boost < boost@lists.boost.org> wrote:
Would it be too hard to provide a header-only way of consuming this library? Libraries like Boost.Test allow that, but I do not know how that squares with ODR.
Regards, &rzej;
I have not looked into it, but I assume it's possible because other libraries do it. Opened: https://github.com/cppalliance/charconv/issues/136 to track the issue.
Matt _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Thanks, and I should have mentioned my affiliation (technical writer) with> C++ Alliance! - Peter Thank you Peter. Understood. This is noted. On Wednesday, January 24, 2024 at 06:46:40 PM GMT+1, Peter Turcan via Boost
wrote:
Christopher, Thanks, and I should have mentioned my affiliation (technical writer) with C++ Alliance! - Peter On Tue, Jan 23, 2024 at 11:03 PM Matt Borland via Boost < boost@lists.boost.org> wrote:
Would it be too hard to provide a header-only way of consuming this library? Libraries like Boost.Test allow that, but I do not know how that squares with ODR.
Regards, &rzej;
I have not looked into it, but I assume it's possible because other libraries do it. Opened: https://github.com/cppalliance/charconv/issues/136 to track the issue.
Matt _______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Matt Borland wrote:
Would it be too hard to provide a header-only way of consuming this library? Libraries like Boost.Test allow that, but I do not know how that squares with ODR.
Regards, &rzej;
I have not looked into it, but I assume it's possible because other libraries do it. Opened: https://github.com/cppalliance/charconv/issues/136 to track the issue.
To repeat what I said in a comment on the issue, this is not a good idea, at all. "Optional" header-only mode doesn't work well for libraries whose primary purpose is to be used by other libraries. As an example, suppose JSON wants to use Charconv; it now has to decide whether to use it header-only, or compiled. But then suppose that a project uses JSON and another library that uses Charconv, or Charconv directly; all of these dependencies MUST pick the same mode of "consuming" Charconv, and you don't really have any control over that. "Optional" header-only modes create nothing but trouble. Pick one "mode" and stick with it. (And, while header-only may superficially be more convenient, it's not quite the right fit for libraries with, e.g. large lookup tables.)
pon., 22 sty 2024 o 21:58 Ruben Perez via Boost
Hi all,
This is my review of the proposed Boost.Charconv.
- What is your evaluation of the documentation?
I find it really difficult to navigate. Fortunately, it closely mimics the standard, so I end up just looking at cppreference. But still, my experience hasn’t been good. These are the main pain points I see:
- It's a single page. Even for a small library, it makes navigation very cumbersome. - It’s not linked. If you want to find the reference for a struct or function, your only option is to Ctrl+F. - The overview for each function is written like a reference. But then you have another reference at the bottom. - The sidebar TOC has too many levels.
Interestingly, I found this comment in the implementation: https://github.com/cppalliance/charconv/blob/develop/include/boost/charconv/... And I feel it describes way nicer what the library (at least this function) does. Regards, &rzej;
As much as I want to get rid of quickbook, my feeling is that quickbook-based docs currently generate more usable output than asciidoc-based ones. It may be a matter of tuning templates and developing additional tools, but we’re not there yet.
Please find some additional minor comments at the end of the review.
- Did you try to use the library? With what compiler? Did you have any problems?
I’ve integrated the library into Boost.MySQL, in a feature branch (https://github.com/boostorg/mysql/tree/feature/boost-charconv). I’ve fully implemented query formatting and updated the parsing functions, all using the proposed library. I’ve had no surprises. All CIs pass all tests, which perform considerable coverage on the parsing side. This includes different versions of the major compilers and operating systems, with different configurations - see https://github.com/boostorg/mysql/blob/feature/boost-charconv/.drone.star and https://github.com/boostorg/mysql/blob/feature/boost-charconv/.github/workfl... for details.
I’ve found a problem when linking to the library using the CMake targets, where a missing flag causes linker errors - see https://github.com/cppalliance/charconv/issues/130.
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
I’ve dedicated around 15 hours of work to integrate the library in Boost.MySQL and clean up CIs.
- Are you knowledgeable about the problem domain?
I’m just a user. I understand the domain problem and the different alternatives. I don’t have any knowledge about the algorithms’ internal workings.
- Your decision.
I recommend CONDITIONALLY ACCEPT Charconv into Boost, under the condition that proper fuzzing is added to the test suite. I’ve worked with libfuzzer before and can assist the author if required.
Thanks Matt for submitting the library, and Christopher for being review manager.
Regards, Ruben.
Minor comments on documentation:
- The first example uses to_chars(... sizeof(buffer) - 1), which seems to imply a NULL-terminator somewhere. Since there is none, I think it should just be sizeof(buffer). - Docs don't specify the behavior about NULL-terminators. Either specify "adheres to standard" or clarify that no NULL-terminator is written. - IMO the front page should clearly state that this library is compliant with the charconv standard header, noting any exceptions. - from_chars overview is missing semicolons in the friends section. - Please use monospace in the discussion to refer to identifiers, like [first, last). - Basic usage: from_chars name is qualified but from_chars_result is not. - Code snippets wrap, which makes them difficult to read.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Christopher, Here is my review of the Charconv documentation - hope it helps! - Peter T Charconv Library Documentation review ================================ The documentation is a good start. Main issues: 1. Needs a full overview. 2. Seems like an API Reference section should be organized, combining the two from_chars/to_chars sections into one, with API standard type sections and subheadings. Other issues: 1. Make code/sections clearer with introductory sentences. 2. Use tables for tabular data, rather than bulleted lists. 3. All mentions of an API element should link to the page/section on that element. 4. Other notes and suggestions below. OVERVIEW: There are three questions to be answered with API docs: What, Why, How - in that order. The Why question is the most important and usually the most neglected. The "What" question ("What makes up the Boost.Charconv library"). For example: "Boost.Charconv converts character buffers to numbers, and numbers to character buffers. It is a small library of two overloaded functions to do the heavy lifting, plus several supporting enums, structures, templates, and constants, with a particular focus on performance and consistency across the supported development environments." The overview really needs to answer the "Why" question: "Why should I [a C++ developer new to Boost] be interested in this library?". There should be several paragraphs explaining how it compares with the standard library and why use this one instead (or as well as perhaps?), and what scenarios the library is most suited for. *A developer who knows nothing about the library should know at the end of the overview whether this is for them or not.* If a claim is made, such as "non-allocating", this should be explained with at least one sentence (no use of malloc perhaps?), and there should be a note of whether there are exceptions. If there are one or two exceptions then they can be mentioned by name in the overview. If there are more, add a sentence that there are exceptions to a general rule, and instead link to a section listing and explaining those differences. Is the charconv standard library the opposite: locale-dependent, allocating, or throwing - if so point that out. The overview should be upbeat in tone, and should clearly mention performance. Dependencies It is good practice to mention any significant dependencies - on Boost or other libraries - as part of the overview or part of the initial setup process. It is good practice too to mention that there are no significant dependencies. The rest of the doc should answer the final question, the "How" question: Usage Examples Add a sentence stating what the examples do (before each and every code block), such as "The following examples show a straightforward use of the two core functions of the library, from_chars and to_chars, converting a character string to a number (integer or floating point) and converting a number to a character string. Note that the character string buffer can be of any length." Supported Compilers Good - clear. What about supported Operating Systems? What about unsupported compilers - does that mean the lib will not work, or that the lib has not been tested on other compilers? If the latter, perhaps suggest that it is up to the developer to verify if the lib works correctly on an unsupported compiler? Would running the benchmarks on an unsupported compiler do the trick? Why use Boost.Charconv over <charconv>? Move this up to the Overview with more details - change "several times faster" to some specific examples. For example using the Boost.Charconv:from_chars_result function with floating point numbers can work from 3.23 to 5.77 times faster than the standard library. OK to link to your benchmarks for more details, but don't require it to answer the Why question. And are there any areas where this library performs better generally - floating point numbers, long integers, anything else - then mention this in the Overview. Building the Library (perhaps rename to "Getting Started" as it involves more than just building) Introduce the section, start with a sentence such as: "The following section explains how to download and build this library locally on your development machine." B2 What if the developer wants to use C++13, does that mean cxxstd=13 should be set, or should it always be set to 11. Explain any options. vcpkg Change "Any required Boost packages that do not already exist will be installed automatically." to "Any required Boost packages not currently installed in your development environment will be installed automatically." Add the TIP code into the tip itself, if possible, so its indentation is correct. from_chars Add a general title before this something like "API Reference" with an introductory sentence (though see the notes on Organization below): "This section describes all the functions, structures, enums, constants, and macros available in the library." This is generally a good order to describe an API reference. Add some text to describe what the source code does or is. Is this the source code of the function in the library, or perhaps a summarized version of its definition? Perhaps add a subheading "Definition" or "Syntax". from_chars_result The bulleted list of error codes would be better presented as a table with sub-heading "Return Values" and two columns entitled: "Value" and "Description" (or similar). This format should be re-used by to_chars_result. from_chars [ don't reuse a heading string at the top level] Parameters (or from_chars Parameters) might be a better title here, or re-organize as suggested below. Again, the parameters would be better represented by a table rather than bullets, with sub-heading Parameters and columns "Name" and "Description". This format should be re-used by to_chars. "One known exception is GCC 5 which does not support constexpr comparison of const char*." - do you have a workaround or suggestion on how to handle this? Consider more descriptive headings for: from_chars for integral types from_chars for floating point types - perhaps Usage notes for from_chars with integral types Usage notes for from_chars with floating point types Examples Integral, Floating Point, Hexadecimal - add a sentence preceding each code block describing what is going on, and any particular nuances to be aware of Same for the invalid arguments - always add a descriptive sentence preceding code blocks explaining what is going on. to_chars - same notes as for from_chars chars_format Add an intro sentence, noting that this is an enum defining the supported number formats. Limits Add an intro sentence describing what is in this section, and that Limits are templates (or reorganize as suggested below). Same notes as above - explain what the Examples are showing, plus nuances Reference Not sure what is meant by "Reference" as the above sections appear to be API reference material. Seems to be a lot of duplication of information from the above section? Perhaps consider combining the two sections into a single reference? References can benefit from being divided up into sections: Functions, Structures, Enums, Macros, Constants, etc. This makes it clear (from looking at the table of contents) what the complete contents of the lib are. Benchmarks Provide a sentence/paragraph describing the purpose of this section. Something like: "This section describes a range of performance benchmarks that have been run comparing this library with the standard library, and how to run your own benchmarks if required." Might need some explanation for libdouble-conversion and {fmt} - such as "Download the zip file and install the code to your development project" - or whatever steps are necessary. Like the use of tables. Is this enough information "to_chars floating point with the shortest representation" - what is meant by "the shortest representation"? Perhaps make it clearer the order of the numbers in Relative Performance - standard charconv first, then Boost.Charconv second? Just to be completely clear. Acknowledgements Perhaps move "Special thanks to Stephan T. Lavavej for providing the basis for the benchmarks." to an Acknowledgements section, so you can add more names as appropriate (testers, design feedback, etc.). A bulleted list works here. Sources Perhaps add an introductory sentence here - not sure that you copied any algorithms from these but more inspiration and ideas? Maybe: "The authors acknowledge the inspiration and guidelines from the following published works." - or similar Missing Information?: What about defined constants, such as BOOST_CHARCONV_RUN_BENCHMARKS, BOOST_CHARCONV_CONSTEXPR - should there be a table of constants defined by the library - with an explanation of the use of each? ORGANIZATION and NAVIGATION Ideally, I would organize the doc into pages (each [page] indicates a new html page). And ideally the pages are well linked together, for example, each time from_chars_result is mentioned, it is a link to the page for that structure. Same for all entries of the functions, structs, enums, etc. Overview [page] Intro - answer the What and Why questions Supported compilers/OS Getting Started [page] Downloading and building Dependencies Basic usage examples [page] from_chars examples to_chars examples API Reference [page] toc (table of contents: Functions, Structures, Enumerations, Templates, Constants, in a table of links) Functions [page] toc (table of contents: from_chars, to_chars) from_chars [page] overview definition/syntax (include the library path where the function is defined) parameters return value remarks/notes [add lower level headings for integers, floating point, hexadecimal etc. as needed] examples to_chars [page] overview definition parameters return value remarks/notes examples Structures [page] toc from_chars_result [page] overview definition fields remarks/notes to_chars_result [page] overview definition fields remarks/notes Enumerations [page] toc chars_format [page] definition members remarks/notes Templates [page] toc Limits [page] overview definition remarks/notes Constants [page] table of defined constants with descriptions Benchmarks toc How to run... Linux Windows MacOS Sources Acknowledgements copyright & license On Tue, Jan 23, 2024 at 11:03 AM Andrzej Krzemienski via Boost < boost@lists.boost.org> wrote:
pon., 22 sty 2024 o 21:58 Ruben Perez via Boost
napisał(a): Hi all,
This is my review of the proposed Boost.Charconv.
- What is your evaluation of the documentation?
I find it really difficult to navigate. Fortunately, it closely mimics the standard, so I end up just looking at cppreference. But still, my experience hasn’t been good. These are the main pain points I see:
- It's a single page. Even for a small library, it makes navigation very cumbersome. - It’s not linked. If you want to find the reference for a struct or function, your only option is to Ctrl+F. - The overview for each function is written like a reference. But then you have another reference at the bottom. - The sidebar TOC has too many levels.
Interestingly, I found this comment in the implementation:
https://github.com/cppalliance/charconv/blob/develop/include/boost/charconv/...
And I feel it describes way nicer what the library (at least this function) does.
Regards, &rzej;
As much as I want to get rid of quickbook, my feeling is that quickbook-based docs currently generate more usable output than asciidoc-based ones. It may be a matter of tuning templates and developing additional tools, but we’re not there yet.
Please find some additional minor comments at the end of the review.
- Did you try to use the library? With what compiler? Did you have any problems?
I’ve integrated the library into Boost.MySQL, in a feature branch (https://github.com/boostorg/mysql/tree/feature/boost-charconv). I’ve fully implemented query formatting and updated the parsing functions, all using the proposed library. I’ve had no surprises. All CIs pass all tests, which perform considerable coverage on the parsing side. This includes different versions of the major compilers and operating systems, with different configurations - see
https://github.com/boostorg/mysql/blob/feature/boost-charconv/.drone.star
and
https://github.com/boostorg/mysql/blob/feature/boost-charconv/.github/workfl...
for details.
I’ve found a problem when linking to the library using the CMake targets, where a missing flag causes linker errors - see https://github.com/cppalliance/charconv/issues/130.
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
I’ve dedicated around 15 hours of work to integrate the library in Boost.MySQL and clean up CIs.
- Are you knowledgeable about the problem domain?
I’m just a user. I understand the domain problem and the different alternatives. I don’t have any knowledge about the algorithms’ internal workings.
- Your decision.
I recommend CONDITIONALLY ACCEPT Charconv into Boost, under the condition that proper fuzzing is added to the test suite. I’ve worked with libfuzzer before and can assist the author if required.
Thanks Matt for submitting the library, and Christopher for being review manager.
Regards, Ruben.
Minor comments on documentation:
- The first example uses to_chars(... sizeof(buffer) - 1), which seems to imply a NULL-terminator somewhere. Since there is none, I think it should just be sizeof(buffer). - Docs don't specify the behavior about NULL-terminators. Either specify "adheres to standard" or clarify that no NULL-terminator is written. - IMO the front page should clearly state that this library is compliant with the charconv standard header, noting any exceptions. - from_chars overview is missing semicolons in the friends section. - Please use monospace in the discussion to refer to identifiers, like [first, last). - Basic usage: from_chars name is qualified but from_chars_result is not. - Code snippets wrap, which makes them difficult to read.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
Here is my review of the Charconv documentation - hope it helps!
Thank you Peter for your very clear, detailedand informative review of the documentationof the proposed Boost.Charconv.
It helps very much.
Christopher
On Wednesday, January 24, 2024 at 04:51:48 AM GMT+1, Peter Turcan via Boost
pon., 22 sty 2024 o 21:58 Ruben Perez via Boost
napisał(a): Hi all,
This is my review of the proposed Boost.Charconv.
- What is your evaluation of the documentation?
I find it really difficult to navigate. Fortunately, it closely mimics the standard, so I end up just looking at cppreference. But still, my experience hasn’t been good. These are the main pain points I see:
- It's a single page. Even for a small library, it makes navigation very cumbersome. - It’s not linked. If you want to find the reference for a struct or function, your only option is to Ctrl+F. - The overview for each function is written like a reference. But then you have another reference at the bottom. - The sidebar TOC has too many levels.
Interestingly, I found this comment in the implementation:
https://github.com/cppalliance/charconv/blob/develop/include/boost/charconv/...
And I feel it describes way nicer what the library (at least this function) does.
Regards, &rzej;
As much as I want to get rid of quickbook, my feeling is that quickbook-based docs currently generate more usable output than asciidoc-based ones. It may be a matter of tuning templates and developing additional tools, but we’re not there yet.
Please find some additional minor comments at the end of the review.
- Did you try to use the library? With what compiler? Did you have any problems?
I’ve integrated the library into Boost.MySQL, in a feature branch (https://github.com/boostorg/mysql/tree/feature/boost-charconv). I’ve fully implemented query formatting and updated the parsing functions, all using the proposed library. I’ve had no surprises. All CIs pass all tests, which perform considerable coverage on the parsing side. This includes different versions of the major compilers and operating systems, with different configurations - see
https://github.com/boostorg/mysql/blob/feature/boost-charconv/.drone.star
and
https://github.com/boostorg/mysql/blob/feature/boost-charconv/.github/workfl...
for details.
I’ve found a problem when linking to the library using the CMake targets, where a missing flag causes linker errors - see https://github.com/cppalliance/charconv/issues/130.
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
I’ve dedicated around 15 hours of work to integrate the library in Boost.MySQL and clean up CIs.
- Are you knowledgeable about the problem domain?
I’m just a user. I understand the domain problem and the different alternatives. I don’t have any knowledge about the algorithms’ internal workings.
- Your decision.
I recommend CONDITIONALLY ACCEPT Charconv into Boost, under the condition that proper fuzzing is added to the test suite. I’ve worked with libfuzzer before and can assist the author if required.
Thanks Matt for submitting the library, and Christopher for being review manager.
Regards, Ruben.
Minor comments on documentation:
- The first example uses to_chars(... sizeof(buffer) - 1), which seems to imply a NULL-terminator somewhere. Since there is none, I think it should just be sizeof(buffer). - Docs don't specify the behavior about NULL-terminators. Either specify "adheres to standard" or clarify that no NULL-terminator is written. - IMO the front page should clearly state that this library is compliant with the charconv standard header, noting any exceptions. - from_chars overview is missing semicolons in the friends section. - Please use monospace in the discussion to refer to identifiers, like [first, last). - Basic usage: from_chars name is qualified but from_chars_result is not. - Code snippets wrap, which makes them difficult to read.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
On Mon, Jan 22, 2024 at 12:58 PM Ruben Perez via Boost < boost@lists.boost.org> wrote:
The biggest barrier I have for adoption is that it’s a compiled library. MySQL is header-only, which means that the official CMake modules in the release don’t contain a Boost::mysql target. Adding a hard dependency on Boost.Charconv is thus a breaking change for my users, who will need to update their CMLs.
My experience in Boost has taught me that this is probably a good thing. Users depending on libraries to be header-only is an impediment to creating great libraries. The only thing worse than not having the ability or motivation to compile and consume an external library is the absurd requirement to "only have the standard library as a dependency." I believe that Boost should lead by example here. Libraries should be header-only as a coincidence and not by design. For example Boost.Variant2 should be header-only. But Boost.MySQL does not need to be header-only. In my opinion the best way to motivate people to learn how to properly compile and link external dependencies is to create non-header-only Boost libraries that are so compelling in features, so rewarding in value, that users will accept the cost of incorporating linkable external dependencies instead of forgoing useful libraries. Thanks
Vinnie Falco wrote:
I believe that Boost should lead by example here. Libraries should be header- only as a coincidence and not by design. For example Boost.Variant2 should be header-only. But Boost.MySQL does not need to be header-only. In my opinion the best way to motivate people to learn how to properly compile and link external dependencies is to create non-header-only Boost libraries that are so compelling in features, so rewarding in value, that users will accept the cost of incorporating linkable external dependencies instead of forgoing useful libraries.
Interestingly, compiled libraries are in a way more convenient under CMake. That's because for compiled libraries, there's no difference between find_package and add_subdirectory/FetchContent use, one always links to e.g. Boost::mysql, whereas header-only libraries don't have a corresponding target in the find_package case.
On Wed, Jan 24, 2024 at 7:32 PM Peter Dimov via Boost
Interestingly, compiled libraries are in a way more convenient under CMake.
It is nice when a library makes it easy for you to write your own build script. The "src.cpp" (including everything) model had that going for it. For simple libraries it is almost possible to just compile all files in src/*.cpp recursively. Except for the macro such as (for example) BOOST_URLS_SOURCE. This is needed to set the DLL import/export macros correctly. But maybe this could be added at the top of each cpp file: #define BOOST_URLS_SOURCE Then this could be taken out of the build script and it would be easier for users to write their own build script. I haven't thought this through completely so there could be a flaw in it. Also, libraries that have very complicated build scripts such as the need to set even more macros or provide a bunch of configurations, would be harder for users to write build scripts for. Thanks
I believe that Boost should lead by example here. Libraries should be header-only as a coincidence and not by design. For example Boost.Variant2 should be header-only. But Boost.MySQL does not need to be header-only. In my opinion the best way to motivate people to learn how to properly compile and link external dependencies is to create non-header-only Boost libraries that are so compelling in features, so rewarding in value, that users will accept the cost of incorporating linkable external dependencies instead of forgoing useful libraries.
I am completely with you, and I almost switched to compiled-only a couple of releases ago. There are two points that held me back. If anyone in this ML can shed some light on how they could be solved, I'd be eternally grateful. 1. Boost.Asio and the ODR rule. ======================== As you may already know, Boost.Asio is header only and has 238329282390 config macros that modify its definitions. AFAIK all usages of Boost.Asio in your executable should employ the same config macros. For instance, BOOST_ASIO_SEPARATE_COMPILATION is used to switch some declarations from inline to external linkage, so they can be built in a separate TU. BOOST_ASIO_USE_TS_EXECUTOR_AS_DEFAULT is used to enable some legacy behavior on executors. Unfortunately, I have real users enabling this switch because they need it to be compatible with other libraries. If I happen to build a libboost_mysql.so binary, which version of Boost.Asio should I use? It will make my library compatible only with that version. I "solved" this by providing a separate compilation mode, where you have a src.hpp, like Beast does. This provides reductions from 50% to 75% in build times, so it's not a minor thing. Making this the default via making the library compiled by default would be great. 2. OpenSSL ========= This is currently a hard dependency, because even in plaintext mode, the protocol mandates some hashing. B2 openssl package provides far less heuristics than CMake's, specially for Windows. My users would need much more effort to build than what's needed currently. Additionally, there's currently a limitation in boost_install for depending on OpenSSL. This last point is likely workaroundable on my side, since all these problems have solutions. But I won't make the effort until point 1. gets a solution (if there is any). Thanks, Ruben.
On Thu, Jan 25, 2024 at 5:14 AM Ruben Perez
I am completely with you
\phew... glad to hear it :) I was worried I was going out on a limb.
If I happen to build a libboost_mysql.so binary, which version of Boost.Asio should I use? It will make my library compatible only with that version.
I forgot about this, and now that you mention it I will have to study my new libraries (Http.Io and Websocket.Io) to figure out how much can be compiled separately. The answer to your question is simple though: any code whose compilation would be affected by Asio's configuration macros must remain in headers. This means that the amount of code that is compiled into the static or shared library is going to be small. My approach for tackling this is to break up my libraries into two parts. A "protocol" library (also called "sans-io": https://sans-io.readthedocs.io/how-to-sans-io.html) which is fully compiled and an "i/o" library which uses Asio and the protocol library. The io library is still compiled but there is more header-only material. I strive to minimize the amount of I/O-specific code. This is process is more disciplined by having the protocol library be physically separated. This is currently a hard dependency, because even in plaintext mode,
the protocol mandates some hashing. B2 openssl package provides far less heuristics than CMake's, specially for Windows. My users would need much more effort to build than what's needed currently. Additionally, there's currently a limitation in boost_install for depending on OpenSSL.
Okay well it sounds like your library is already not "header-only" in practice... LOL. If you require linking to OpenSSL then you are paying all the costs of header-only development without getting the full benefits. Let me make this clear: * We (C++ library authors collectively) MUST make OpenSSL easy to install, with NO friction for users No more babysitting, mollycoddling, or infantilizing users, and no more shirking our responsibility. OpenSSL is a key piece of network technology; we need to tackle it head-on, tame it, and make sure that anyone who needs to reach for it in their project can do so without hassle. Thanks
On 1/25/24 16:25, Vinnie Falco via Boost wrote:
* We (C++ library authors collectively) MUST make OpenSSL easy to install, with NO friction for users
No more babysitting, mollycoddling, or infantilizing users, and no more shirking our responsibility. OpenSSL is a key piece of network technology; we need to tackle it head-on, tame it, and make sure that anyone who needs to reach for it in their project can do so without hassle.
Not that I oppose the sentiment of making life easier to users, but OpenSSL is a project of its own, with a community of developers around it. Making their library easier to consume should be their goal, not everyone else's. If you have complaints or suggestions on this front, you should probably take them to the OpenSSL developers.
On Thu, Jan 25, 2024 at 5:41 AM Andrey Semashev via Boost < boost@lists.boost.org> wrote:
Not that I oppose the sentiment of making life easier to users, but OpenSSL is a project of its own, with a community of developers around it. Making their library easier to consume should be their goal, not everyone else's. If you have complaints or suggestions on this front, you should probably take them to the OpenSSL developers.
I am not complaining about OpenSSL in particular, and the "we" to which I refer is the community of C++ library authors writ large. However in this case there is a specific pain point with respect to OpenSSL and b2 which for better or worse is our responsibility and not OpenSSL's. We could debate who is responsible for complaints or suggestions but at the end of the day users don't care; they need something that works. If an upstream provider can't get it together, pointing the finger and saying "it's their fault" is unhelpful. Thanks
On 1/25/24 16:45, Vinnie Falco wrote:
On Thu, Jan 25, 2024 at 5:41 AM Andrey Semashev via Boost
mailto:boost@lists.boost.org> wrote: Not that I oppose the sentiment of making life easier to users, but OpenSSL is a project of its own, with a community of developers around it. Making their library easier to consume should be their goal, not everyone else's. If you have complaints or suggestions on this front, you should probably take them to the OpenSSL developers.
I am not complaining about OpenSSL in particular, and the "we" to which I refer is the community of C++ library authors writ large. However in this case there is a specific pain point with respect to OpenSSL and b2 which for better or worse is our responsibility and not OpenSSL's.
We could debate who is responsible for complaints or suggestions but at the end of the day users don't care; they need something that works. If an upstream provider can't get it together, pointing the finger and saying "it's their fault" is unhelpful.
It's not about pointing fingers, it's about fixing the problem where the problem really is. If this problem is with b2 then let the b2 team work on it (assuming they recognize the problem). If the problem is with OpenSSL in general then no amount of work on b2 or Boost will fix it.
Andrey Semashev via Boost
If the problem is with OpenSSL in general then no amount of work on b2 or Boost will fix it.
If the problem is with the OpenSSL's build (as I believe is the case), then you could (1) port it to use b2 as the build system and (2) add a package management solution to b2 so that it's easy to get automatically as a dependency. We, for example, have done (1) for build2 and now OpenSSL can be easily built[1] from source for all the major platforms/compilers and used as a dependency in various other packages that use (or were ported to) build2, such as libasio, libevent, libcurl, libboost-mysql, etc. It was a lot of work, is a pain to maintain, and we haven't yet packages OpenSSL 3, but it's doable. You can complain all you want to the OpenSSL project or implore the larger C++ library community to fix the issue, but unless and until someone actually does the work, it will all be just talk. [1] https://cppget.org/?builds=libssl
I forgot about this, and now that you mention it I will have to study my new libraries (Http.Io and Websocket.Io) to figure out how much can be compiled separately. The answer to your question is simple though: any code whose compilation would be affected by Asio's configuration macros must remain in headers. This means that the amount of code that is compiled into the static or shared library is going to be small. My approach for tackling this is to break up my libraries into two parts. A "protocol" library (also called "sans-io": https://sans-io.readthedocs.io/how-to-sans-io.html) which is fully compiled and an "i/o" library which uses Asio and the protocol library. The io library is still compiled but there is more header-only material. I strive to minimize the amount of I/O-specific code. This is process is more disciplined by having the protocol library be physically separated.
Actually, MySQL did this some months ago. There is no physical separation of libraries, and the sansio part is internal, but I'm a disciplined author, and the change was one of the best things I did. Yet, you need to measure. What takes time to build is instantiating ssl::stream async methods. While keeping the protocol in source helps, you won't get real benefits until you tackle this problem. MySQL is migrating away from the fully-templated connection to a type-erased model (I've measured and efficiency isn't really affected - the best way to optimize is to provide things like pipelines and batch inserts). MySQL now uses any_completion_handler for this, and manages to cut times pretty well. It will all be cleaner when the new type-erased connections become the default.
Okay well it sounds like your library is already not "header-only" in practice... LOL. If you require linking to OpenSSL then you are paying all the costs of header-only development without getting the full benefits. Let me make this clear:
It's not. As I said, I've been wanting to get rid of header-onlyness for a lot of time. But, there's Boost.Asio :)
* We (C++ library authors collectively) MUST make OpenSSL easy to install, with NO friction for users
My experience with this was: - If you're using CMake, find_package(OpenSSL REQUIRED) is good, and (most) users will do it without a problem. - If you're doing the equivalent from b2, things work in Linux but not in Windows, where additional configuration on the user side is required. As I mentioned, this has no relevance if the Asio point doesn't have a solution, as you seem to imply.
pon., 15 sty 2024 o 11:31 Christopher Kormanyos via Boost < boost@lists.boost.org> napisał(a):
Dear all,
The review of Boost.Charconv by Matt Borland runs Monday, January 15th through January 25th, 2024.
Code: https://github.com/cppalliance/charconvDocs: https://master.charconv.cpp.al/Review Schedule: https://www.boost.org/community/review_schedule.html
Thank you for considering this review.
Please optionally provide feedback on the followinggeneral topics:
- What is your evaluation of the design? - What is your evaluation of the implementation? - What is your evaluation of the documentation? - What is your evaluation of the potential usefulness of the library? - Did you try to use the library? With what compiler? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? - Are you knowledgeable about the problem domain?
Christopher KormanyosBoost.Charconv Review Manager
Not a review, but some observations about docs. It looks like `limits` is something that every responsible usage of to_chars should use. It should be introduced in all the initial examples of to_chars. https://master.charconv.cpp.al/#limits_examples -- these examples do not compile (missing closing square bracket). Are there some requirements on how the numbers are converted to characters? How is it that 1.03 is rendered as "1.03" when there are surely longer sequences of characters that are closer to the numeric value really stored in the object of type `double`. The docs do not make it clear if a trailing zero is appended to the produced output. `std::to_chars` does not append the nul-terminator. If boost::to_chars does the same, the examples in the docs that use strcmp cannot be right. (You are not zeroing the buffer initially). Generally I feel Boost needs a library that does this (to_chars for C++11). I just do not feel qualified to review the implementation. However, using `malloc` in the corner cases ( https://github.com/cppalliance/charconv/blob/master/include/boost/charconv/d...) and using word "non-allocating" in the docs doesn't seem right. You could say in the docs what corner cases these are. Maybe it makes sense to allocate in those cases. But you cannot say they are "non-allocating". Regards, &rzej;
śr., 24 sty 2024 o 22:27 Andrzej Krzemienski
pon., 15 sty 2024 o 11:31 Christopher Kormanyos via Boost < boost@lists.boost.org> napisał(a):
Dear all,
The review of Boost.Charconv by Matt Borland runs Monday, January 15th through January 25th, 2024.
Code: https://github.com/cppalliance/charconvDocs: https://master.charconv.cpp.al/Review Schedule: https://www.boost.org/community/review_schedule.html
Thank you for considering this review.
Please optionally provide feedback on the followinggeneral topics:
- What is your evaluation of the design? - What is your evaluation of the implementation? - What is your evaluation of the documentation? - What is your evaluation of the potential usefulness of the library? - Did you try to use the library? With what compiler? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? - Are you knowledgeable about the problem domain?
Christopher KormanyosBoost.Charconv Review Manager
Not a review, but some observations about docs.
It looks like `limits` is something that every responsible usage of to_chars should use. It should be introduced in all the initial examples of to_chars.
https://master.charconv.cpp.al/#limits_examples -- these examples do not compile (missing closing square bracket).
Let me re-post this question, as I didn't get the answer, but I find it important in the context of this review:
Are there some requirements on how the numbers are converted to characters? How is it that 1.03 is rendered as "1.03" when there are surely longer sequences of characters that are closer to the numeric value really stored in the object of type `double`.
If we treated the requirement to represent the printed number as accurately as possible, then surely "1.029999998" would be a closer approximation of the value being stored, no? Regards, &rzej;
The docs do not make it clear if a trailing zero is appended to the produced output. `std::to_chars` does not append the nul-terminator. If boost::to_chars does the same, the examples in the docs that use strcmp cannot be right. (You are not zeroing the buffer initially).
Generally I feel Boost needs a library that does this (to_chars for C++11). I just do not feel qualified to review the implementation.
However, using `malloc` in the corner cases ( https://github.com/cppalliance/charconv/blob/master/include/boost/charconv/d...) and using word "non-allocating" in the docs doesn't seem right. You could say in the docs what corner cases these are. Maybe it makes sense to allocate in those cases. But you cannot say they are "non-allocating".
Regards, &rzej;
Are there some requirements on how the numbers are converted to characters? How is it that 1.03 is rendered as "1.03" when there are surely longer sequences of characters that are closer to the numeric value really stored in the object of type `double`.
If we treated the requirement to represent the printed number as accurately as possible, then surely "1.029999998" would be a closer approximation of the value being stored, no?
If the number can not be exactly represented tie-breaking is done by rounding to the nearest number with even significand. For example 9000000000000000.5 has the nearest two floating point values of 9000000000000000, or 9000000000000001 so we pick the former since it is even. See page 2: https://arxiv.org/pdf/2101.11408.pdf. Matt
pt., 26 sty 2024 o 08:30 Matt Borland
Are there some requirements on how the numbers are converted to characters? How is it that 1.03 is rendered as "1.03" when there are surely longer sequences of characters that are closer to the numeric value really stored in the object of type `double`.
If we treated the requirement to represent the printed number as accurately as possible, then surely "1.029999998" would be a closer approximation of the value being stored, no?
If the number can not be exactly represented tie-breaking is done by rounding to the nearest number with even significand. For example 9000000000000000.5 has the nearest two floating point values of 9000000000000000, or 9000000000000001 so we pick the former since it is even. See page 2: https://arxiv.org/pdf/2101.11408.pdf.
Thanks. But the document describes the parsing part, where you have to make the choice to select a value from a finite set. My question is more about serialization. I am pretty sure that the representation of seemingly simple numbers, like "10.2" is in reality ugly, and the output string has the capability to represent nearly everything, there needs to be some clevernes applied to control the balance between the accuracy, and producing a nice-to-human text as "10.2". My question is, does the specification of `std::to_chars` impose strict, unambiguous rules, or do you have the liberty to choose within some limit. Does my question make sense? Regards, &rzej;
Matt
Thanks. But the document describes the parsing part, where you have to make the choice to select a value from a finite set. My question is more about serialization. I am pretty sure that the representation of seemingly simple numbers, like "10.2" is in reality ugly, and the output string has the capability to represent nearly everything, there needs to be some cleverness applied to control the balance between the accuracy, and producing a nice-to-human text as "10.2". My question is, does the specification of `std::to_chars` impose strict, unambiguous rules, or do you have the liberty to choose within some limit. Does my question make sense? Regards, &rzej;
From the standard for std::to_chars (22.13.2) you have to give the shortest representation (if no precision specified), and "If there are several such representations, the representation with the smallest difference from the floating-point argument value is chosen, resolving any remaining ties using rounding according to round_to_nearest". The behavior of round_to_nearest is defined in IEEE 754 which is also unambiguous. The algorithm used in boost::charconv::to_chars is described here: https://github.com/jk-jeon/dragonbox/blob/master/other_files/Dragonbox.pdf, and the basis code is in the same repository.
I believe that answers the question, but please let me know if it does not. Matt
pt., 26 sty 2024 o 11:48 Matt Borland
Thanks. But the document describes the parsing part, where you have to make the choice to select a value from a finite set. My question is more about serialization. I am pretty sure that the representation of seemingly simple numbers, like "10.2" is in reality ugly, and the output string has the capability to represent nearly everything, there needs to be some cleverness applied to control the balance between the accuracy, and producing a nice-to-human text as "10.2". My question is, does the specification of `std::to_chars` impose strict, unambiguous rules, or do you have the liberty to choose within some limit.
Does my question make sense?
Regards, &rzej;
From the standard for std::to_chars (22.13.2) you have to give the shortest representation (if no precision specified), and "If there are several such representations, the representation with the smallest difference from the floating-point argument value is chosen, resolving any remaining ties using rounding according to round_to_nearest". The behavior of round_to_nearest is defined in IEEE 754 which is also unambiguous. The algorithm used in boost::charconv::to_chars is described here: https://github.com/jk-jeon/dragonbox/blob/master/other_files/Dragonbox.pdf, and the basis code is in the same repository.
1.
I believe that answers the question, but please let me know if it does not.
That's exactly what I was after. Thank you. &rzej;
Matt
# Boost.Charconv review ## What is your evaluation of the design? A fast replacement for std::to_chars/from_chars in C++11 is very useful. I also like that it's a compiled library, even if it causes some temporary inconveniences for integration. I'm still worried about the "non-allocating" initial description in the documentation. It rarely allocates, but it might. Even if only in edge cases and with a function that doesn't throw on allocation failure. So the initial documentation sentence describing the library is misleading. The return value on result_out_of_range is also something to think about. After considering opinions on both sides here and elsewhere, I tend to think we should not deviate from the standard in this case. Deviating from the standard also seems like a headache to the author. I don't like the name from_chars_strict though. Because it's not what the function is doing. The way it returns values on result_out_of_range is not a matter of being more or less strict since it would accept and reject the same inputs. ## What is your evaluation of the implementation? I went through the code and the implementation seems fine and organized. I would like it if it were possible to have no malloc in there. But if that's impossible, documenting the edge cases is also fine. It would be good to have some fuzzing tests in the library. ## What is your evaluation of the documentation? The documentation is small and has lots of examples. I found the "why use Boost.Charconv over <charconv>" section very useful. The initial description in the docs describes it only as "parsing functions". This seems too generic for what the library does. One great thing about boost libraries is that their documentation is usually goal-oriented. The initial sections in the documentation assume the reader knows what <charconv> is but I don't think it should assume that. For instance, there's an initial section on "why use Boost.Charconv over <charconv>" but no initial section on "Why use Boost.Charconv". The "from chars" section doesn't describe the edge cases when it allocates. It could also include examples of these edge cases so the users understand how extreme they are. The floating point example includes `assert(v == 1.2345);`. I thought this kind of floating point equality comparison could fail. The introduction of boost::charconv::limits should be somewhat interleaved with the discussion of from/to_chars because users should already be aware of that when copying/pasting the first examples on how to use these functions. The from/to_chars overview sections and the reference section are a little too repetitive at times. Some repetition is natural here, but it would be good to go through these sections to make sure most of the content in each of the sections adds independent value to the reader. ## What is your evaluation of the potential usefulness of the library? Boost libraries might benefit from that. Some internal functions in Boost.URL could use this library. I believe that's the case for many other libraries because most protocols involve numbers here and there. It's also useful to have this feature without C++17. The library is also faster than std::from/to_chars. ## Did you try to use the library? With what compiler? Did you have any problems? I ran some small tests locally and it worked fine. I haven't tried to integrate it in any other library. I particularly like that the project provided vcpkg and conan packages/ports. ## How much effort did you put into your evaluation? I read the documentation, experimented with it in a small local project, and went through the code. ## Are you knowledgeable about the problem domain? I understand the problem clearly, as most people probably do, but I'm not an expert in terms of the trade-offs between all possible solutions. In other words, I'm just a user. ## Your decision. I recommend to ACCEPT Charconv into Boost. ## Disclaimer I'm affiliated with the C++ Alliance.
I'm still worried about the "non-allocating" initial description in the documentation. It rarely allocates, but it might. Even if only in edge cases and with a function that doesn't throw on allocation failure. So the initial documentation sentence describing the library is misleading.
I am updating the docs to highlight this case.
The return value on result_out_of_range is also something to think about. After considering opinions on both sides here and elsewhere, I tend to think we should not deviate from the standard in this case. Deviating from the standard also seems like a headache to the author.
Between the reviews and slack there seems to be more people for swapping the behavior of from_chars and from_chars_strict. Any opinions on what the new function would be named? Something like from_chars_erange since it from_chars with modified ERANGE handling?
It would be good to have some fuzzing tests in the library.
Since this was a condition of Reuben's acceptance the PR can be found: https://github.com/cppalliance/charconv/pull/134
## What is your evaluation of the documentation?
I am going through Peter's giant write-up on how to make the docs better which I believe will hit all of your points, but I will double check across all the reviews.
I recommend to ACCEPT Charconv into Boost.
Thank you for taking the time to review, and providing feedback. Matt
[snip]
## Your decision.
I recommend to Thank you Alan for yur detailed and clearreview of the proposed Boost.Charconv. Christopher
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
[snip][snip]
Dear all,
The review of Boost.Charconv by Matt Borland runs> Monday, January 15th through January 25th, 2024. If you have pending review or comments, pleasesubmit these within the calendar day of 25-January-2024. The review period of the proposed Boost.Charconvends today. Following that, the review report will be preparedand submitted within a timely fashion.
Thank you for your participation _______________________________________________ Unsubscribe & other changes: Boost Info Page | | | | Boost Info Page | | |
With apologies to Matt, this is a somewhat limited review based on limited time, but I hope it's useful non-the-less. I should also declare that I know Matt through his excellent help with the Math library.
Please optionally provide feedback on the followinggeneral topics:
- What is your evaluation of the design?
It is what it is, nothing to say here.
- What is your evaluation of the implementation?
- What is your evaluation of the documentation? I'm reasonably happy I can find what I need with the docs, but would echo the more detailed comments of others. It would be nice if the code snippet at the very start was actually something I could cut and paste and run "as is" though. - What is your evaluation of the potential usefulness of the library? Actually, I'm not sure - it is super useful, but it would interesting to see how quickly native C++20 support gains traction - I can see other
Matt has clearly worked hard to ensure this uses the current state of
the art implementations - the performance figures speak for themselves
here. I'm also impressed with the amount of testing going on - I know
this is a "small" library - at least in it's interface - but testing
this stuff is hard, and pretty vital too.
On the specifics: I agree with the comment about the constants in the
headers, while these could hidden inside inline functions (which would
then presumably get merged at link time), I see no good reason not to
declare them extern and move to a .cpp file, given that we have separate
source anyway.
I agree with the comment about moving implementation only headers to the
src directory.
With regard to behaviour, I can see that we can't please everyone here.
At present, I would weakly vote for bleeding-edge behaviour to be the
default (all pending defects in the std wording fixed), but I can
imagine situations where different projects would have differing
requirements. Thinking out loud here, can we use namespaces to help here:
namespace boost{ namespace charconv{
namespace cxx20{
// C++ 20 interface here
}
namespace latest{
// bleeding edge here
}
using from_chars = whatever;
}}
The "whatever" here, could potentially be macro configured as Matt had
it before. I also prefer a name like "cxx20" to "strict" as strict
compared to which standard? This would also allow users to fix
behaviour by explicitly calling say cxx20::from_chars which would give
them consistent behaviour going forward (whatever future standards do),
and is nicely self documenting too. Am I missing something here, seems
to easy and I'm surprised no one has suggested this?
In from_chars.cpp source, I'm seeing quite a few fairly short functions
https://github.com/cppalliance/charconv/blob/b99d82948918df08d0acdc1c3ef5040...
along with some one line forwarding functions:
https://github.com/cppalliance/charconv/blob/b99d82948918df08d0acdc1c3ef5040...
Is there any particular reason why these are not declared inline in the
headers?
I created a short test program to test every possible value of type
float (this works for float, don't ever try it double!):
#include
- Did you try to use the library? With what compiler? Did you have any problems? Only with latest MSVC. I note that Matt has a comprehensive set of CI tests though, so I'm happy portability is there. It would be nice to see some non-Intel architectures in there just to check the bit-fiddling code, but other than that it's looking good. - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? About 6 or 7 hours, mostly looking at code and running some quick tests. - Are you knowledgeable about the problem domain?
Somewhat, Paul Bristow and I discovered some bugs in std lib floating point to string conversions years ago, so I do know how to torture this kind of code. The actual routines used are all new to me though. And finally, Yes I do think think this should be accepted into Boost, subject to comments (as usual). Best, John.
On the specifics: I agree with the comment about the constants in the headers, while these could hidden inside inline functions (which would then presumably get merged at link time), I see no good reason not to declare them extern and move to a .cpp file, given that we have separate source anyway.
I agree with the comment about moving implementation only headers to the src directory.
I will address these both in post-review re-factoring.
With regard to behaviour, I can see that we can't please everyone here. At present, I would weakly vote for bleeding-edge behaviour to be the default (all pending defects in the std wording fixed), but I can imagine situations where different projects would have differing requirements. Thinking out loud here, can we use namespaces to help here:
namespace boost{ namespace charconv{
namespace cxx20{
// C++ 20 interface here
}
namespace latest{
// bleeding edge here
}
using from_chars = whatever;
}}
The "whatever" here, could potentially be macro configured as Matt had it before. I also prefer a name like "cxx20" to "strict" as strict compared to which standard? This would also allow users to fix behaviour by explicitly calling say cxx20::from_chars which would give them consistent behaviour going forward (whatever future standards do), and is nicely self documenting too. Am I missing something here, seems to easy and I'm surprised no one has suggested this?
The only issue I have with this is "latest" isn't tied to a specific C++ standard but what we think is the best resolution to an issue with the way the standard is written at current. At some point the latest draft of the standard will have a resolution (in theory at least).
In from_chars.cpp source, I'm seeing quite a few fairly short functions https://github.com/cppalliance/charconv/blob/b99d82948918df08d0acdc1c3ef5040... along with some one line forwarding functions: https://github.com/cppalliance/charconv/blob/b99d82948918df08d0acdc1c3ef5040... Is there any particular reason why these are not declared inline in the headers?
I'll hit these too in refactoring.
Only with latest MSVC. I note that Matt has a comprehensive set of CI tests though, so I'm happy portability is there. It would be nice to see some non-Intel architectures in there just to check the bit-fiddling code, but other than that it's looking good.
ARM64 and s390x are in the drone run (https://github.com/cppalliance/charconv/blob/develop/.drone.jsonnet#L135). These both have native 128-bit long doubles, and s390x covers big-endian arch.
Yes I do think think this should be accepted into Boost, subject to comments (as usual).
Thanks as always John. I appreciate the time and feedback. Matt
John Maddock wrote:
On the specifics: I agree with the comment about the constants in the headers, while these could hidden inside inline functions (which would then presumably get merged at link time), I see no good reason not to declare them extern and move to a .cpp file, given that we have separate source anyway.
I haven't looked at the specific constants in question, but if they are used by the constexpr integral functions, you can't put them into a .cpp file.
On 25/01/2024 17:57, Peter Dimov via Boost wrote:
On the specifics: I agree with the comment about the constants in the headers, while these could hidden inside inline functions (which would then presumably get merged at link time), I see no good reason not to declare them extern and move to a .cpp file, given that we have separate source anyway. I haven't looked at the specific constants in question, but if they are used by
John Maddock wrote: the constexpr integral functions, you can't put them into a .cpp file.
Good point! John.
I'm not a reviewer, but I'd like to add a note on the interface: I think boost::charconv should add one convenience function as follows: // Returns std::nullopt on error template <typename NumberType> std::optional<NumberType> boost::charconv::to_number(const std::string_view& sv) noexcept; Reason: Right now there is no simple, non-throwing way of converting a string to a number in C++ as std::stoi/std::stof throws exceptions on errors. Best/Viktor On Mon, Jan 15, 2024 at 11:31 AM Christopher Kormanyos via Boost < boost@lists.boost.org> wrote:
Dear all,
The review of Boost.Charconv by Matt Borland runs Monday, January 15th through January 25th, 2024.
Code: https://github.com/cppalliance/charconvDocs: https://master.charconv.cpp.al/Review Schedule: https://www.boost.org/community/review_schedule.html
Thank you for considering this review.
Please optionally provide feedback on the followinggeneral topics:
- What is your evaluation of the design? - What is your evaluation of the implementation? - What is your evaluation of the documentation? - What is your evaluation of the potential usefulness of the library? - Did you try to use the library? With what compiler? Did you have any problems? - How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? - Are you knowledgeable about the problem domain?
Christopher KormanyosBoost.Charconv Review Manager
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
I'm not a reviewer, but I'd like to add a note on the interface:
I think boost::charconv should add one convenience function as follows:
// Returns std::nullopt on error template <typename NumberType>
std::optional<NumberType> boost::charconv::to_number(const
std::string_view& sv) noexcept;
Reason: Right now there is no simple, non-throwing way of converting a string to a number in C++ as std::stoi/std::stof throws exceptions on errors.
Best/Viktor
Viktor, Thanks for the suggestion. I can add some additional interfaces for to/from number to the library since they are pretty straightforward. Similar comments came up during the review such as supporting string_view instead of pointer pairs. Matt
Viktor Sehr wrote:
I'm not a reviewer, but I'd like to add a note on the interface:
I think boost::charconv should add one convenience function as follows:
// Returns std::nullopt on error template <typename NumberType> std::optional<NumberType> boost::charconv::to_number(const std::string_view& sv) noexcept;
This throws away the error code; it should return (if we stick to standard
facilities)
std::expected
On Wed, Jan 31, 2024 at 1:38 PM Peter Dimov via Boost
Viktor Sehr wrote:
I think boost::charconv should add one convenience function as follows: // Returns std::nullopt on error template <typename NumberType> std::optional<NumberType> boost::charconv::to_number(const std::string_view& sv) noexcept;
This throws away the error code
But sometimes that's all you want. That's kinda the point from Viktor I guess. Not that an `expected` or `outcome` return type is bad; but it is more complex.
[...] to avoid a dependency on
and therefore <string>
Hopefully one day with modules and `import std;` this will be a thing of the past... But that's I guess another completely different can of worms! --DD
Dominique Devienne wrote:
On Wed, Jan 31, 2024 at 1:38 PM Peter Dimov via Boost
mailto:boost@lists.boost.org > wrote: Viktor Sehr wrote:
I think boost::charconv should add one convenience function as follows: // Returns std::nullopt on error template <typename NumberType> std::optional<NumberType> boost::charconv::to_number(const std::string_view& sv) noexcept;
This throws away the error code
But sometimes that's all you want. That's kinda the point from Viktor I guess.
It's actually never what you want the way from_chars is specified, because 0.000[more 0...]001 fails with ERANGE, whereas 1.000[same 0...]001 succeeds and returns 1.0. I don't really know of a use case where this is what one wants.
Not that an `expected` or `outcome` return type is bad; but it is more complex.
It's not any more complex for the user; the interface is the same as optional's if you don't need the error. Returning `optional` is almost always a design mistake.
Hi,
std::expected
Dominique Devienne wrote:
On Wed, Jan 31, 2024 at 1:38 PM Peter Dimov via Boost
mailto:boost@lists.boost.org > wrote: Viktor Sehr wrote: > I think boost::charconv should add one convenience function as follows: > // Returns std::nullopt on error > template <typename NumberType> > std::optional<NumberType> boost::charconv::to_number(const > std::string_view& sv) noexcept;
This throws away the error code
But sometimes that's all you want. That's kinda the point from Viktor I guess.
It's actually never what you want the way from_chars is specified, because 0.000[more 0...]001 fails with ERANGE, whereas 1.000[same 0...]001 succeeds and returns 1.0.
I don't really know of a use case where this is what one wants.
Not that an `expected` or `outcome` return type is bad; but it is more complex.
It's not any more complex for the user; the interface is the same as optional's if you don't need the error.
Returning `optional` is almost always a design mistake.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
participants (15)
-
Alan de Freitas
-
Alexander Grund
-
Andrey Semashev
-
Andrzej Krzemienski
-
Boris Kolpackov
-
Christopher Kormanyos
-
Dominique Devienne
-
John Maddock
-
Matt Borland
-
Peter Dimov
-
Peter Turcan
-
Robert Ramey
-
Ruben Perez
-
Viktor Sehr
-
Vinnie Falco