pon., 22 sty 2024 o 21:58 Ruben Perez via Boost
Hi all,
This is my review of the proposed Boost.Charconv.
DISCLAIMER: both me and the author are affiliated with the C++ Alliance. I am writing this review as a Boost author who would like to use this library in the code he maintains.
- What is your evaluation of the potential usefulness of the library?
I’m reviewing this library because I’d like to use it in Boost.MySQL. In short, I think any library that implements a text-based protocol can benefit from Boost.Charconv.
I’m implementing functionality to safely generate client-side queries for Boost.MySQL. Given a user-supplied query format string and some runtime, untrusted values, I need to generate a string containing a SQL query. User-supplied values may be any MySQL built-in types, including integers and floats. While prepared statements already cover part of this, there are cases where they are not practical (e.g. queries with dynamic filters).
This functionality is security-critical. Formatting a query incorrectly can easily lead to SQL injection vulnerabilities, which are extremely dangerous.
Composing queries with ints and floats requires converting them to string in a predictable way. Until now, I’ve considered std::snprintf, boost::lexical_cast and std::to_chars for this task. Both std::snprintf and boost::lexical_cast are locale-dependent, which can lead to nasty results. When formatting the double 4.2, we need to output the string “4.2”. Under some locales, these functions output “4,2”, which is interpreted as a field list by MySQL (potential exploit!). std::to_chars behaves like I want, but isn’t implemented by most of the compilers Boost.MySQL supports (we require C++11).
A similar problem occurs when parsing results. When not using prepared statements, MySQL sends values as strings, so ints and doubles must be parsed. From what I’ve observed, libraries like Boost.Json also face a similar problem, often resorting to implementing their own conversion functions.
Outside the protocol world, the library can be useful for tasks like URL parameter parsing, as it’s common to embed numeric identifiers in URLs.
To sum up, I think this library will be useful to other Boost libraries, and probably to others, too.
- What is your evaluation of the design?
The library is a drop-in replacement for std::to_chars/from_chars, so there is not a lot to say. I think the standard design works well, and this library follows it.
The only point where Boost.Charconv deviates from the standard is when parsing floating point values that are too big or too small to be represented in their type. Charconv modifies the output to +- 0 or +-HUGE_VAL, where the standard says it shouldn’t. I understand this helps because it provides more information to the caller, and it may actually be changed in the standard. The original Boost.Charconv included a macro to configure this behavior, which I think is problematic being a compiled library. From https://github.com/cppalliance/charconv/pull/120, I can see this has been changed. An extra function, from_chars_strict, is included. I think this is the right design.
The biggest barrier I have for adoption is that it’s a compiled library. MySQL is header-only, which means that the official CMake modules in the release don’t contain a Boost::mysql target. Adding a hard dependency on Boost.Charconv is thus a breaking change for my users, who will need to update their CMLs.
Would it be too hard to provide a header-only way of consuming this library? Libraries like Boost.Test allow that, but I do not know how that squares with ODR. Regards, &rzej;
I think that being a compiled library is the right thing, though. This is actually a limitation in our current tooling - there should be a Boost::mysql CMake target. From the conversations I’ve had with the b2 maintainer in the cpplang slack, this will be changing with the new modular Boost structure, so it shouldn’t be a concern. I will defer adoption until these changes are in place, though.
- What is your evaluation of the implementation?
I’ve only had a quick glance. It surprised me finding an instance of malloc/strtold, since these functions violate the “non-allocating, locale-independent” guarantees. I understand this is for an extreme edge case, and that the “locale-independent” guarantee is still provided through internal logic. I don’t know enough about these parsing algorithms to determine whether this is the right decision or not. How much effort would require rolling a hand-rolled strtold alternative for such edge cases? In any case, this can be done once the library is in Boost.
The library seems to not be running any fuzzing at the minute, which I think is quite necessary given the library’s nature.
- What is your evaluation of the documentation?
I find it really difficult to navigate. Fortunately, it closely mimics the standard, so I end up just looking at cppreference. But still, my experience hasn’t been good. These are the main pain points I see:
- It's a single page. Even for a small library, it makes navigation very cumbersome. - It’s not linked. If you want to find the reference for a struct or function, your only option is to Ctrl+F. - The overview for each function is written like a reference. But then you have another reference at the bottom. - The sidebar TOC has too many levels.
As much as I want to get rid of quickbook, my feeling is that quickbook-based docs currently generate more usable output than asciidoc-based ones. It may be a matter of tuning templates and developing additional tools, but we’re not there yet.
Please find some additional minor comments at the end of the review.
- Did you try to use the library? With what compiler? Did you have any problems?
I’ve integrated the library into Boost.MySQL, in a feature branch (https://github.com/boostorg/mysql/tree/feature/boost-charconv). I’ve fully implemented query formatting and updated the parsing functions, all using the proposed library. I’ve had no surprises. All CIs pass all tests, which perform considerable coverage on the parsing side. This includes different versions of the major compilers and operating systems, with different configurations - see https://github.com/boostorg/mysql/blob/feature/boost-charconv/.drone.star and https://github.com/boostorg/mysql/blob/feature/boost-charconv/.github/workfl... for details.
I’ve found a problem when linking to the library using the CMake targets, where a missing flag causes linker errors - see https://github.com/cppalliance/charconv/issues/130.
- How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
I’ve dedicated around 15 hours of work to integrate the library in Boost.MySQL and clean up CIs.
- Are you knowledgeable about the problem domain?
I’m just a user. I understand the domain problem and the different alternatives. I don’t have any knowledge about the algorithms’ internal workings.
- Your decision.
I recommend CONDITIONALLY ACCEPT Charconv into Boost, under the condition that proper fuzzing is added to the test suite. I’ve worked with libfuzzer before and can assist the author if required.
Thanks Matt for submitting the library, and Christopher for being review manager.
Regards, Ruben.
Minor comments on documentation:
- The first example uses to_chars(... sizeof(buffer) - 1), which seems to imply a NULL-terminator somewhere. Since there is none, I think it should just be sizeof(buffer). - Docs don't specify the behavior about NULL-terminators. Either specify "adheres to standard" or clarify that no NULL-terminator is written. - IMO the front page should clearly state that this library is compliant with the charconv standard header, noting any exceptions. - from_chars overview is missing semicolons in the friends section. - Please use monospace in the discussion to refer to identifiers, like [first, last). - Basic usage: from_chars name is qualified but from_chars_result is not. - Code snippets wrap, which makes them difficult to read.
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost