I've been a huge fan of Spirit for many years now. One of my favorite Spirit memories was during two separate job interviews.

The first time, I was interviewing with a game company and on my resume, I noted that I had written a simple HTTP server that performed simple geospatial querying, creating a list of cities within a radius of a provided zip code. Here's the link: https://github.com/cmazakas/location-service. I remember that I had used Spirit for parsing a CSV and the interviewer was genuinely concerned about me using such a library for such a seemingly small task. It was the main focus of the interview and I think represented a cultural divide. I did not get the job.

The second time, I used it as a litmus test for me writing C++ at the company. They saw me use Spirit and still hired me anyway, which I took to be a great sign. This time, I got the job.

## Boost.Parser

Cutting to the chase, I vote we ACCEPT Parser.

To test the library, I've been using a small test project here: https://github.com/cmazakas/parser-review

My criteria for acceptance was that I'd be able to write an HTTP/1 parser with feature parity to what is currently being done in the proposed Boost.Http.Proto library. I feel that I was able to achieve this goal and I stopped proceeding because I no longer felt like the library itself would be a blocker.

Now, even though I'm voting ACCEPT, I'm going to spend most of the review criticizing the library.

To me, the value-add of a type-safe parser combinator library is obvious, so I won't spend too much time here. Basically, testable, reusable parsers are good, and even better when they elegantly compose as well.

### The Feedback

The differences between Spirit.X3 and Parser need to be made much clearer by the documentation, by which I mean:

* Spirit X3 has rules that do not compose well — the attributes produced by a rule can change depending on the context in which you use the rule.
* Spirit X3 is missing many of the convenient interfaces to parsers that Spirit 2 had. For instance, you cannot add parameters to a parser.
* All versions of Spirit have Unicode support, but it is quite difficult to get working.
These points need code samples showing the limitations of X3 and how Parser
actually solves them. I think many,
myself included, aren't entirely clear on what these differences are,
how Parser solves them,
and _why_ that is a good thing.
Though I have attempted X3's Unicode before and I will say, Parser's
unified interface is much better.
For example, I wrote an entire RFC-compliant URL parser in X3, including
Unicode support, for my library Foxy:
https://github.com/cmazakas/foxy/blob/7f4ac0495ad2ed9cd0eca5994743d677ac1d26...
---
Not a criticism but Parser's directives are a dramatic improvement over X3.
By these, I'm thinking of
`boost::parser::string_view[]`, `boost::parser::separate[]` and
`boost::parser::merge[]`. These should be
highlighted in the comparison vs X3 as I think they add a good bit of
substantive value.
More praise for the library includes its parse API which is a dramatic
improvement over X3 in terms of
ergonomics. Personally, I can see myself mainly using `prefix_parse`,
similarly to X3's original `parse()`, but
I think the value of `parse()` itself is obvious and doesn't warrant much
explanation.
---
The library generates a dramatic amount of warnings when compiled with
`-Wall -Wextra -pedantic`. I had
originally forgotten to `include_directories()` Parser as a `SYSTEM`
directory and was met with 172 warnings.
I think being clean under `-Wall -Wextra -Werror` is a reasonable
criterion for acceptance.
---
When writing the parser for the HTTP version line, I ran into a lot of
issues separating the major version
from the minor version when using:

```cpp
parser::digit >> "." >> parser::digit
```

It was only when I was lazily browsing the docs that I found the
`separate[]` directive, which I think indicates
the library's documentation has poor discoverability.
Ultimately, it was cleaner for me to just do:
```cpp
// HTTP-version = HTTP-name "/" DIGIT "." DIGIT
// HTTP-name = %x48.54.54.50 ; HTTP
auto const on_version = [](auto &ctx) {
  auto const &tup = _attr(ctx);
  auto const major = tup[0_c];
  auto const minor = tup[1_c];
  if (major > 9 || minor > 9) {
    _pass(ctx) = false;
    return;
  }
  response::metadata &md = _globals(ctx).md_;
  md.version_.major_ = major;
  md.version_.minor_ = minor;
};

auto const http_name = parser::lit("HTTP");
auto const http_version =
    (http_name >> "/" >> parser::uint_ >> "." >> parser::uint_)[on_version];
```
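The validation itself is small; for comparison, here is a minimal, Boost-free sketch in plain C++ of the same check the semantic action performs (`http_version` and `parse_http_version` are my own hypothetical names, not anything from Parser or Http.Proto):

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string_view>

// HTTP-version = HTTP-name "/" DIGIT "." DIGIT (RFC 9112).
struct http_version {
  int major_ = 0;
  int minor_ = 0;
};

// Returns nullopt on any mismatch, mirroring `_pass(ctx) = false`.
std::optional<http_version> parse_http_version(std::string_view s) {
  // The grammar allows exactly one DIGIT on each side of the ".",
  // which is why on_version above rejects values greater than 9.
  if (s.size() != 8 || s.substr(0, 5) != "HTTP/")
    return std::nullopt;
  if (!std::isdigit(static_cast<unsigned char>(s[5])) || s[6] != '.' ||
      !std::isdigit(static_cast<unsigned char>(s[7])))
    return std::nullopt;
  return http_version{s[5] - '0', s[7] - '0'};
}
```

Using `uint_` in the combinator version is what forces the explicit `major > 9 || minor > 9` rejection, since `uint_` happily consumes multiple digits.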
---
The documentation doesn't exactly give users any guidelines about when they
should formally
define rules with attributes vs semantic actions.
This is one of those things where if users don't already have the intuition
built up, using the
library is an exercise in frustration.
Namely, attributes don't compose well when you start nesting rules and want
to have different out
params at different levels of the parsers. I suppose this should be obvious
because the rules themselves
form a hierarchy so the attributes _must_ mirror that.
I found that the best thing to do for my use-case was to abuse
`_globals(ctx)` and semantic actions.
I think a good rule of thumb for users can be expressed as something like:
use rule attributes for simple
grammars where all you really want is just the `std::vector<double>`. For
something complex with domain-defined
validation rules, prefer `_globals(ctx)` and semantic actions for
validating everything.
In general, Parser's semantic actions are a huge improvement over Spirit's.
---
I felt like I came up with working but unsatisfactory solutions to parsing
characters in a specific range.
Namely, I just wrote a simple semantic action around `+parser::char_[f]`
but this made me feel like I was
writing unidiomatic code. I'd appreciate some feedback here:
```cpp
// reason-phrase = 1*( HTAB / SP / VCHAR / obs-text )
// VCHAR = %x21-7E
auto const on_reason_phrase = [](auto &ctx) {
  auto ch = _attr(ctx);
  if (ch != '\t' && ch != ' ' && (ch < '\x21' || ch > '\x7e')) {
    _pass(ctx) = false;
    return;
  }
  response::metadata &md = _globals(ctx).md_;
  std::ranges::subrange
```
On a related note, I was surprised at what the attribute of `+parser::digit` was (it was `std::string`).
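As a side note, the character class that action tests can be factored into a standalone predicate; a minimal sketch (my own hypothetical `is_reason_phrase_char`, omitting obs-text just as the action above effectively does):

```cpp
// reason-phrase = 1*( HTAB / SP / VCHAR / obs-text )
// VCHAR = %x21-7E; obs-text (%x80-FF) is omitted here, matching the
// semantic action above, whose signed-char comparison also rejects it.
constexpr bool is_reason_phrase_char(char ch) {
  return ch == '\t' || ch == ' ' || (ch >= '\x21' && ch <= '\x7e');
}
```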
---
> Dependencies are still a nightmare in C++, so Boost.Parser can be used as a purely standalone library, independent of Boost.
This was funny for me to read because I had a working vcpkg port for my test project in like 2 minutes.

---

The library seems to really emphasize Boost.Hana's `tuple` type but I think in practice, this winds up being a mistake unless there's substantive compile-time savings by using Hana here.

Namely, Hana's `tuple` doesn't support structured bindings. Because Parser is a C++17 library, I consider this a somewhat large defect of the interface.

What's worse, Hana's literals aren't compatible with Parser's. A huge boon for Hana and its `tuple` is that it supports indexing via `tup[0_c]`, but this doesn't work when Parser's literals are involved, as Parser also defines a `_c`, so an ambiguity is introduced and compilation fails.

Between this and the lack of structured binding support, I think relying on Hana here is a strict downgrade unless there are compelling reasons I'm simply unaware of.

https://github.com/boostorg/hana/milestone/4

---

Stuff like `_globals` in the docs would 404 for me. I also think the docs are kind of just incomplete in general, and because there's not much cross-referencing, discoverability is hurt, so a user has to essentially read the entire docs front-to-back to understand the library, which is a large ask for a parsing library.

### Conclusion

Overall, I think that even with all of the problems I found in the library, it's still quite good and worthy of being brought into Boost as the successor to Spirit. I don't think anything I found was a deal-breaker and I think in a few iterations of Boost, this library can be something quite great.

- Christian