On Thu, Feb 22, 2024 at 4:36 PM Christian Mazakas via Boost
plaintext dump as requested:
Thanks! I was going to have to do it to respond here. :) [snip]
### The Feedback
The differences between Spirit.X3 and Parser need to be made much more clear by the documentation, by which I mean:

* Spirit X3 has rules that do not compose well — the attributes produced by a rule can change depending on the context in which you use the rule.
* Spirit X3 is missing many of the convenient interfaces to parsers that Spirit 2 had. For instance, you cannot add parameters to a parser.
* All versions of Spirit have Unicode support, but it is quite difficult to get working.
These need code samples showing the limitations of X3 and how Parser actually solves them. I think many, myself included, aren't overwhelmingly clear on what these differences are and how Parser solves them and _why_ this is a good thing.
Fair enough. The first one I'll need to try to write a good example for, but the basic idea is that the loose matching of a parser's default attribute to a user-provided attribute is not something you can turn off. So when you define a Spirit X3 rule with the attribute type vector<int>, it can still fill in a deque<int>. This means that some errors you write don't get diagnosed until much later than they could be, and you can't really write a rule and test it in isolation, because the context in which you use it can change its behavior -- the loose matching rules are very loose. That's my memory of the problem anyway... I'll have to investigate a bit.

The second one refers to parameterized rules, like you find in the YAML spec:

    [70] l-empty(n,c) ::= ( s-line-prefix(n,c) | s-indent-less-than(n) ) b-as-line-feed

When you write the l_empty parser, you want to be able to pass two arguments to it, n and c, and then pass those along to the subparsers like s_line_prefix. You could do that with Spirit 2 parsers, but not with Spirit X3 parsers. I wrote a YAML parser, and so felt the loss of this pretty strongly. I'm not sure how many people write parsers like this, though.

The last one you have dealt with yourself -- I think it might be enough to say that Unicode support requires you to repeat a lot of your parser logic.
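Here's roughly the shape of the example I have in mind, from memory (an untested sketch, so the details of X3's transformation rules may be off):

    #include <boost/spirit/home/x3.hpp>
    #include <deque>
    #include <string>
    #include <vector>

    namespace x3 = boost::spirit::x3;

    // A rule whose declared attribute type is std::vector<int>.
    x3::rule<class ints_tag, std::vector<int>> const ints = "ints";
    auto const ints_def = x3::int_ % ',';
    BOOST_SPIRIT_DEFINE(ints);

    int main()
    {
        std::string const input = "1,2,3";

        // Despite the declared vector<int> attribute, loose matching
        // lets the rule fill a deque<int> without any diagnostic.
        std::deque<int> out;
        x3::phrase_parse(input.begin(), input.end(), ints, x3::space, out);
    }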
Though I have attempted X3's Unicode before and I will say, Parser's unified interface is much better. For example, I wrote an entire RFC-compliant URL parser in X3, including Unicode support, for my library Foxy https://github.com/cmazakas/foxy/blob/7f4ac0495ad2ed9cd0eca5994743d677ac1d26...
Yep. I had written Unicode-aware Spirit parsers before, and wanted to get rid of that kind of repetition.
Not a criticism, but Parser's directives are a dramatic improvement over X3's. By these, I'm thinking of `boost::parser::string_view[]`, `boost::parser::separate[]`, and `boost::parser::merge[]`. These should be highlighted in the comparison vs. X3, as I think they add a good bit of substantive value.
Glad you liked these. These came out of discussions with Andrzej about his frustration at having to write rules for some of his parsers. Rather than embrace the complexity of the attribute rules from Spirit, I thought lexically-visible control via directives would be easier to reason about.
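For the comparison page, even something tiny would help; e.g. `string_view[]` (a sketch I just typed out, not lifted from the docs):

    #include <boost/parser/parser.hpp>
    #include <iostream>
    #include <string>

    namespace bp = boost::parser;

    int main()
    {
        // string_view[] discards the std::string attribute that +char_
        // would produce and yields a view of the matched input instead,
        // so nothing is copied.
        auto const word = bp::string_view[+bp::char_('a', 'z')];

        std::string const input = "hello";
        auto result = bp::parse(input, word);  // optional<std::string_view>
        if (result)
            std::cout << *result << "\n";      // prints "hello"
    }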
More praise for the library: its parse API is a dramatic improvement over X3 in terms of ergonomics. Personally, I can see myself mainly using `prefix_parse`, similarly to X3's original parse(), but I think the value of `parse()` itself is obvious and doesn't warrant much explanation.
Thanks! It took me about 6 different iterations to settle on a minimal overload set.
---
The library generates a dramatic number of warnings when compiled with `-Wall -Wextra -pedantic`. I had originally forgotten to `include_directories()` Parser as a `SYSTEM` directory and was met with 172 warnings.
I think being clean under `-Wall -Wextra -Werror` is a reasonable criterion for acceptance.
I basically never turn on -Wextra, but I have a ticket to make sure Parser is -Wall-clean: https://github.com/tzlaine/parser/issues/107
When writing the parser for the HTTP version line, I ran into a lot of issues separating the major version from the minor version when using:
    parser::digit >> "." >> parser::digit
It was only when I was lazily browsing the docs that I found the `separate[]` directive which I think indicates the library's documentation has poor discoverability.
This is described in the page about attribute generation: https://tzlaine.github.io/parser/doc/html/boost_parser__proposed_/tutorial/a... . It's definitely difficult to know where to put things where people will see them. [snip]
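For reference, the thing you eventually found looks like this in context (a sketch; I'm using Parser's `0_c` literal and assuming the default, non-Hana tuple):

    #include <boost/parser/parser.hpp>
    #include <iostream>
    #include <string>

    namespace bp = boost::parser;

    int main()
    {
        using namespace bp::literals;

        std::string const input = "1.1";

        // By default, the two chars merge into a single std::string "11".
        auto merged = bp::parse(input, bp::digit >> "." >> bp::digit);

        // separate[] keeps them apart, as a tuple of two chars.
        auto split =
            bp::parse(input, bp::separate[bp::digit >> "." >> bp::digit]);
        if (split)
            std::cout << (*split)[0_c] << "." << (*split)[1_c] << "\n";
    }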
The documentation doesn't exactly give users any guidelines about when they should formally define rules with attributes vs semantic actions.
This is one of those things where if users don't already have the intuition built up, using the library is an exercise in frustration.
Namely, attributes don't compose well when you start nesting rules and want to have different out-params at different levels of the parser. I suppose this should be obvious, because the rules themselves form a hierarchy, so the attributes _must_ mirror that.
I found that the best thing to do for my use-case was to abuse `_globals(ctx)` and semantic actions.
I think a good rule of thumb for users can be expressed as something like: use rule attributes for simple grammars where all you really want is just the `std::vector<double>`. For something complex with domain-defined validation rules, prefer `_globals(ctx)` and semantic actions for validating everything.
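To make that concrete, the shape of what I ended up with was roughly this (reconstructed from memory, and I'm assuming `with_globals` is the right spelling for attaching the globals object):

    #include <boost/parser/parser.hpp>
    #include <string>

    namespace parser = boost::parser;

    struct my_globals
    {
        int field_count_ = 0;
    };

    // Do the domain validation in the action; stash results in globals.
    auto const count_field = [](auto &ctx) { ++_globals(ctx).field_count_; };

    int main()
    {
        auto const field = +parser::char_('a', 'z');
        auto const fields = field[count_field] % ',';

        my_globals g;
        std::string const input = "abc,def";
        parser::parse(input, parser::with_globals(fields, g));
        // g.field_count_ == 2, if my memory of the API serves.
    }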
I can add something to the Best Practices page about this. Ticket: https://github.com/tzlaine/parser/issues/110
I felt like I came up with working but unsatisfactory solutions to parsing characters in a specific range.
Namely, I just wrote a simple semantic action around `+parser::char_[f]` but this made me feel like I was writing unidiomatic code. I'd appreciate some feedback here:
    // reason-phrase = 1*( HTAB / SP / VCHAR / obs-text )
    // VCHAR = %x21-7E
    auto const on_reason_phrase = [](auto &ctx) {
        auto ch = _attr(ctx);
        if (ch != '\t' && ch != ' ' && (ch < '\x21' || ch > '\x7e')) {
            _pass(ctx) = false;
            return;
        }
This part could just have been: `char_('\x21', '\x7e') | char_('\t') | char_(' ')`. You might want to reorder it so the most common stuff comes first. Oh, and since you put it in an omit below, it's even simpler: `char_('\x21', '\x7e') | '\t' | ' '`.
        response::metadata &md = _globals(ctx).md_;
        std::ranges::subrange sub = _where(ctx);
        if (!md.reason_phrase_begin_) {
            md.reason_phrase_begin_ = sub.begin();
        }
        md.reason_phrase_end_ = sub.end();
        md.status_line_end_ = sub.end();
    };

    auto const reason_phrase = parser::omit[+parser::char_[on_reason_phrase]];
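Putting those two changes together, the whole thing shrinks to something like this (an untested sketch, reusing your names):

    // With the range check moved into the parser itself, the action
    // only records positions; no manual _pass(ctx) = false is needed.
    auto const on_reason_phrase = [](auto &ctx) {
        response::metadata &md = _globals(ctx).md_;
        auto sub = _where(ctx);
        if (!md.reason_phrase_begin_) {
            md.reason_phrase_begin_ = sub.begin();
        }
        md.reason_phrase_end_ = sub.end();
        md.status_line_end_ = sub.end();
    };

    auto const reason_phrase = parser::omit[
        +(parser::char_('\x21', '\x7e') | '\t' | ' ')[on_reason_phrase]];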
---
There should be a debugging utility for discovering what the synthesized attribute is for a rule. Perhaps this could exist as a semantic action that attempts to instantiate an incomplete template.
    template<class> struct diagnostic;

    auto const debug = [](auto &ctx) { diagnostic<decltype(_attr(ctx))> d; };

It took me a long time to realize what the attribute type for `digit >> "." >> digit` was (it was `std::string`).
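E.g. (hypothetical usage; the point is just that the resulting compile error names the attribute type, modulo references):

    std::string const input = "1.1";
    auto const p = (parser::digit >> "." >> parser::digit)[debug];
    parser::parse(input, p);
    // error: implicit instantiation of undefined template
    // 'diagnostic<std::string &>' -- and there's the attribute type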
Since you are not the only person asking for this: https://github.com/tzlaine/parser/issues/111
The library seems to really emphasize Boost.Hana's `tuple` type, but I think in practice this winds up being a mistake unless there are substantive compile-time savings from using Hana here.
Namely, Hana's `tuple` doesn't support structured bindings. Because Parser is a C++17 library, I consider this a somewhat large defect of the interface. What's worse, Hana's literals aren't compatible with Parser's.
A huge boon for Hana and its `tuple` is that it supports indexing via `tup[0_c]`, but this doesn't work when Parser's literals are involved, as Parser also defines a `_c`, so an ambiguity is introduced and compilation fails.
Between this and the lack of structured binding support, I think relying on Hana here is a strict downgrade unless there's compelling reasons I'm simply unaware of.
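A minimal repro of the ambiguity (from memory):

    #include <boost/hana.hpp>
    #include <boost/parser/parser.hpp>

    using namespace boost::hana::literals;    // defines operator""_c
    using namespace boost::parser::literals;  // also defines operator""_c

    int main()
    {
        auto tup = boost::hana::make_tuple(1, 'a');
        auto first = tup[0_c];  // error: call to 'operator""_c' is ambiguous
    }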
When using Hana, why would you need to use parser::literals? I can only think of uses in the non-Hana compatibility stuff; when using Hana you don't need that stuff, I think.
Stuff like `_globals` in the docs would 404 for me. I also think the docs are kind of just incomplete in general, and because there's not much cross-referencing, discoverability is hurt, so a user essentially has to read the entire docs front-to-back to understand the library, which is a large ask for a parsing library.
I'll audit the links. I did that at some point, but I guess there's been breakage since. Ticket: https://github.com/tzlaine/parser/issues/112
### Conclusion
Overall, I think that even with all of the problems I found in the library, it's still quite good and worthy of being brought into Boost as the successor for Spirit. I don't think anything I found was a deal-breaker and I think in a few iterations of Boost, this library can be something quite great.
Thanks for the review, Christian!

Zach