On Fri, Dec 29, 2023 at 10:35 AM Peter Dimov via Boost wrote:
Zach Laine wrote: ...
I'm calling my proposal Boost.Parser, and it follows many of the conventions of Boost.Spirit 2 and X3, such as the operators used for overloading, the names of many parsers and directives, etc. It requires C++17 or later. ...
The GitHub page is here: https://github.com/tzlaine/parser The online docs are here: https://tzlaine.github.io/parser
Some observations:
I understand, in principle, the motivation behind asserting at runtime instead of failing compilation, but I don't think the same argument applies to rejecting *eps parsers. It seems to me that a static assert for any *p or +p where p can match epsilon (can succeed while consuming no input) would be clear enough. (E.g. +-p, *(p | q | eps), *attr(...), +&p, etc.)
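For concreteness, the compile-time check suggested above might be sketched roughly as follows. This is only an illustration under stated assumptions: `char_p`, `eps_p`, `opt_p`, `alt_p`, `star_p`, `plus_p` and the `can_match_epsilon` trait are hypothetical names made up for this sketch and are not Boost.Parser's actual types.

```cpp
#include <type_traits>

// Hypothetical toy parser tags (NOT Boost.Parser's real types).
struct char_p {};                             // consumes exactly one character
struct eps_p {};                              // succeeds, consumes nothing
template <class P> struct opt_p {};           // -p
template <class A, class B> struct alt_p {};  // a | b
template <class P> struct star_p;             // *p (defined below)
template <class P> struct plus_p;             // +p (defined below)

// Hypothetical trait: true if the parser can succeed on empty input.
template <class P> struct can_match_epsilon : std::false_type {};
template <> struct can_match_epsilon<eps_p> : std::true_type {};
template <class P> struct can_match_epsilon<opt_p<P>> : std::true_type {};
template <class P> struct can_match_epsilon<star_p<P>> : std::true_type {};
template <class P> struct can_match_epsilon<plus_p<P>> : can_match_epsilon<P> {};
template <class A, class B>
struct can_match_epsilon<alt_p<A, B>>
    : std::integral_constant<bool, can_match_epsilon<A>::value ||
                                       can_match_epsilon<B>::value> {};

// Repetition is rejected at compile time when the operand can match epsilon.
template <class P> struct star_p {
    static_assert(!can_match_epsilon<P>::value,
                  "*p: p can succeed without consuming input");
};
template <class P> struct plus_p {
    static_assert(!can_match_epsilon<P>::value,
                  "+p: p can succeed without consuming input");
};
```

In this sketch, instantiating `star_p<char_p>` compiles, while `star_p<opt_p<char_p>>` or `star_p<star_p<char_p>>` would trip the static_assert. A parser type without a specialization of the trait is silently treated as non-epsilon, so user-defined parsers would have to opt in.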
Why? It may be better to static_assert, but it's not clear to me why
Interestingly, this would reject **p and +*p, because these parsers can go into an infinite loop. The current behavior is to collapse them into *p, which is useful, but technically wrong. This raises the possibility of, instead of rejecting *p or +p when p can match epsilon, just 'fixing' its behavior so that when p matches epsilon, the outer parser just exits the loop. This will make the current collapsing behavior equivalent to the non-collapsed one.
At first, I thought this was a great idea. Now I'm ambivalent. The way I might implement this is in repeat_parser (that's the only looping parser, modulo its subclasses). I could then do a couple of things: 1) detect that we have not eaten any of the input, but have matched repeat_parser's subparser, and terminate the repetition; or 2) detect that we have matched repeat_parser's subparser, *and* that the subparser is an unconditional match.

#1 is nice, because you don't need any way of tagging parser types as being epsilon-like. Without this or some similar approach you could end up with a closed set of types that trigger this short-circuiting. This seems like a maintenance problem for me, but moreover an extensibility problem for users. #2 suffers from this closed-set problem. To fix #2, I could add a template param (or constexpr static member, same diff), that acts as a tag.

#1 is problematic though, and anything where the no-input-consuming match is conditional is equally problematic. Each parser could have arbitrary side effects, via semantic actions. So this parser:

    *(if_(c)[p] | eps[a])

could match the eps first, if 'c' evaluated to false, and later match 'p', depending on what 'a' does. If 'a' flips the value of 'c', then the parse will always match 'p'. If 'a' increments a counter, then the parse might eventually match 'p', but just take a long time to do it; this case might also result in an infinite loop. In the case of the increment that ends in a match, maybe 'a' increments a counter, but also does some other important side effect. This may be a useful pattern to someone, somewhere. This is obviously contrived, but the point is that there are currently some things that you can express that would become non-expressible.

tl;dr I like the idea, but I'm struggling with how to do it so that we don't limit expressivity.
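To make approach #1 concrete, here is a minimal sketch of a repetition loop that stops as soon as its subparser matches without consuming input. It is not how repeat_parser is written; `toy_parser`, `lit`, `eps`, `alt` and `star` are hypothetical helpers invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical toy parser type (NOT Boost.Parser's): on success it may
// advance `pos` and returns true; on failure it returns false.
using toy_parser = std::function<bool(std::string const &, std::size_t &)>;

// Matches the single character `c`.
inline toy_parser lit(char c)
{
    return [c](std::string const & in, std::size_t & pos) {
        if (pos < in.size() && in[pos] == c) { ++pos; return true; }
        return false;
    };
}

// Always succeeds without consuming input; runs the action `a` on each match.
inline toy_parser eps(std::function<void()> a = [] {})
{
    return [a](std::string const &, std::size_t &) { a(); return true; };
}

// Ordered alternative: try `l`; if it fails, rewind and try `r`.
inline toy_parser alt(toy_parser l, toy_parser r)
{
    return [l, r](std::string const & in, std::size_t & pos) {
        std::size_t const start = pos;
        if (l(in, pos)) return true;
        pos = start;
        return r(in, pos);
    };
}

// Approach #1: repeat `p`, but leave the loop after a match that consumed
// nothing. Returns the number of successful matches, including that last one.
inline std::size_t star(toy_parser const & p, std::string const & in,
                        std::size_t & pos)
{
    std::size_t matches = 0;
    for (;;) {
        std::size_t const before = pos;
        if (!p(in, pos)) { pos = before; break; }  // ordinary end of repetition
        assert(before <= pos && pos <= in.size());
        ++matches;
        if (pos == before) break;  // zero-width match: exit instead of spinning
    }
    return matches;
}
```

With this loop, something shaped like `star(alt(lit('a'), eps(action)), input, pos)` runs `action` at most once per repetition, which is exactly the expressivity concern above: a side effect that was meant to change what a later iteration matches never gets that later iteration.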
Also, errors should definitely go to std::cerr by default, not std::cout. Errors aren't program output, and routing them to stdout is script-hostile.
Ach! Yeah, that's just an oversight. I've opened a ticket, thanks.

Zach