Hi,
This is another review of Boost.Parser.
Usefulness
If Boost.Parser is a maintained version of Boost.Spirit, that is already a
worthy goal.
If Boost.Parser is a better-documented version of Boost.Spirit, that is
already a gain. Note that the docs for Boost.Spirit X3 are hidden, and I
was not aware of their existence until the review of Boost.Parser.
I hear that the selling point of this library is Unicode support "through
and through". I can neither confirm that it works as advertised nor benefit
from this guarantee, as all my use cases for a parser so far have dealt
only with the ASCII set.
Design
Boost.Spirit has a great design, and because Boost.Parser adopts the same
design, it must also be good. I would intuitively expect that C++17 offers
a sufficient number of new features that the design of a parser library
could be improved over Boost.Spirit's. For instance, I am surprised that we
still need to use a macro. Not that I can offer a better alternative; I do
not know much about the implementation difficulties, so maybe the macro is
the best choice. Wouldn't explicit template specializations do the job?
The fact that this library automatically adapts aggregate types as
attributes in a Boost.PFR way is really cool.
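For illustration, here is roughly what I mean; this is my own minimal
sketch (the struct, parser, and input are mine, not taken from the docs),
assuming a sequence parser whose sub-attributes match the members of the
aggregate:
```
#include <boost/parser/parser.hpp>
#include <string>

namespace bp = boost::parser;

// An aggregate whose members match the sequence's attributes (int, std::string).
struct person
{
    int age;
    std::string name;
};

int main()
{
    person p;
    // Parse an int and a lowercase word directly into the aggregate,
    // with whitespace skipping; no adaptation macro is needed.
    bool const ok = bp::parse("42 alice", bp::int_ >> +bp::char_('a', 'z'), bp::ws, p);
    // On success: ok == true, p.age == 42, p.name == "alice".
    return ok ? 0 : 1;
}
```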
I am convinced that compiler error messages could be better. This can be
done by using a combination of static_assert and implementation details
SFINAEd away based on the same condition as in static_assert.
See https://github.com/tzlaine/parser/issues/56
I am unhappy about the fact that as simple a task as parsing a JSON-like
array of ints `[1, 2, 3]` into a vector requires me to define a rule.
Spirit introduced operator%, which doesn't work for empty lists, but due to
its clever attribute conversions (at least in Spirit 2), I could define:
auto array = '[' > ((int_ % ',') | eps) > ']';
And it would deduce the return attribute all right. But because Boost.Parser
wants to be more strict about clever attribute conversions, this option is
gone, and no alternative is provided other than to declare a rule, which I
never needed with Boost.Spirit.
See https://github.com/tzlaine/parser/issues/67
I am not comfortable that I have to use the macro
BOOST_PARSER_DEFINE_RULES. I would expect that template specializations
could give a similar effect, but I do not know how to make it better. If I
do have to use a macro, maybe it could offer more, like checking whether
the rule is compatible with its definition (for earlier bug reporting).
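For concreteness, this is the shape of the rule and macro boilerplate I am
talking about, sketched here for the `[1, 2, 3]` case (the tag and names
are mine, and I use bp::attr for the empty-list case instead of eps; treat
this as an illustration of the pattern, not as verified code):
```
#include <boost/parser/parser.hpp>
#include <vector>

namespace bp = boost::parser;

// Declaration: a rule with a tag type and an explicitly stated attribute.
bp::rule<struct ints_tag, std::vector<int>> const ints = "ints";

// Definition: must be named <rule-name>_def for the macro to find it.
auto const ints_def =
    bp::lit('[') >> ((bp::int_ % ',') | bp::attr(std::vector<int>{})) >> bp::lit(']');

// The macro that ties the rule to its definition.
BOOST_PARSER_DEFINE_RULES(ints);

// Usage (returns std::optional<std::vector<int>>):
//   auto v = bp::parse("[1, 2, 3]", ints, bp::ws);
```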
I am missing a tool that would give me a compile-time answer to questions
like these (a rough sketch of what such checks might look like follows the
list):
* can I pass this attribute to that parser when parsing this type of
character?
* will my rule with its parser compile for a given input character type?
* will my rule with its parser compile for at least one input character
type?
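A rough sketch of how such checks might be spelled by a user today follows;
these are hypothetical helpers of my own, not Boost.Parser facilities, and
whether a mismatch is reported as an unsatisfied constraint or as a hard
error depends on how far the library's own constraints reach:
```
#include <boost/parser/parser.hpp>
#include <string_view>

namespace bp = boost::parser;

// "Can I pass this attribute to that parser when parsing this character type?"
// Asks only whether the corresponding bp::parse() call would compile.
template <typename Attr, typename Parser, typename CharT>
concept attribute_compatible =
    requires (std::basic_string_view<CharT> input, Parser const & p, Attr & attr) {
        bp::parse(input, p, attr);
    };

// "Will my parser compile at all for a given input character type?"
template <typename Parser, typename CharT>
concept parses_input_of =
    requires (std::basic_string_view<CharT> input, Parser const & p) {
        bp::parse(input, p);
    };

// Example uses (the rule `ints` is from the sketch above):
//   static_assert(parses_input_of<decltype(ints), char>);
//   static_assert(attribute_compatible<std::vector<int>, decltype(ints), char>);
```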
Documentation
To me, the tutorial and overview in the documentation were sufficient to
understand the library. But this is because I had learned Boost.Spirit
before. If this were my first exposure to "a parser implementation via
expression templates", I would be overwhelmed. Even though a lot of effort
went into preparing a soft introduction to the subject, the subject is so
complicated that it requires a super-gentle introduction. Some things
remain underdocumented, like the interface of context.
See https://github.com/tzlaine/parser/issues/104
After having stared at the docs for more than a month, I still have trouble
finding the table with all the parsers listed.
The reference section of the documentation is missing details, such as
which functions throw, what they expect of their arguments, what they
guarantee, and what their exception-safety level is.
For instance, see https://github.com/tzlaine/parser/issues/103
Also, the layout of the reference section looks like it is driven by an
auto-generation tool run on the header files. It is not very user-friendly.
For instance, trying to find the specification of the function parse() in
https://tzlaine.github.io/parser/doc/html/header/boost/parser/parser_hpp.htm...
isn't easy: neither visually (the amount of content is overwhelming) nor
with a grep-like tool (a lot of things there have "parse" in the name).
What is good for header organization is not necessarily good for
documentation organization.
I am uncomfortable with the responses from the author that the reference
section is missing parts, such as concepts, because the documentation
toolchain does not support them. That looks like a bad choice of tool. On
the other hand, I do not know of any satisfactory tool for producing good
C++ documentation.
My testing
I spent about a week getting my parser for a text-based game engine to
compile; it was previously built with Boost.Spirit V2.
Here's a sample file to parse (in Polish):
https://raw.githubusercontent.com/akrzemi1/wdungeon/master/wdungeon.plot
Here's the original Spirit V2 parser:
https://raw.githubusercontent.com/akrzemi1/wdungeon/master/parser.cpp
The new parser, written with Boost.Parser, is included below. I didn't have
to use rules in the original version; I am forced to use them in Boost.Parser.
I was not interested in performance testing as my use case is to read the
config file upon application startup, so I can afford to wait.
I tried the library with the newest versions of MSVC, GCC and Clang. It
works in all of them. I filed a number of bug reports, to which Zach has
been responding extremely quickly. Thank you! I also appreciate the
negative responses, as they indicate that the author has a clear and
coherent vision of what he wants the library to be.
I recommend accepting the library into Boost.
I would also like to thank Zach for writing and sharing this library!
Regards,
&rzej;
```
namespace parser {
using namespace plot2;
const auto identifier = bp::lexeme[+(bp::digit | bp::lower | bp::upper |
bp::char_('_'))];
const bp::rule
"]") | bp::attr(std::vector<Variable>{});
auto end_tag = bp::lit("[") >> bp::char_("+!") >> bp::lit("]");
auto label_def = bp::lit("[") >> identifier >> bp::lit("]");
auto text_def = bp::lexeme[+(bp::char_ - ']' - '[')];
auto map_step = bp::lit("^") >> -bp::char_("NESW") >>
                -(bp::lit(".") >> (+bp::char_("STCLPRXHUVD#")));
auto map_def = bp::lit("[") >> +map_step >> bp::lit("]");
auto redirection_def = +(bp::lit("[->]") >> cond >> (-map) >> label >> set_vars);
auto option = bp::lit("[#]") >> cond >> text >> (-map) >> label >> (-text) >> set_vars;
auto choice_def = bp::lit("[?]") >> text >> (+option);
auto continuation_def = redirection | end_tag | choice;
auto transition_def = bp::lit("[##]") > label >> -map >> -text > continuation;
auto comment_def = bp::lit("[//]") >> bp::omit[*(bp::char_ - ']' - '[')];
auto plot_def = +transition;
BOOST_PARSER_DEFINE_RULES(cond, set_vars, choice, continuation, map, label,
                          text, transition, redirection, comment, plot);
}
```
On Thu, Feb 22, 2024 at 2:22 PM Andrzej Krzemienski via Boost wrote:
Hi, This is another review of Boost.Parser.
[snip]
Design
Boost.Spirit has a great design, and because Boost.Parser adopts the same design, it must also be good. I would intuitively expect that C++17 offers a sufficient number of new features that the design of a parser library could be improved over Boost.Spirit's. For instance, I am surprised that we still need to use a macro. Not that I can offer a better alternative; I do not know much about the implementation difficulties, so maybe the macro is the best choice. Wouldn't explicit template specializations do the job?
Well, the macro defines two functions per rule. You can't specialize the functions without running afoul of the language rules about function specializations not participating in overload resolution. I suppose we could specialize a class template, but that would require the user to copy/paste the two needed overloads into the class. Might as well have them just write out the functions long-hand at that point. I think the macro is the least bad way here. I'm open to alternatives, but I simply don't know of a better one.
The fact that this library automatically adapts aggregate types as attributes in a Boost.PFR way is really cool.
I am convinced that compiler error messages could be better. This can be done by using a combination of static_assert and implementation details SFINAEd away based on the same condition as in static_assert. See https://github.com/tzlaine/parser/issues/56
Andrzej and I have differed about this for a while. I disagree that static_asserts are a better solution than letting constraints just fail, and letting the compiler tell you where the atomic constraint failure is (even though it's a tremendous amount of diagnostics, the answer is actually in there). static_assert won't report deeper than "this static_assert failed" in many cases.
I am unhappy about the fact that as simple a task as parsing a JSON-like array of ints `[1, 2, 3]` into a vector requires me to define a rule. Spirit introduced operator%, which doesn't work for empty lists, but due to its clever attribute conversions (at least in Spirit 2), I could define:
auto array = '[' > ((int_ % ',') | eps) > ']';
And it would deduce the return attribute alright. But because Boost.Parser wants to be more strict about clever attribute conversions this option is gone, and no alternative is provided other than to declare a rule, which I never needed with Boost.Spirit. See https://github.com/tzlaine/parser/issues/67
Indeed, Spirit has a lot more complicated logic for how attributes are generated. I think it works great in a lot of cases, but I also find that it does things I don't expect pretty often. I prefer a much simpler approach -- one that I can document in a straightforward list of rules in the docs. One downside is that you have to be more explicit sometimes with Parser than with Spirit (meaning, write a rule). I consider this a good trade.
I am not comfortable that I have to use the macro BOOST_PARSER_DEFINE_RULES. I would expect that template specializations should give a similar effect. But I do not know how to make it better. But if I have to use a macro, maybe it could offer more, like checking if the rule is compatible with its definition (for earlier bug reporting).
I don't have a better solution, but am open to one, as I said.
I am missing a tool that would give me a compile-time answer:
* can I pass this attribute to that parser when parsing this type of character?
* will my rule with its parser compile for a given input character type?
* will my rule with its parser compile for at least one input character type?
This seems reasonable to me. This would be something like "attr_t<R, P>", where R is the range to parse, and P is the parser. The range is needed because the attribute type of char_ differs in the Unicode- and non-Unicode code paths (see https://tzlaine.github.io/parser/doc/html/boost_parser__proposed_/rationale.... ).
Documentation
To me, the tutorial and overview in the documentation were sufficient to understand the library. But this is because I had learned Boost.Spirit before. If this were my first exposure to "a parser implementation via expression templates", I would be overwhelmed. Even though a lot of effort went into preparing a soft introduction to the subject, the subject is so complicated that it requires a super-gentle introduction. Some things remain underdocumented, like the interface of context. See https://github.com/tzlaine/parser/issues/104
After having stared at the docs for more than a month, I still have trouble finding the table with all the parsers listed.
I think I should add a Cheat Sheet-style page with all the tables in one place. Honestly, I sometimes have to click around to find it too. Ticket for this: https://github.com/tzlaine/parser/issues/108 .
The reference section of documentation is missing details, such as what functions throw, what they expect of their arguments, what they guarantee, what is the exception safety level. For instance, see https://github.com/tzlaine/parser/issues/103
I'll of course address that ticket, but nothing in Parser throws, except a failed expectation, and that's always caught by the top-level parse function. That's why there's no mention of exception safety.
Also, the layout of the reference section looks like it is driven by an auto-generation tool run on the header files. It is not very user-friendly. For instance, trying to find the specification of the function parse() in https://tzlaine.github.io/parser/doc/html/header/boost/parser/parser_hpp.htm... isn't easy: neither visually (the amount of content is overwhelming) nor with a grep-like tool (a lot of things there have "parse" in the name). What is good for header organization is not necessarily good for documentation organization.
I don't have a great answer here except to say that this is just what Boostbook does, as far as I can tell. My previous libs look like this too.
I am uncomfortable with the responses from the author that the reference section is missing parts, such as concepts, because the documentation toolchain does not support them. This would look like a bad choice of the tool. On the other hand, I do not know of any satisfactory tool for good C++ documentation.
Me either. In particular, I don't know of any tool that knows how to document concepts. I've created a new ticket to address the lack of concept constraints in the API reference: https://github.com/tzlaine/parser/issues/109 .
My testing
I spent about a week getting my parser for a text-based game engine to compile; it was previously built with Boost.Spirit V2. Here's a sample file to parse (in Polish): https://raw.githubusercontent.com/akrzemi1/wdungeon/master/wdungeon.plot Here's the original Spirit V2 parser: https://raw.githubusercontent.com/akrzemi1/wdungeon/master/parser.cpp The new parser, written with Boost.Parser, is included below. I didn't have to use rules in the original version; I am forced to use them in Boost.Parser.
I was not interested in performance testing as my use case is to read the config file upon application startup, so I can afford to wait.
I tried the library with the newest versions of MSVC, GCC and Clang. It works in all of them. I filed a number of bug reports, to which Zach has been responding extremely quickly. Thank you! I also appreciate the negative responses, as they indicate that the author has a clear and coherent vision of what he wants the library to be.
I recommend accepting the library into Boost. I would also like to thank Zach for writing and sharing this library!
Thanks very much for the review! For those following along, Andrzej has been instrumental in improving the library over the last 2-3 months. Improvement of the docs and quite a bit of the API were driven by his frustrations using earlier versions of Parser.

Zach
On Fri, Feb 23, 2024 at 1:55 AM Zach Laine via Boost wrote:
On Thu, Feb 22, 2024 at 2:22 PM Andrzej Krzemienski via Boost
wrote: Hi, This is another review of Boost.Parser.
[snip]
Design
Boost.Spirit has a great design, and because Boost.Parser adopts the same design, it must also be good. I would intuitively expect that C++17 offers a sufficient number of new features that the design of a parser library could be improved over Boost.Spirit's. For instance, I am surprised that we still need to use a macro. Not that I can offer a better alternative; I do not know much about the implementation difficulties, so maybe the macro is the best choice. Wouldn't explicit template specializations do the job?
Well, the macro defines two functions per rule. You can't specialize the functions without running afoul of the language rules about function specializations not participating in overload resolution. I suppose we could specialize a class template, but that would require the user to copy/paste the two needed overloads into the class. Might as well have them just write out the functions long-hand at that point. I think the macro is the least bad way here. I'm open to alternatives, but I simply don't know of a better one.
The fact that this library automatically adapts aggregate types as attributes in a Boost.PFR way is really cool.
I am convinced that compiler error messages could be better. This can be done by using a combination of static_assert and implementation details SFINAEd away based on the same condition as in static_assert. See https://github.com/tzlaine/parser/issues/56
Andrzej and I have differed about this for a while. I disagree that static_asserts are a better solution than letting constraints just fail, and letting the compiler tell you where the atomic constraint failure is (even though it's a tremendous amount of diagnostics, the answer is actually in there). static_assert won't report deeper than "this static_assert failed" in many cases.
It should be noted that the excessively huge compiler errors from compiling any template library always stem from the fact that the compiler is giving you all of the potentially useful information: "All the useful information is there: you just need to scroll through the screenfuls of text to find it."

The problem is that there is too much potentially useful information. When I get four screens of illegible signatures, and then the fifth screen gives me the concept constraints that failed, but I do not know for which type, because of the illegible type signatures, I do not appreciate it.

Instead, I would appreciate a two-step debugging of an error message. In the first step, I get a short error message that type T violated concept C. At this point I do not know why C<T> is false. In the next step, I will remove the offending function call and replace it with a "test function" whose only goal is to test whether a given type satisfies a given concept. And then I will get another reasonably short message saying which constraint is violated.
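To make the two-step idea concrete, here is a minimal, generic sketch; it
is plain C++20, nothing Boost.Parser-specific, and the concept and type are
invented for illustration:
```
#include <concepts>

// A deliberately small concept standing in for some library constraint C.
template <typename T>
concept Parseable = std::default_initializable<T> && requires (T t) {
    t.begin();
    t.end();
};

struct Config {};   // an illustrative type that does not satisfy Parseable

// Step 1: at the real call site, the compiler reports only that the
// constraint Parseable<Config> is not satisfied, buried in a long diagnostic.
// Step 2: this isolated "test" produces a short diagnostic that names the
// exact requirement that failed (here: Config has no begin()/end()).
// static_assert(Parseable<Config>);
```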
I am unhappy about the fact that as simple a task as parsing a JSON-like array of ints `[1, 2, 3]` into a vector requires me to define a rule. Spirit introduced operator%, which doesn't work for empty lists, but due to its clever attribute conversions (at least in Spirit 2), I could define:
auto array = '[' > ((int_ % ',') | eps) > ']';
And it would deduce the return attribute alright. But because Boost.Parser wants to be more strict about clever attribute conversions this option is gone, and no alternative is provided other than to declare a rule, which I never needed with Boost.Spirit. See https://github.com/tzlaine/parser/issues/67
Indeed, Spirit has a lot more complicated logic for how attributes are generated. I think it works great in a lot of cases, but I also find that it does things I don't expect pretty often. I prefer a much simpler approach -- one that I can document in a straightforward list of rules in the docs. One downside is that you have to be more explicit sometimes with Parser than with Spirit (meaning, write a rule). I consider this a good trade.
I am not comfortable that I have to use the macro BOOST_PARSER_DEFINE_RULES. I would expect that template specializations should give a similar effect. But I do not know how to make it better. But if I have to use a macro, maybe it could offer more, like checking if the rule is compatible with its definition (for earlier bug reporting).
I don't have a better solution, but am open to one, as I said.
I am missing a tool that would give me a compile-time answer:
* can I pass this attribute to that parser when parsing this type of character?
* will my rule with its parser compile for a given input character type?
* will my rule with its parser compile for at least one input character type?
This seems reasonable to me. This would be something like "attr_t<R, P>", where R is the range to parse, and P is the parser. The range is needed because the attribute type of char_ differs in the Unicode- and non-Unicode code paths (see https://tzlaine.github.io/parser/doc/html/boost_parser__proposed_/rationale.... ).

But because you have these attribute conversions/assignments, the question that I really need answered is: "If I passed this type A as an attribute to `parse()`, and gave it this input type R and this parser P, would it compile?" So I would expect a signature like compatible<A, R, P>, which would be something like `assignable_from<A&, attr_t<R, P>>`.

Regards,
&rzej;
[snip]
On Fri, Feb 23, 2024 at 2:05 AM Andrzej Krzemienski wrote:
[snip]
It should be noted that the excessively huge compiler errors from compiling any template library always stem from the fact that the compiler is giving you all of the potentially useful information: "All the useful information is there: you just need to scroll through the screenfuls of text to find it".
The problem is that there is too much potentially useful information. When I get four screens of illegible signatures, and then the fifth screen gives me concept constraints that failed, but I do not know for which type, because of illegible type signatures, I do not appreciate it.
Instead, I would appreciate a two-step debugging of an error message: In the first step, I get a short error message that type T violated concept C. At this point I do not know why C<T> is false. In the next step, I will remove the offending function call and replace it with a "test function" whose only goal is to test if a given type satisfies a given concept. And then I will get another reasonably short message saying which constraint is violated.
It is my fervent hope that concept-related diagnostics get better in the coming years. However, I think even now, when they suck so hard, I'd rather have the info there to dig through than have to stop and write some code just to dig through a subset of it.

Zach