Boost.Parser Review I'd like to thank Zach for his work. Here's my review.
Does this library bring real benefit to C++ developers for real-world use-case?
Yes. Boost.Spirit is already a great library that exemplifies what's possible in C++. But it's pre-C++11 and no longer in active development. So even if this were only a modern and actively maintained version of Spirit, that's already good enough considering how important the problem is. It seems like many people have issues with Spirit. For instance, we avoided using it in Boost.URL and know of other people who have also been avoiding it in their projects. Some things I like about Boost.Parser: - Boost.Parser uses the same conventions as Spirit to combine parsers. - Being able to use C++17 vocabulary types in the API - Unicode support - Better error reporting - The Boost.Spirit authors gave it their blessing - The library name is very direct The issue with Spirit people seem to mention most frequently for avoiding it is compile times. Unfortunately, there's not a lot that can be done by Boost.Parser here, but users can still mitigate that by only including the library in their source files if they want.
Do you have an application for this library?
We have lots of libraries where we need to parse things. I would consider Boost.Parse for new libraries once it's more stable. Even if we intend to use custom parsers in the end, Boost.Parser could provide a good starting point.
Does the API match with current best practices?
Using the same conventions as Spirit to combine parsers seems like a good
design here.
I was a little confused about the convention to assume unicode for anything
other than char.
I'm not sure that's the convention that minimizes surprise.
I would expect the same convention (unicode or not) to apply to any data
type because:
- It's one more thing to remember.
- I often don't want to interpret things as code points regardless of
whether it's char or unsigned char.
- I noticed many people also don't care about unicode most of the time in
their use cases.
- It also seems to be the case that the relevant keywords in most grammars
are ASCII.
- The documentation says "the Unicode parsing API will not actually cost
you anything" but two sentences later it says "one caveat is that there may
be an extra branch on each char, if the input is UTF-8"
At least, the library documentation could make it very clear right at the
start how the user can explicitly specify if they want to enable/disable
unicode support for whatever data type they wish.
I find the macros to define rules a little unsettling.
- Isn't it possible to at least document what things would look like
without a macro so that readers can have a good idea of what the macro is
doing whenever they use it?
- A single macro to define the rules could also be useful.
- It would also be useful if the cases where macros are not as useful
(non-recursive rules?) could be specified in the documentation.
About Hana:
- I don't like optional dependencies in general, so I also don't like the
idea of supporting std::tuple and hana tuples separately.
- Since the library is boost/standalone and tuple/hana, then that's already
four combinations that are unlikely to be tested for all relevant compilers
and environments.
- That's more possibilities of conflicts for users compiling different
versions of the library.
- I especially don't like the idea of using another type for tuples when we
already have a vocabulary type from the standard.
- I feel like whatever Hana provides can be achieved with std::tuple and
maybe some other library like mp11 in extreme cases.
- It's probably possible to support Hana as a peer-dependency where the
rule would fill any tuple-like object according to some concept (the rules
can already do that for containers, variants and optionals if I'm not
wrong).
- Including all of
Is the documentation helpful and clear?
Yes. The documentation is clear. It's driven by goals and examples, which is great. Knowing Boost.Spirit helped me, so it might be useful to give more weight to feedback from people who don't know Boost.Spirit yet. I found the section on "Writing Your Own Parsers" very limited considering that we know "some people are obsessed with writing everything for themselves". Very often, we have ad hoc rules that are hard to express as a combination of other rules. At other times, as a library evolves, custom rules can make things less generic and more efficient even if they could be expressed with the more generic rules. For instance, many examples in the documentation would be more easily represented as functions in my opinion. So I find this insufficient: "It is a basic orientation to get you familiar enough with all the moving parts of writing a parser that you can then learn by reading the Boost.Parser code" because the documentation should be exposing all the concepts and documenting everything. Telling such a usual user to read the code to reverse engineer how it should be done is no good. Especially because the number of concepts and requirements involved is probably not much larger than what's already documented. Are there small examples of binary parsers somewhere? The reference is just kind of ugly and it's difficult to find individual symbols. The difference between this library and Spirit versions should be more explicit and justified in the documentation. Some pros and cons only apply to certain versions of Spirit, so that should also be explicit instead of just mentioning Spirit 2 _or_ Spirit X3. I think some other people were also confused about that. The section "The Parse Context" is a little confusing. Unlike other sections, it's not goal-oriented with lots of examples. The section "Alternative Parsers" is a little weird. I don't see a reason to discuss operator|() individually in a single section and not discuss other operators and the previous or next sections. I would expect a section listing all operators (as exists later) and operator|() to just be part of it. I find the TOC overwhelming (reminds me of the Boost.Asio documentation a little). There are too many sections under "Tutorial". Almost the whole thing is under "Tutorial", which makes categorization somewhat pointless. It would be good to use Miller's Law at each level here so that the reader can always know where he is. This whole "Tutorial" section doesn't even look much like a tutorial. One thing I'm also missing is some cheat sheets with all the symbols provided by the library related to what a given section is discussing.
Did you try to use it? What problems or surprises did you encounter?
Yes. It worked fine with MSVC over here but CI needs to improve because people have reported failures with other compiler versions. It's not a reason to reject the library but it's something that needs to be fixed and causes a bad first impression. CI should also include coverage, more warnings, sanitizers, and probably fuzz tests. I believe Sam can also help with Drone if that's needed.
What is your evaluation of the implementation?
The code is very well organized and properly documented. I found some issues in the CMake scripts but I'll just open issues in the repository for these.
Are you knowledgeable about the problem domain?
We have been using custom parsers in almost all of our libraries. I've also used Boost.Spirit before. In most cases, we just write custom logic for the parsers now. But something like Boost.Parser could save us a lot of time in new libraries even if we intend to use custom parsers later.
Please explicitly state that you either *accept* or *reject* the inclusion of this library into boost.
Also please indicate the time & effort spent on the evaluation and give
I recommend accepting the library. the reasons for your decision. I spent time on and off over the last weeks and this week. In total, I spent something like 2 or 3 days evaluating the library.
Disclaimer
I'm affiliated with the C++ Alliance. -- Alan Freitas