# Are you knowledgeable about the problem domain?

Yes. I've written several parser combinator libraries in C++ and used still more in C++ (Spirit v1, v2, and qi) and other programming languages (Haskell's Parsec, attoparsec, and others).

# How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?

I read through the tutorial documentation and some of the reference documentation. I also tried to create several examples with the library and made comparisons to Haskell's Parsec.

# What is your evaluation of the potential usefulness of the library?

Very.

# What is your evaluation of the documentation?

The documentation isn't great. Consider char_'s documentation (https://tzlaine.github.io/parser/doc/html/boost/parser/char_.html). It includes no examples and fails to mention it can be used without arguments.

It seems strange that the tutorial focuses so much on semantic actions. It seems like the more common case would be to parse into some kind of abstract syntax tree.

# What is your evaluation of the design?

Much of the wealth of research into parser combinators wasn't incorporated into this library. I didn't see in the documentation, for example, an easy way to "map" one parser's attribute into another. This kind of thing should be a basis operation provided by the library.

# What is your evaluation of the implementation?

The usage of __has_include is an ODR nightmare for large codebases. It would be better to generate some kind of config.hpp that sets these definitions at compile time.

For custom variant/tuple types, user specialization of variable templates is also an ODR nightmare. Again, a config.hpp would be better for this. For this, though, I don't think the configurability is worth the complexity. We have standard types for these so we should use them.

# Did you try to use the library? With what compiler? Did you have any problems?

Yes, I tried to use the library with a modern compiler on a macOS machine. I didn't have any problems building.

# Do you think the library should be accepted as a Boost library?

No, I do not think it should be accepted. Overall, I think the syntax has too much focus on magically filling in user-provided structures instead of the basics of monadic parser combinators and basis operations.

Consider the following Haskell parser, which evaluates simple parenthesized sum expressions. I wasn't able to use Boost.Parser to accomplish this after reading the documentation and several attempts. I'm sure it's possible, but I don't see how it can be done using the combinators and primitives provided.

import Text.Parsec.String (Parser)
import Text.Parsec

integer :: Parser Integer
integer = do
  n <- many1 digit
  return (read n)

integerPlus :: Parser Integer
integerPlus = do
  x <- integer
  y <- (try $ char '+' >> expression) <|> return 0
  return $ x+y

parentheses :: Parser Integer
parentheses = do
  char '('
  x <- expression
  char ')'
  return x

expression :: Parser Integer
expression = integerPlus <|> parentheses
On Thu, Feb 29, 2024 at 4:44 PM David Sankel via Boost
# Are you knowledgeable about the problem domain?
Yes. I've written several parser combinator libraries in C++ and used still more in C++ (Spirit v1, v2, and qi) and other programming languages (Haskell's Parsec, attoparsec, and others).
# How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?
I read through the tutorial documentation and some of the reference documentation. I also tried to create several examples with the library and made comparisons to Haskell's Parsec.
# What is your evaluation of the potential usefulness of the library?
Very
# What is your evaluation of the documentation?
The documentation isn't great. Consider char_'s documentation (https://tzlaine.github.io/parser/doc/html/boost/parser/char_.html). It includes no examples and fails to mention it can be used without arguments.
I can certainly mention that in that reference entry. Ticket: https://github.com/tzlaine/parser/issues/150

FWIW, char_ is used without arguments *all over* the docs. It would be hard to miss it. [snip]
# What is your evaluation of the design?
Much of the wealth of research into parser combinators wasn't incorporated into this library. I didn't see in the documentation, for example, an easy way to "map" one parser's attribute into another. This kind of thing should be a basis operation provided by the library.
Could you explain what you mean by this? This happens implicitly usually, and you can explicitly do it in semantic actions. Do you mean something else? An implicit example would be:

struct s { int i; double d; };
std::vector<s> s_vec;
bool success = bp::parse("...", *(bp::int_ >> bp::double_), bp::ws, s_vec);

The sequence parser on the inside is feeding its attributes into the repeat-parser on the outside.
# What is your evaluation of the implementation?
The usage of __has_include is an ODR nightmare for large codebases. It would be better to generate some kind of config.hpp that sets these definitions at compile time.
This is a great point. There are three places where this currently happens:

1) Boost.TypeIndex vs. typeinfo (for printing type names in the trace)
2) BOOST_ASSERT vs. assert
3) Spirit X3 vs. charconv vs. Boost.Charconv

1 and 3 are easily addressed by adding an extra template parameter to the few function templates that use those APIs, so a mismatch will be a build break rather than an ODR violation. 2 does not apply to production code, and any use of BOOST_ASSERT already has the same problem anyway. Ticket: https://github.com/tzlaine/parser/issues/151
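To make the hazard concrete, here is a minimal sketch of the failure mode under discussion. The header and function names below are hypothetical, not Boost.Parser's actual code; the sketch only shows how an __has_include-selected definition can silently differ between translation units.

// hypothetical_lib/type_name.hpp -- illustrative only, not Boost.Parser code
#pragma once
#include <string>
#if __has_include(<boost/type_index.hpp>)
#  include <boost/type_index.hpp>
template <class T>
inline std::string type_name()   // definition A
{ return boost::typeindex::type_id<T>().pretty_name(); }
#else
#  include <typeinfo>
template <class T>
inline std::string type_name()   // definition B
{ return typeid(T).name(); }
#endif

If one translation unit is compiled with Boost's headers on the include path and another is not, the linked program contains two different definitions of the same inline function: an ODR violation that nothing is required to diagnose. A generated config.hpp pins the choice once; the extra-template-parameter approach instead makes a mismatch a build error rather than silent divergence.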
For custom variant/tuple types, user specialization of variable templates is also an ODR nightmare. Again, a config.hpp would be better for this. For this, though, I don't think the configurability is worth the complexity. We have standard types for these so we should use them.
Huh? There's no ODR issue here. I don't use one type if the specialization exists and another if it does not. The issue is that if you feed me an attribute out-arg, I might not treat it as what it is unless you tell me that it's variant-like, or optional-like, or whatever. For instance, Andrzej was trying to use boost::variant in one of his parsers, and the printing code broke. It broke because boost::variant defines ostream & operator<<(ostream &, variant), and then *static asserts in it*. What? Anyway, without the printing code knowing that it's a variant type, it doesn't know to skip printing it (I don't visit variants during printing, I just print <<variant>> or whatever). [snip]
Overall, I think the syntax has too much focus on magically filling in user-provided structures instead of the basics of monadic parser combinators and basis operations.
Sure, this parser lib focuses on attribute generation more than any other single feature. That's part of its legacy -- Spirit 2 and Spirit X3 have the same focus, and I'm a fan.
Consider the following Haskell parser, which evaluates simple parenthesized sum expressions. I wasn't able to use Boost.Parser to accomplish this after reading the documentation and several attempts. I'm sure it's possible, but I don't see how it can be done using the combinators and primitives provided.
import Text.Parsec.String (Parser)
import Text.Parsec

integer :: Parser Integer
integer = do
  n <- many1 digit
  return (read n)

integerPlus :: Parser Integer
integerPlus = do
  x <- integer
  y <- (try $ char '+' >> expression) <|> return 0
  return $ x+y

parentheses :: Parser Integer
parentheses = do
  char '('
  x <- expression
  char ')'
  return x

expression :: Parser Integer
expression = integerPlus <|> parentheses
No idea what any of that means.

Thanks for reviewing!

Zach
Zach Laine wrote:
For instance, Andrzej was trying to use boost::variant in one of his parsers, and the printing code broke. It broke because boost::variant defines ostream & op<<(ostream&,variant), and then *static asserts in it*. What?
This probably needs to be fixed in Variant. I had the same problem https://github.com/boostorg/variant2/issues/31 and fixed it.
On Thu, Feb 29, 2024 at 8:18 PM Zach Laine via Boost
On Thu, Feb 29, 2024 at 4:44 PM David Sankel via Boost wrote:
Consider char_'s documentation (https://tzlaine.github.io/parser/doc/html/boost/parser/char_.html). It includes no examples and fails to mention it can be used without arguments.
I can certainly mention that in that reference entry. Ticket: https://github.com/tzlaine/parser/issues/150
Great, thanks!
# What is your evaluation of the design?
Much of the wealth of research into parser combinators wasn't incorporated into this library. I didn't see in the documentation, for example, an easy way to "map" one parser's attribute into another. This kind of thing should be a basis operation provided by the library.
Could you explain what you mean by this? This happens implicitly usually, and you can explicitly do it in semantic actions. Do you mean something else?
An implicit example would be:
struct s { int i; double d; };
std::vector<s> s_vec;
bool success = bp::parse("...", *(bp::int_ >> bp::double_), bp::ws, s_vec);
The sequence parser on the inside is feeding its attributes into the repeat-parser on the outside.
If a parser p has attribute type T (a T parser), and you have a function f of type U(T), I would expect there to be an easy way to convert the T parser into a U parser. Something like map(f, p).
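For concreteness, here is a self-contained sketch of that basis operation over a toy parser representation. This is illustrative only (the Parser alias, map, and digits below are made up for the example) and is not Boost.Parser's API.

#include <cctype>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

// A toy parser: given input, produce the attribute plus the unconsumed input,
// or nothing on failure.
template <class T>
using Parser =
    std::function<std::optional<std::pair<T, std::string_view>>(std::string_view)>;

// map(f, p): a T parser plus a function U(T) yields a U parser.
// Failure propagates unchanged; only the attribute is transformed.
template <class T, class F, class U = std::invoke_result_t<F, T>>
Parser<U> map(F f, Parser<T> p)
{
    return [f, p](std::string_view in)
               -> std::optional<std::pair<U, std::string_view>> {
        if (auto r = p(in))
            return std::pair<U, std::string_view>(f(std::move(r->first)), r->second);
        return std::nullopt;
    };
}

int main()
{
    // digits: consume one or more leading digits; the attribute is a std::string.
    Parser<std::string> digits = [](std::string_view in)
        -> std::optional<std::pair<std::string, std::string_view>> {
        std::size_t n = 0;
        while (n < in.size() && std::isdigit(static_cast<unsigned char>(in[n])))
            ++n;
        if (n == 0)
            return std::nullopt;
        return std::pair<std::string, std::string_view>(
            std::string(in.substr(0, n)), in.substr(n));
    };

    // integer: the same parser with its attribute mapped from string to long.
    Parser<long> integer =
        map([](std::string const & s) { return std::stol(s); }, digits);

    auto r = integer("42+1");  // r->first == 42, r->second == "+1"
    return r && r->first == 42 ? 0 : 1;
}

The point is only that the transformation is a first-class combinator that composes like any other parser, rather than something done through an out-parameter or a side-effecting action.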
# What is your evaluation of the implementation?
The usage of __has_include is an ODR nightmare for large codebases. It would be better to generate some kind of config.hpp that sets these definitions at compile time.
This is a great point. Ticket: https://github.com/tzlaine/parser/issues/151
Thanks!
For custom variant/tuple types, user specialization of variable templates is also an ODR nightmare. Again, a config.hpp would be better for this. For this, though, I don't think the configurability is worth the complexity. We have standard types for these so we should use them.
Huh? There's no ODR here.
Agreed, you can strike that comment.
Overall, I think the syntax has too much focus on magically filling in user-provided structures instead of the basics of monadic parser combinators and basis operations.
Sure, this parser lib focuses on attribute generation more than any other single feature. That's part of its legacy -- Spirit 2 and Spirit X3 have the same focus, and I'm a fan.
Parser combinator libraries have progressed a lot since then and, having used both styles extensively, I'm much more eager to pick up one of the more modern designs. I would expect a Boost library to reflect the state of the art not only in language features used, but also in API design for the domain.
Consider the following Haskell parser, which evaluates simple parenthesized sum expressions. I wasn't able to use Boost.Parser to accomplish this after reading the documentation and several attempts. I'm sure it's possible, but I don't see how it can be done using the combinators and primitives provided.
import Text.Parsec.String (Parser)
import Text.Parsec

integer :: Parser Integer
integer = do
  n <- many1 digit
  return (read n)

integerPlus :: Parser Integer
integerPlus = do
  x <- integer
  y <- (try $ char '+' >> expression) <|> return 0
  return $ x+y

parentheses :: Parser Integer
parentheses = do
  char '('
  x <- expression
  char ')'
  return x

expression :: Parser Integer
expression = integerPlus <|> parentheses
No idea what any of that means.
Above is a parser that parses expressions like "23+(42+1)+7". The resulting attribute is an evaluation of the expression (e.g. 73). It looks like the above code was garbled in formatting, but you can see it here on godbolt (https://godbolt.org/z/s4x78bPK8). It is using Parsec, a simple monadic parser combinator library written in Haskell. I highly suggest folks writing parser generators read up on it, as its authors did an impressive job figuring out the essential operations and building something both easy and expressive.

Thanks for reviewing!
Sure thing!
Parser combinator libraries have progressed a lot since then and, having used both styles extensively, I'm much more eager to pick up one of the more modern designs. I would expect a Boost library to reflect the state of the art not only in language features used, but also in API design for the domain.
Do you have any C++ examples where this is the state of the art? Do you have any examples of current literature?

Declaring monadic parser combinators to be the state of the art when Haskell is mostly relegated to academic programs is kind of ostentatious, for lack of a better term. The thing about Haskell is that it's had time to gain ground, but it just hasn't because people aren't interested in it.

I don't mind exploring the design space of transforming parsers, but I don't think there's anything particularly outdated about Parser's approach. If anything, I think the way you compose parser_interfaces in this library actually would be the natural transformation required to form the monad in the first place.

- Christian
On Fri, Mar 1, 2024 at 12:04 PM Christian Mazakas via Boost <boost@lists.boost.org> wrote:
Parser combinator libraries have progressed a lot since then and, having used both styles extensively, I'm much more eager to pick up one of the more modern designs. I would expect a Boost library to reflect the state of the art not only in language features used, but also in API design for the domain.
Do you have any C++ examples where this is the state of the art?
Google "monadic parser combinator C++" and you'll find plenty.
Do you have any examples of current literature?
I'd suggest starting with the 2001 Parsec paper and looking at where it was cited by later research on scholar.google.com.
Declaring monadic parser combinators to be the state of the art when Haskell is mostly relegated to academic programs is kind of ostentatious, for lack of a better term.
Seriously? Industry languages borrow state of the art from research languages all the time.

The thing about Haskell is that it's had time to gain ground but it just hasn't because people aren't interested in it.
This isn't relevant.
I don't mind exploring the design space of transforming parsers but I don't think there's anything particularly outdated about Parser's approach.
Great, that's exactly what I'm encouraging. Explore the design space!
If anything, I think the way you compose parser_interfaces in this library actually would be the natural transformation required to form the monad in the first place.
Which aspects of "the way you compose" form the monad?
On Fri, Mar 1, 2024 at 12:38 PM David Sankel
On Fri, Mar 1, 2024 at 12:04 PM Christian Mazakas via Boost <boost@lists.boost.org> wrote:
Parser combinator libraries have progressed a lot since then and, having used both styles extensively, I'm much more eager to pick up one of the more modern designs. I would expect a Boost library to reflect the state of the art not only in language features used, but also in API design for the domain.
The thing about Haskell is that it's had time to gain ground but it just hasn't because people aren't interested in it.
This isn't relevant.
There's one more thing I'd like to point out. Consider a GitHub code search for "#include <boost/spirit" alongside these data points:

- StackOverflow developer survey data on language popularity (https://insights.stackoverflow.com/survey)
- GitHub code search for Boost.Spirit (https://github.com/search?type=code&auto_enroll=true&q=%22%23include+%3Cboost%2Fspirit%22+%28path%3A*.cpp+OR+path%3A*.h+OR+path%3A*.hpp%29)
- GitHub code search for Parsec (https://github.com/search?type=code&auto_enroll=true&q=%22import+Text.Parsec%22+path%3A*.hs)
On Fri, Mar 1, 2024 at 12:03 PM David Sankel via Boost
On Fri, Mar 1, 2024 at 12:38 PM David Sankel wrote:
On Fri, Mar 1, 2024 at 12:04 PM Christian Mazakas via Boost <boost@lists.boost.org> wrote:
Parser combinator libraries have progressed a lot since then and, having used both styles extensively, I'm much more eager to pick up one of the more modern designs. I would expect a Boost library to reflect the state of the art not only in language features used, but also in API design for the domain.
The thing about Haskell is that it's had time to gain ground but it just hasn't because people aren't interested in it.
This isn't relevant.
There's one more thing I'd like to point out. Consider a GitHub code search for "#include <boost/spirit" alongside these data points:

- StackOverflow developer survey data on language popularity (https://insights.stackoverflow.com/survey)
- GitHub code search for Boost.Spirit (https://github.com/search?type=code&auto_enroll=true&q=%22%23include+%3Cboost%2Fspirit%22+%28path%3A*.cpp+OR+path%3A*.h+OR+path%3A*.hpp%29)
- GitHub code search for Parsec (https://github.com/search?type=code&auto_enroll=true&q=%22import+Text.Parsec%22+path%3A*.hs)
A lot of C++ programmers use Spirit. A lot of Haskell programmers use Parsec. Even if proportionally far more use Parsec, so what? I'm not sure what I'm supposed to take away from this other than, "Different languages have different modalities." This doesn't speak to the issue at hand -- which one's essential design makes a better C++ (and in particular, Boost) library?

Zach
Much of the wealth of research into parser combinators wasn't incorporated into this library. I didn't see in the documentation, for example, an easy way to "map" one parser's attribute into another. This kind of thing should be a basis operation provided by the library.
Could you explain what you mean by this? This happens implicitly usually, and you can explicitly do it in semantic actions. Do you mean something else?
I actually wanted to ask for a similar thing, but then forgot about it. Basically, sometimes you have to create a rule simply because you want to convert the attribute from type A to type B, and you use a semantic action. Semantic actions result in the parser's attribute becoming none. But if the semantic action's result type instead became the parser's attribute, then using an action would become a mapping operation. E.g.

(+bp::digit)[( [](auto& attr){ return std::stol(attr); } )]; // mapping string -> long

If the library would also unwrap tuples in attributes, you could even do things like

(double_ >> double_)[ std::plus<>() ]
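For concreteness, here is a sketch of what that workaround typically looks like today, based on the rule/semantic-action pattern from the Boost.Parser tutorial and assuming (as in the example above) that +bp::digit produces a std::string attribute. The rule name, tag, and exact spellings here are illustrative and may not match the library verbatim.

#include <boost/parser/parser.hpp>  // assumed umbrella header
#include <string>

namespace bp = boost::parser;

// The rule exists only to pin the attribute type to long; the semantic action
// does the string -> long conversion by hand, and the inner parser's own
// attribute is otherwise discarded.
bp::rule<struct integer_tag, long> const integer = "integer";
auto const integer_def =
    (+bp::digit)[( [](auto & ctx) { bp::_val(ctx) = std::stol(bp::_attr(ctx)); } )];
BOOST_PARSER_DEFINE_RULES(integer);

// usage (hypothetical): long out = 0; bool ok = bp::parse("12345", integer, out);

Under the proposal above, the rule boilerplate would disappear and the lambda's return value would simply become the parser's attribute.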
On Fri, Mar 1, 2024 at 1:16 PM Дмитрий Архипов via Boost
Much of the wealth of research into parser combinators wasn't incorporated into this library. I didn't see in the documentation, for example, an easy way to "map" one parser's attribute into another. This kind of thing should be a basis operation provided by the library.
Could you explain what you mean by this? This happens implicitly usually, and you can explicitly do it in semantic actions. Do you mean something else?
I actually wanted to ask for a similar thing, but then forgot about it. Basically, sometimes you have to create a rule simply because you want to convert the attribute from type A to type B, and you use a semantic action. Semantic actions result in the parser's attribute becoming none. But if the semantic action's result type instead became the parser's attribute, then using an action would become a mapping operation. E.g.
(+bp::digit)[( [](auto& attr){ return std::stol(attr); } )]; // mapping string -> long
If the library would also unwrap tuples in attributes, you could even do things like
(double_ >> double_)[ std::plus<>() ]
Hopefully I answered this elsewhere in this thread, or partially in https://github.com/tzlaine/parser/issues/106. There's an implementation for that already, but I'm certain to change it. As-is, it's too hard to use, because too often your lambdas need constraining. Anyway, if those two sources don't answer this, just let me know.

Zach
participants (5)
- Christian Mazakas
- David Sankel
- Peter Dimov
- Zach Laine
- Дмитрий Архипов