Boost.Parser questions

Ruben Perez

27 Feb 2024 27 Feb '24

4:40 p.m.

Hi all, Sorry for jumping into this so late. I'll send my review tomorrow (that's still on time, isn't it?). I've been playing with the library and some questions popped up (please note I've no prior experience with Spirit): * What's the rationale for using operator[] for semantic actions (instead of a regular method like .action())? It looks like the parsing problem with inline lambdas could be easily worked around with this. * What's the rationale for using operator[] for directives? repeat(24)[p] looks like an over-complication, where a regular function repeat(24, p) would have just worked fine. * For non-recursive rules, what's the rationale on preferring the macro over something like: constexpr bp::rule<struct replacement_field_tag, std::string> replacement_field { "replacement_field", +bp::char_ }; * My use case involves parsing identifiers that can only contain ASCII lowercase, uppercase, digits and the underscore. I'd usually do this by defining a `bool is_identifier_char(char c)` function. This is what I could do with Boost.Parser - is there a better way (one that doesn't force me to list all valid characters)? constexpr const char name_chars[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_0123456789"; constexpr auto name_continuation_rng = std::span<const char>( name_continuation, sizeof(name_continuation) - 1 ); * What's the rationale behind the special naming convention for functions like _attr, _value, _globals and the like? Is there a reason why these can't be member functions of the context? * I've found really frustrating the generic lambda approach for semantic actions, since this completely inhibits tooling like clangd. Is there a reason why we can't have a public definition of the parsing context I could use in my functions? * I started my evaluation using clang-18 (Linux) but builds fail (seems to happen under all clang versions I tried). gcc-12 seems to work fine. Note that I'm using the develop branch, since I understand it contains the bug fixes for the issues that previous reviewers found. Regards, Ruben.

Show replies by date

Christian Mazakas

27 Feb 27 Feb

10:23 p.m.

...

* My use case involves parsing identifiers that can only contain ASCII lowercase, uppercase, digits and the underscore.

Spirit used to have helpers like this but Parser doesn't seem to have them. I noticed this too but it's actually pretty easy to fill this in yourself. Here's a working example: https://godbolt.org/z/6P6dTbGYY auto const digit = p::char_('0', '9'); auto const lower = p::char_('a', 'z'); auto const upper = p::char_('A', 'Z'); auto const ident = digit | lower | upper | '_';

...

* I started my evaluation using clang-18 (Linux) but builds fail (seems to happen under all clang versions I tried).

Hmm, I can't actually can't replicate this, btw. I'm using the latest tip of develop here: https://github.com/cmazakas/parser-review/blob/main/overlays/boost-parser/po... Compiles fine for me using clang-17 on Ubuntu 23.10 You should include details about the nature of your build failures. - Christian

Christian Mazakas

10:25 p.m.

And for what it's worth, I'm using clangd as well. You can get better intellisense by doing like this: metadata& md = _globals(ctx); std::string_view sv = _attr(ctx); This isn't the most ideal because I'd rather use type deduction here but overall, I can't complain too badly. - Christian

Zach Laine

11:44 p.m.

On Tue, Feb 27, 2024 at 4:24 PM Christian Mazakas via Boost <boost@lists.boost.org> wrote:

...

...
* My use case involves parsing identifiers that can only contain ASCII lowercase, uppercase, digits and the underscore.

Spirit used to have helpers like this but Parser doesn't seem to have them. I noticed this too but it's actually pretty easy to fill this in yourself.

Here's a working example: https://godbolt.org/z/6P6dTbGYY

auto const digit = p::char_('0', '9'); auto const lower = p::char_('a', 'z'); auto const upper = p::char_('A', 'Z'); auto const ident = digit | lower | upper | '_';

Parser does have these (digit, lower, upper), but those match more than what is desired here. What is desired here is alnum | char_('_'), I think. That is, only the ASCII a-z, A-Z, 0-9, and _. You can spell that out yourself as above, as you've done. You could also just use digit | lower | upper | char_('_'). It will be vaguely as fast I expect (but certainly measure if it's a perf-critical situation).

...

...
* I started my evaluation using clang-18 (Linux) but builds fail (seems to happen under all clang versions I tried).

Hmm, I can't actually can't replicate this, btw.

I'm using the latest tip of develop here: https://github.com/cmazakas/parser-review/blob/main/overlays/boost-parser/po...

Compiles fine for me using clang-17 on Ubuntu 23.10

You should include details about the nature of your build failures.

Yes, please do! I'd like to know about and fix this, but I have limited access to Clang. Zach

Zach Laine

11:48 p.m.

On Tue, Feb 27, 2024 at 5:44 PM Zach Laine <whatwasthataddress@gmail.com> wrote:

...

On Tue, Feb 27, 2024 at 4:24 PM Christian Mazakas via Boost <boost@lists.boost.org> wrote:

...
...
* My use case involves parsing identifiers that can only contain ASCII lowercase, uppercase, digits and the underscore.

Spirit used to have helpers like this but Parser doesn't seem to have them. I noticed this too but it's actually pretty easy to fill this in yourself.

Here's a working example: https://godbolt.org/z/6P6dTbGYY

auto const digit = p::char_('0', '9'); auto const lower = p::char_('a', 'z'); auto const upper = p::char_('A', 'Z'); auto const ident = digit | lower | upper | '_';

Parser does have these (digit, lower, upper), but those match more than what is desired here. What is desired here is alnum | char_('_'), I think. That is, only the ASCII a-z, A-Z, 0-9, and _. You can spell that out yourself as above, as you've done. You could also just use digit | lower | upper | char_('_'). It will be vaguely as fast I expect (but certainly measure if it's a perf-critical situation).

I should have mentioned -- I recently removed the ascii::* parsers, which used is_*() from the C standard library. It included ascii::alnum. I removed them because those is_*() functions are considered just plain wrong by me and lots of other people from SG-16 (the committee's Unicode study group). They are also technically dangerous, though most standard libraries I know of patch around the potential UB because it is so easy to fall afoul of. I don't know if you're using one of the big three std libs though, so it seems sketchy to use those, just for safety reasons. They also have wrong semantics in a Unicode context. Zach

Richard

28 Feb 28 Feb

9:26 p.m.

In article <CALOpkJCSm3-YEfhYVuWZG+XPco09LfOodope37QjOn40qGGQ_Q@mail.gmail.com> you write:

...

I should have mentioned -- I recently removed the ascii::* parsers, which used is_*() from the C standard library. It included ascii::alnum. I removed them because those is_*() functions are considered just plain wrong by me and lots of other people from SG-16 (the committee's Unicode study group).

Is there a link to some discussion about why they are considered 'wrong'? Certainly they're correct for ASCII. -- "The Direct3D Graphics Pipeline" free book <http://tinyurl.com/d3d-pipeline> The Terminals Wiki <http://terminals-wiki.org> The Computer Graphics Museum <http://computergraphicsmuseum.org> Legalize Adulthood! (my blog) <http://legalizeadulthood.wordpress.com>

Zach Laine

29 Feb 29 Feb

12:30 a.m.

On Wed, Feb 28, 2024 at 3:26 PM Richard via Boost <boost@lists.boost.org> wrote:

...

In article <CALOpkJCSm3-YEfhYVuWZG+XPco09LfOodope37QjOn40qGGQ_Q@mail.gmail.com> you write:

...
I should have mentioned -- I recently removed the ascii::* parsers, which used is_*() from the C standard library. It included ascii::alnum. I removed them because those is_*() functions are considered just plain wrong by me and lots of other people from SG-16 (the committee's Unicode study group).

Is there a link to some discussion about why they are considered 'wrong'? Certainly they're correct for ASCII.

Yeah, but if you switch to using Unicode, you have some problems. https://daniel.haxx.se/blog/2018/01/30/isalnum-is-not-my-friend For the lazy: the values returned by these functions depends on the locale. The locale is global state that can be set by anyone, at any time, and is not concurrency-safe. You can set it, call isalnum(), and get a locale that is not the one you just set. They're a mess, as is absolutely everything that depends on locale. FWIW, there are even locales for EBCDIC that make even your ASCII-range values wrong. Zach

Vinnie Falco

1:29 p.m.

On Wed, Feb 28, 2024 at 4:30 PM Zach Laine via Boost <boost@lists.boost.org> wrote:

...

For the lazy: the values returned by these functions depends on the locale.

This is a recurring issue for my libraries which follow the Internet RFCs. That's one of the reasons why Boost.URL defines its own charsets: https://github.com/boostorg/url/blob/bbbef97a5b30cd6d11e0c0ad5994a70a136e35c... Thanks

Darrell Wright

28 Feb 28 Feb

12:03 a.m.

When testing I noticed that <algorithm> was missing app/raw.githubusercontent.com/tzlaine/parser/single_header/include/boost/parser/parser_unified.hpp:13853:17: error: no matching function for call to 'find' 13853 | std::find(detail::eol_cps.begin(), detail::eol_cps.end(), c) != | Darrell

...

On Feb 27, 2024, at 17:23, Christian Mazakas via Boost <boost@lists.boost.org> wrote:

...
* My use case involves parsing identifiers that can only contain ASCII lowercase, uppercase, digits and the underscore.

Spirit used to have helpers like this but Parser doesn't seem to have them. I noticed this too but it's actually pretty easy to fill this in yourself.

Here's a working example: https://godbolt.org/z/6P6dTbGYY

auto const digit = p::char_('0', '9'); auto const lower = p::char_('a', 'z'); auto const upper = p::char_('A', 'Z'); auto const ident = digit | lower | upper | '_';

...
* I started my evaluation using clang-18 (Linux) but builds fail (seems to happen under all clang versions I tried).

Hmm, I can't actually can't replicate this, btw.

I'm using the latest tip of develop here: https://github.com/cmazakas/parser-review/blob/main/overlays/boost-parser/po...

Compiles fine for me using clang-17 on Ubuntu 23.10

You should include details about the nature of your build failures.

- Christian

_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Zach Laine

12:10 a.m.

On Tue, Feb 27, 2024 at 6:07 PM Darrell Wright via Boost <boost@lists.boost.org> wrote:

...

When testing I noticed that <algorithm> was missing app/raw.githubusercontent.com/tzlaine/parser/single_header/include/boost/parser/parser_unified.hpp:13853:17: error: no matching function for call to 'find' 13853 | std::find(detail::eol_cps.begin(), detail::eol_cps.end(), c) != |

Ah, I see. Don't use that mono-header. Something about its generation seems to have gotten borked; I'm not sure what. It's unreliable at the moment, which is why I took it off the README on master. It's a pretty low priority to fix it. Zach

499

Age (days ago)

501

Last active (days ago)

List overview

Download

9 comments

6 participants

participants (6)

Christian Mazakas
Darrell Wright
Richard
Ruben Perez
Vinnie Falco
Zach Laine