Hi all,
Sorry for jumping into this so late. I'll send my review tomorrow
(that's still on time, isn't it?). I've been playing with the library
and some questions popped up (please note I've no prior experience
with Spirit):
* What's the rationale for using operator[] for semantic actions
(instead of a regular method like .action())? It looks like the
parsing problem with inline lambdas could be easily worked around with
this.
* What's the rationale for using operator[] for directives?
repeat(24)[p] looks like an over-complication, where a regular
function repeat(24, p) would have just worked fine.
* For non-recursive rules, what's the rationale on preferring the
macro over something like:
constexpr bp::rule
* My use case involves parsing identifiers that can only contain ASCII lowercase, uppercase, digits and the underscore.
Spirit used to have helpers like this but Parser doesn't seem to have them. I noticed this too but it's actually pretty easy to fill this in yourself. Here's a working example: https://godbolt.org/z/6P6dTbGYY auto const digit = p::char_('0', '9'); auto const lower = p::char_('a', 'z'); auto const upper = p::char_('A', 'Z'); auto const ident = digit | lower | upper | '_';
* I started my evaluation using clang-18 (Linux) but builds fail (seems to happen under all clang versions I tried).
Hmm, I can't actually can't replicate this, btw. I'm using the latest tip of develop here: https://github.com/cmazakas/parser-review/blob/main/overlays/boost-parser/po... Compiles fine for me using clang-17 on Ubuntu 23.10 You should include details about the nature of your build failures. - Christian
And for what it's worth, I'm using clangd as well. You can get better intellisense by doing like this: metadata& md = _globals(ctx); std::string_view sv = _attr(ctx); This isn't the most ideal because I'd rather use type deduction here but overall, I can't complain too badly. - Christian
On Tue, Feb 27, 2024 at 4:24 PM Christian Mazakas via Boost
* My use case involves parsing identifiers that can only contain ASCII lowercase, uppercase, digits and the underscore.
Spirit used to have helpers like this but Parser doesn't seem to have them. I noticed this too but it's actually pretty easy to fill this in yourself.
Here's a working example: https://godbolt.org/z/6P6dTbGYY
auto const digit = p::char_('0', '9'); auto const lower = p::char_('a', 'z'); auto const upper = p::char_('A', 'Z'); auto const ident = digit | lower | upper | '_';
Parser does have these (digit, lower, upper), but those match more than what is desired here. What is desired here is alnum | char_('_'), I think. That is, only the ASCII a-z, A-Z, 0-9, and _. You can spell that out yourself as above, as you've done. You could also just use digit | lower | upper | char_('_'). It will be vaguely as fast I expect (but certainly measure if it's a perf-critical situation).
* I started my evaluation using clang-18 (Linux) but builds fail (seems to happen under all clang versions I tried).
Hmm, I can't actually can't replicate this, btw.
I'm using the latest tip of develop here: https://github.com/cmazakas/parser-review/blob/main/overlays/boost-parser/po...
Compiles fine for me using clang-17 on Ubuntu 23.10
You should include details about the nature of your build failures.
Yes, please do! I'd like to know about and fix this, but I have limited access to Clang. Zach
On Tue, Feb 27, 2024 at 5:44 PM Zach Laine
On Tue, Feb 27, 2024 at 4:24 PM Christian Mazakas via Boost
wrote: * My use case involves parsing identifiers that can only contain ASCII lowercase, uppercase, digits and the underscore.
Spirit used to have helpers like this but Parser doesn't seem to have them. I noticed this too but it's actually pretty easy to fill this in yourself.
Here's a working example: https://godbolt.org/z/6P6dTbGYY
auto const digit = p::char_('0', '9'); auto const lower = p::char_('a', 'z'); auto const upper = p::char_('A', 'Z'); auto const ident = digit | lower | upper | '_';
Parser does have these (digit, lower, upper), but those match more than what is desired here. What is desired here is alnum | char_('_'), I think. That is, only the ASCII a-z, A-Z, 0-9, and _. You can spell that out yourself as above, as you've done. You could also just use digit | lower | upper | char_('_'). It will be vaguely as fast I expect (but certainly measure if it's a perf-critical situation).
I should have mentioned -- I recently removed the ascii::* parsers, which used is_*() from the C standard library. It included ascii::alnum. I removed them because those is_*() functions are considered just plain wrong by me and lots of other people from SG-16 (the committee's Unicode study group). They are also technically dangerous, though most standard libraries I know of patch around the potential UB because it is so easy to fall afoul of. I don't know if you're using one of the big three std libs though, so it seems sketchy to use those, just for safety reasons. They also have wrong semantics in a Unicode context. Zach
In article
I should have mentioned -- I recently removed the ascii::* parsers, which used is_*() from the C standard library. It included ascii::alnum. I removed them because those is_*() functions are considered just plain wrong by me and lots of other people from SG-16 (the committee's Unicode study group).
Is there a link to some discussion about why they are considered 'wrong'? Certainly they're correct for ASCII. -- "The Direct3D Graphics Pipeline" free book http://tinyurl.com/d3d-pipeline The Terminals Wiki http://terminals-wiki.org The Computer Graphics Museum http://computergraphicsmuseum.org Legalize Adulthood! (my blog) http://legalizeadulthood.wordpress.com
On Wed, Feb 28, 2024 at 3:26 PM Richard via Boost
In article
you write: I should have mentioned -- I recently removed the ascii::* parsers, which used is_*() from the C standard library. It included ascii::alnum. I removed them because those is_*() functions are considered just plain wrong by me and lots of other people from SG-16 (the committee's Unicode study group).
Is there a link to some discussion about why they are considered 'wrong'? Certainly they're correct for ASCII.
Yeah, but if you switch to using Unicode, you have some problems. https://daniel.haxx.se/blog/2018/01/30/isalnum-is-not-my-friend For the lazy: the values returned by these functions depends on the locale. The locale is global state that can be set by anyone, at any time, and is not concurrency-safe. You can set it, call isalnum(), and get a locale that is not the one you just set. They're a mess, as is absolutely everything that depends on locale. FWIW, there are even locales for EBCDIC that make even your ASCII-range values wrong. Zach
On Wed, Feb 28, 2024 at 4:30 PM Zach Laine via Boost
For the lazy: the values returned by these functions depends on the locale.
This is a recurring issue for my libraries which follow the Internet RFCs. That's one of the reasons why Boost.URL defines its own charsets: https://github.com/boostorg/url/blob/bbbef97a5b30cd6d11e0c0ad5994a70a136e35c... Thanks
When testing I noticed that <algorithm> was missing app/raw.githubusercontent.com/tzlaine/parser/single_header/include/boost/parser/parser_unified.hpp:13853:17: error: no matching function for call to 'find' 13853 | std::find(detail::eol_cps.begin(), detail::eol_cps.end(), c) != | Darrell
On Feb 27, 2024, at 17:23, Christian Mazakas via Boost
wrote: * My use case involves parsing identifiers that can only contain ASCII lowercase, uppercase, digits and the underscore.
Spirit used to have helpers like this but Parser doesn't seem to have them. I noticed this too but it's actually pretty easy to fill this in yourself.
Here's a working example: https://godbolt.org/z/6P6dTbGYY
auto const digit = p::char_('0', '9'); auto const lower = p::char_('a', 'z'); auto const upper = p::char_('A', 'Z'); auto const ident = digit | lower | upper | '_';
* I started my evaluation using clang-18 (Linux) but builds fail (seems to happen under all clang versions I tried).
Hmm, I can't actually can't replicate this, btw.
I'm using the latest tip of develop here: https://github.com/cmazakas/parser-review/blob/main/overlays/boost-parser/po...
Compiles fine for me using clang-17 on Ubuntu 23.10
You should include details about the nature of your build failures.
- Christian
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
On Tue, Feb 27, 2024 at 6:07 PM Darrell Wright via Boost
When testing I noticed that <algorithm> was missing app/raw.githubusercontent.com/tzlaine/parser/single_header/include/boost/parser/parser_unified.hpp:13853:17: error: no matching function for call to 'find' 13853 | std::find(detail::eol_cps.begin(), detail::eol_cps.end(), c) != |
Ah, I see. Don't use that mono-header. Something about its generation seems to have gotten borked; I'm not sure what. It's unreliable at the moment, which is why I took it off the README on master. It's a pretty low priority to fix it. Zach
participants (6)
-
Christian Mazakas
-
Darrell Wright
-
Richard
-
Ruben Perez
-
Vinnie Falco
-
Zach Laine