[Spirit] '_' parsed as '\0' with qi::phrase_parse
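Hi,

The underscore is parsed as a null character. Is this expected, or is it a bug?
If a full repro is needed I can create one.

    qi::rule<...> identifier;
    identifier %= lexeme[alpha >> *(alnum | '.' | '_')];

    typedef boost::spirit::classic::position_iterator<...> iterator_t;
    std::string s = "a_b blah";
    iterator_t is(s.c_str(), s.c_str() + s.size(), "<name>");

    std::string out;
    bool res = qi::phrase_parse(is, iterator_t(), identifier, skip, out);
    if (!res && is != iterator_t())
        error_handler_::err(is);
    cout << out.size() << "|" << out << "|\n";
    cout << int(out[0]) << " " << int(out[1]) << " " << int(out[2]) << "\n";

Output:

    3|ab|
    97 0 98

Greetings,
-- Olaf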
A literal `b` in a parser expression creates `qi::lit(b)`. `qi::lit` does not "have" (synthesize/expose) an attribute. You want either `qi::alpha | qi::char_("._")`, or `qi::char_("a-zA-Z0-9_.")`, or even (as you're in a lexeme anyway) `raw[alpha >> *(alnum|'.'|'_')]`, because for `raw[]` the synthesized attribute is the source iterator range.

First fix: http://coliru.stacked-crooked.com/a/c5d9ee594370ac91

Seeing that your `identifier` is a lexeme, you should just drop the skipper (see https://stackoverflow.com/a/17073965/85371).

In the absence of semantic actions, `%=` is redundant.

I don't recommend using classic Spirit features in 2022.

All in all: https://godbolt.org/z/r6MPvK5aK

On Mon, Jul 18, 2022, at 9:24 AM, Olaf van der Spek via Boost wrote:
Hi,
The underscore is parsed as a null character. Is this expected, or is it a bug?
If a full repro is needed I can create one.

    qi::rule<...> identifier;
    identifier %= lexeme[alpha >> *(alnum | '.' | '_')];

    typedef boost::spirit::classic::position_iterator<...> iterator_t;
    std::string s = "a_b blah";
    iterator_t is(s.c_str(), s.c_str() + s.size(), "<name>");

    std::string out;
    bool res = qi::phrase_parse(is, iterator_t(), identifier, skip, out);
    if (!res && is != iterator_t())
        error_handler_::err(is);
    cout << out.size() << "|" << out << "|\n";
    cout << int(out[0]) << " " << int(out[1]) << " " << int(out[2]) << "\n";

Output:

    3|ab|
    97 0 98
Greetings, -- Olaf
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
On Mon, Jul 18, 2022 at 11:01 PM Seth via Boost
Hi Seth,

Thanks! It's old, untouched code, hence the use of classic.

I still don't get why it's returning "a\0b", though. Where does the \0 come from? The code must have worked in the past. Did the behavior of classic Spirit change? Or did I rely on undefined behavior?

Your code parses multiple identifiers instead of just one. Is that on purpose?

Greetings,
Olaf
-- Olaf
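On the question of where the `\0` comes from: note that the grammar in the report is Qi (`qi::rule`, `qi::phrase_parse`); classic Spirit only supplies the `position_iterator`. The following is a simplified, hypothetical model of how Qi appears to fill a container attribute inside a Kleene loop, not Spirit's actual code. The assumption, consistent with the `97 0 98` output in the report, is that each repetition hands a default-constructed element (`char()`, i.e. `'\0'`) to the alternative and appends it after any successful match, whether or not the matching branch wrote to it. The function name `model_parse_identifier` is hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical, simplified model of how `alpha >> *(alnum | '.' | '_')`
// appears to fill a std::string attribute in Qi. It mimics the assumed
// attribute flow only; it is not Spirit's implementation.
std::string model_parse_identifier(std::string const& input) {
    std::string out;
    std::size_t pos = 0;

    // alpha: one letter, exposed as a char.
    if (pos >= input.size() ||
        !std::isalpha(static_cast<unsigned char>(input[pos])))
        return out;
    out.push_back(input[pos++]);

    // *(alnum | '.' | '_'): every repetition starts from a default element.
    while (pos < input.size()) {
        char val = char();  // value-initialized, i.e. '\0'
        unsigned char c = static_cast<unsigned char>(input[pos]);
        bool matched = false;
        if (std::isalnum(c)) {
            val = input[pos];   // alnum writes the matched char
            matched = true;
        } else if (c == '.' || c == '_') {
            matched = true;     // lit('.') / lit('_') match but write nothing
        }
        if (!matched)
            break;
        ++pos;
        out.push_back(val);     // appended whichever branch matched
    }
    return out;
}
```

If this model is right, the cure is what Seth suggested: make the `'.'` and `'_'` branches expose their character (for example with `qi::char_("._")`), or take the raw source range with `raw[]`.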
participants (2)
- Olaf van der Spek
- Seth