On 7/11/2018 11:01, Michael Powell wrote:
> I've got a couple of rules that are perplexing to me. First,
>
>     rule
>     id %= lexeme[qi::alpha >> *char_("A-Za-z0-9_")];
>
> In and of itself, id is working fine. Then I've got a "full id":
>
>     rule
>     full_id %= id >> *(char_('.') >> id);
>
> Where:
>
>     struct full_id_t { std::string val; };
>
> full_id_t::val is quite intentional for reasons elsewhere in the grammar.
>
> The perplexity comes in, it seems lexeme is only shaving off the first word as the val.
> For instance, parsing "two.oranges.red.test", I receive back "two" in the AST.
Again, I don't really know anything about Spirit, but it's reasonable to assume that "lexeme" will group its input sequence into a single token output, which is the result of id as a single std::string. Meanwhile, in full_id you're specifying a sequence of input tokens, so it will also output a sequence of tokens (which can presumably be captured as a std::vector<std::string>, not simply a std::string).

Most likely (though again this is just a guess), given the input "two.oranges.red.test" you should end up with std::vector<std::string> { "two", "oranges", "red", "test" }. This is probably what you want (as it will simplify later use of subcomponents), especially if the language allows whitespace around the ".".

If you want to disallow whitespace around the "." and get it as a single string token, then yes, you will probably have to make full_id call lexeme. I don't know whether that will require extracting the inner part of id to a separate rule so that lexeme only ends up being called once, or whether you can "nest" uses of lexeme.