[Spirit] Qi lexeme only taking the first word
On 7/11/2018 11:01, Michael Powell wrote:
I've got a couple of rules that are perplexing to me. First,
rule
id %= lexeme[qi::alpha >> *char_("A-Za-z0-9_")];
In and of itself, id is working fine. Then I've got a "full id":
rule
full_id %= id >> *(char_('.') >> id);
Where:
struct full_id_t { std::string val; };
full_id_t::val is quite intentional for reasons elsewhere in the grammar.
The perplexity comes in, it seems lexeme is only shaving off the first word as the val.
For instance, parsing "two.oranges.red.test", I receive back "two" in the AST.
Again, I don't really know anything about Spirit, but it's reasonable to assume that "lexeme" will group its input sequence into a single token output, which is the result of id as a single std::string. Meanwhile in full_id you're specifying a sequence of input tokens, so it will also output a sequence of tokens (which can presumably be captured as a std::vector<std::string>, not simply a std::string).
Most likely (though again this is just a guess) given the input "two.oranges.red.test" you should end up with std::vector<std::string> { "two", "oranges", "red", "test" }. This is probably what you want (as it will simplify later use of subcomponents), especially if the language allows whitespace around the ".".
If you want to disallow whitespace around the "." and get it as a single string token, then yes, you will probably have to make full_id call lexeme. I don't know whether that will require extracting the inner part of id to a separate rule so that lexeme only ends up being called once or if you can "nest" uses of lexeme.
On Tue, Nov 6, 2018 at 5:01 PM Michael Powell
Perhaps I should defer specifying the lexeme part of id until later?
I elaborated a little on the "simple" full id sub-grammar, but I cannot repro using the GCC compiler. I'm wondering if this has anything to do with the VS2017 fpos issue?
http://coliru.stacked-crooked.com/a/adeb42ce2f19b0fd
Or there may be insufficient context in the web compiler to adequately demo.
Thoughts? Suggestions?
Thank you!
Best regards,
Michael Powell
On Tue, Nov 6, 2018 at 5:40 PM Michael Powell
I got a repro:
http://coliru.stacked-crooked.com/a/069a44296240be7e
Although the reasons as to why I do not know.
It is a difference in attribute synthesis. When full_id synthesizes a std::string(), the conversion to full_id_t() "just works" magically. I'm guessing by happy accident based on the std::string val being the only member (adaptation, etc).
But when I change the synthesis to be its "true" type, that is, AST::full_id_t(), suddenly I see the same behavior.
Really and truly, I do not know why. Everything else being equal why would one approach be any different than the other?
Anyone with some Spirit, Fusion, AST, insights?
Thanks!
For now, I'll run with it as has been exposed here, but it's a bit troubling to me not knowing the difference.
Boost-users mailing list Boost-users@lists.boost.org https://lists.boost.org/mailman/listinfo.cgi/boost-users
On Tue, Nov 6, 2018 at 8:12 PM rmawatson rmawatson
It's been a long while since I've used spirit::qi, but what it looks like is happening in your setup is something like this:
When you have:
qi::rule
full_id; the attribute is vector<string>
When it matches
id >> *(char_('.') >> id)
this has an attribute of vector
>> or something similar.
Where are you getting that from? It makes no sense whatsoever given the struct full_id_t { std::string val; }, which is similarly mapped, and ruled, etc.
Spirit appears to compare your target attribute with the synthesised attribute of the parser, and any (trailing?) members of the synthesised attribute that have no match in your attribute are marked as unused_type and are not assigned.
Would I need to do some grouping or something to persuade Spirit to treat the struct as I've defined and adapted it?
You can see which overload of assign_to is used in your example if you breakpoint it -> boost\spirit\home\qi\detail\assign_to.hpp line 399.
It appears in boost\spirit\home\qi\operator\sequence_base.hpp line 74, where the predicate traits::attribute_not_unused is passed to spirit::any_if (boost\spirit\home\support\algorithm\any_if.hpp line 186). It will basically discard attributes where the LHS sequence is not matched with the RHS. You can see this in your example by adding an additional member to full_id_t:
struct full_id_t { std::string val; std::vector<std::string> others; };
BOOST_FUSION_ADAPT_STRUCT(AST::full_id_t, val, others)
Your missing bits will appear in this std::vector, as they are now not silently discarded. http://coliru.stacked-crooked.com/a/51f16c6deff45309
I think the fundamental problem is that attribute propagation differs depending on whether the target is a string or a vector<string>, as in your two examples. The first kicks in whatever logic exists to flatten the LHS attribute into a string; the second takes the first element, assigns it, and marks the rest as unused.
One thing you can do is use qi::as<std::string>()[ id >> *(char_('.') >> id) ] to force conversion of the synthesised attribute to a string before it is assigned to your attribute. http://coliru.stacked-crooked.com/a/6a060343a390f037
I've only had a quick look and this is pretty half hearted analysis. You'll really have to dig deep to find out exactly what is going on, but I suspect this is somewhat along the right lines.
On Tue, Nov 6, 2018 at 10:28 PM Gavin Lambert via Boost-users
On 7/11/2018 15:08, Michael Powell wrote:
When it matches
id >> *(char_('.') >> id)
this has an attribute of vector
>> or something similar.
Where are you getting that from? It makes no sense whatsoever given the struct full_id_t { std::string val; }, which is similarly mapped, and ruled, etc.
This might be wrong, but it's how I read the docs:
The output of parsing is a Fusion sequence of the attributes that were parsed.
So the output of
id >> *(char_('.') >> id)
is something like (but not exactly)
tuple<string>
tuple<string, char, string>
tuple<string, char, string, char, string>
etc.
string because that's the output attribute declared for id. char because you've used char_ instead of using '.' by itself (otherwise it would just disappear). And the latter two can be repeated zero or more times because you've used *.
When you assign this to a rule with %=, it tries to best-fit this against the rule's declared output attribute.
full_id_t contains a single string field, so the Fusion adaptation makes it equivalent to tuple<string>, and apparently this results in any additional values being discarded, not in concatenating as you expect.
You can probably use an explicit semantic action to build a single string instead of using %=.
Or you can make full_id_t contain vector<string> as rmawatson and I previously suggested, which should give you all the values.
Another possibility, which I can't test because coliru appears to be grumpy at present, is to try using:
full_id %= as_string[lexeme[id >> *(char_('.') >> id)]];
This approach works for me. And remains true to the AST. +1 Thanks!
On Tue, Nov 6, 2018 at 11:46 PM Michael Powell
Boy, wow... I'll qualify that with this: in "this" case I was able to persuade Spirit/Fusion to produce what I wanted. In other cases, not so much. It really, I mean **REALLY**, wants to produce that std::vector<...>, doesn't it? It will take a bit of digesting to adjust the AST, etc, to that, but it's good (no, GREAT) to know about.
On 7/11/2018 16:28, I wrote:
Another possibility, which I can't test because coliru appears to be grumpy at present, is to try using:
full_id %= as_string[lexeme[id >> *(char_('.') >> id)]];
Actually, since you're consuming a consecutive sequence of input characters without skipping any whitespace, you could probably use this instead, which might be faster (though that's just a guess; measure it!):
full_id %= as_string[raw[id >> *('.' >> id)]];
(I was half expecting as_string to not be needed here, but apparently it still is.)
this has an attribute of vector
>> or something similar.
Where are you getting that from? It makes no sense whatsoever given the struct full_id_t { std::string val; }, which is similarly mapped, and ruled, etc.
I've just had a look and the synthesized attribute is actually
boost::fusion::vector
On 11/6/18 7:12 PM, rmawatson rmawatson via Boost-users wrote:
It's been a long while since I've used spirit::qi, but what it looks like is happening in your setup is something like this:
When you have:
qi::rule
full_id; the attribute is vector<string>
When it matches
id >> *(char_('.') >> id)
this has an attribute of vector
>> or something similar. [snip]
One thing you can do is use qi::as<std::string>()[ id >> *(char_('.') >> id) ] to force conversion of synthesised attribute to a string to happen before it is assigned to your attribute. [snip]
rmawatson's as<std::string> suggestion works: https://coliru.stacked-crooked.com/a/a2c9435ee9e88bad Yeah rmawatson!
On 11/6/18 4:40 PM, Michael Powell via Boost-users wrote:
[snip]
The following simplification:
https://coliru.stacked-crooked.com/a/1adacde1a472d7a7
shows the full_id_t has the full attributes; however, it does *not* join them with the '.' char. Instead, it's a vector<std::string>. Unfortunately, I don't know how to automatically combine into a single string, but maybe this simplification will give you a starting point to figure that out.
-regards, Larry
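If the components are kept as a vector, one option is to join them after parsing rather than inside the grammar. The sketch below uses only the standard library; join_parts is a hypothetical helper name, not something from the linked example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins identifier parts with '.' between them.
std::string join_parts(std::vector<std::string> const& parts)
{
    std::string joined;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0)
            joined += '.'; // separator goes between parts, never at the ends
        joined += parts[i];
    }
    return joined;
}
```

This keeps the vector available for code that needs the individual components, and produces the single string only where it is wanted.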
participants (4)
- Gavin Lambert
- Larry Evans
- Michael Powell
- rmawatson rmawatson