Boost.Spirit's Greedy Fundamental Type Parsers
Suppose I want to parse a list of ";"-separated floating point pairs with "," being the pair separator, as in "1,2;3,4". Following this list comes a string literal representing a file extension, such as ".txt".
Therefore I want to successfully parse input like the following:
1,2;3,4.txt
(For the record, the input could also be 1.1,2.2;3.3,4.4.txt)
The parser I came up with is a 1:1 translation of the description above into the Spirit DSL and shows Spirit's expressive power:
((double_ % ",") % ";") >> ".txt"
Unfortunately, the parser fails on the input with the integral values above. Why? Because the fundamental parser for double_ greedily matches the "4." in "4.txt". Changing the "4" to "4.0" as in
1,2;3,4.0.txt
parses successfully (but is not an option, as it requires the user to always add a trailing ".0" when the last number is integral).
I read about Spirit's DSL mapping to Parsing Expression Grammar (PEG) with the choice operator | being evaluated in order. So the next logical step for me was to try making use of it and adapting the parser:
(((int_ | double_) % ",") % ";") >> ".txt"
which works on
1,2;3,4.txt
but no longer on
1,2;3,4.0.txt
Is there a way to adapt the parser to handle both cases?
I asked this on IRC and got the answer to try a solution based on
((double_ >> ".") | (int_ >> ".")) >> "txt"
but when I use this to parse "4.txt" into a std::vector<double> via
parse(first, last, ((double_ >> ".") | (int_ >> ".")) >> "txt", into);
the vector contains {4, 4} and its size() is 2, which I can make no sense of at all (but this may be a different problem).
Cheers,
Daniel J H
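The greedy behavior described above is not specific to Spirit. As a minimal sketch, assuming the default "C" locale, the same effect can be seen with the standard library's std::strtod, which also treats "4." as a complete number. This is only an analogy and says nothing about how double_ is implemented; the helper name rest_after_number is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of Spirit): returns what is left of `s`
// after std::strtod has consumed the longest leading floating point number.
std::string rest_after_number(const std::string& s) {
    const char* begin = s.c_str();
    char* end = nullptr;
    (void)std::strtod(begin, &end);  // only the end position matters here
    return std::string(end);         // `end` points into the null-terminated buffer
}
```

Called with "4.txt", the helper returns "txt": the dot has been consumed as part of the number, so a following ".txt" literal can no longer match. Called with "4.0.txt", it returns ".txt", which is why adding ".0" makes the original grammar succeed.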
On Thu, 31 Mar 2016 23:02:05 +0200, Daniel Hofmann wrote:
I asked this on IRC and got the answer to try a solution based on
((double_ >> ".") | (int_ >> ".")) >> "txt"
but when I use this to parse "4.txt" into a std::vector<double> via
parse(first, last, ((double_ >> ".") | (int_ >> ".")) >> "txt", into);
the vector contains: {4, 4} and its size() is 2, which I can make no sense of at all (but this may be a different problem).
That was me in IRC. I assumed you would be using `variant` or `double` as your attribute type, and not `std::vector<double>`. [...]
On 04/01/2016 12:19 AM, Lee Clagett wrote:
That was me in IRC. I assumed you would be using `variant` or `double` as your attribute type, and not `std::vector<double>`. If this is part of a larger expression and you need to use a std::vector for some reason, look into the hold directive [0]:
(hold[double_ >> "."] | (int_ >> ".")) >> "txt"
The sequence operator will immediately call push_back if the left side expression (`double_`) succeeds. `hold` creates a copy of the vector, and swaps iff everything in the directive returns true. If you use a `variant` or a `double` as your attribute, then the attribute is overwritten by `int_` and the `hold` is not needed.
I see, so parsers immediately push_back into the vector and in case of failure the items remain in the vector, unless I'm using hold. This perfectly explains what I'm seeing here.
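The push_back and copy-and-swap behavior discussed here can be modeled without Spirit. The following is a rough sketch using only the standard library: the functions real_then_dot, int_then_dot and parse_number_dot are hypothetical stand-ins for `double_ >> "."`, `int_ >> "."` and the alternative, and the `use_hold` flag imitates what Lee describes for `hold` (work on a copy, swap only on success). It is not Spirit's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

typedef std::vector<double> Out;

// Model of `double_ >> "."`: the number is appended as soon as it matches,
// before it is known whether the following "." will match.
bool real_then_dot(const std::string& in, std::size_t& pos, Out& out) {
    const char* begin = in.c_str() + pos;
    char* end = nullptr;
    double v = std::strtod(begin, &end);  // greedy: also consumes a trailing dot
    if (end == begin) return false;
    out.push_back(v);
    std::size_t p = pos + static_cast<std::size_t>(end - begin);
    if (p < in.size() && in[p] == '.') { pos = p + 1; return true; }
    return false;  // the pushed value stays in `out`
}

// Model of `int_ >> "."`.
bool int_then_dot(const std::string& in, std::size_t& pos, Out& out) {
    const char* begin = in.c_str() + pos;
    char* end = nullptr;
    long v = std::strtol(begin, &end, 10);
    if (end == begin) return false;
    out.push_back(static_cast<double>(v));
    std::size_t p = pos + static_cast<std::size_t>(end - begin);
    if (p < in.size() && in[p] == '.') { pos = p + 1; return true; }
    return false;
}

// Model of the alternative; with `use_hold` the first branch works on a copy
// of the container and the copy is swapped in only if the branch succeeds.
Out parse_number_dot(const std::string& in, bool use_hold) {
    Out out;
    std::size_t pos = 0;
    bool first_ok;
    if (use_hold) {
        Out copy = out;
        first_ok = real_then_dot(in, pos, copy);
        if (first_ok) out.swap(copy);
    } else {
        first_ok = real_then_dot(in, pos, out);
    }
    if (!first_ok) int_then_dot(in, pos, out);
    return out;
}
```

In this model, parse_number_dot("4.txt", false) leaves two elements in the vector, matching the {4, 4} reported above, while parse_number_dot("4.txt", true) leaves one.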
I am not sure why you want to use a `double` in this situation, but
std::vector<unsigned> out;
parse(first, last, (+(uint_ >> '.') >> "txt"), out);
or
unsigned one = 0;
boost::optional<unsigned> two;
parse(first, last, (uint_ >> '.' >> -(uint_ >> '.') >> "txt"), one, two);
will prevent inputs that contain '-' or the various inputs that the real parser [1] accepts. uint_ [2] can also be specialized to have a min,max number of digits which might be useful to your situation.
I'm parsing into a std::vector<double> since I want both
1,2;3,4.txt
as well as
1.1,2.2;3.3,4.4.txt
to succeed. With a uint_ based parser as you suggest, I get a vector of {1,1,..} for the second example, which neither represents the input nor lets me reconstruct it.
Looking at strict_real_policies<double> I was under the impression that the default real policy should work for both inputs above, being able to parse both inputs into a vector of {1.0, 2.0, 3.0, 4.0} and {1.1, 2.2, 3.3, 4.4} respectively.
Lee
[0] http://www.boost.org/doc/libs/1_60_0/libs/spirit/doc/html/spirit/qi/referenc...
[1] http://www.boost.org/doc/libs/1_60_0/libs/spirit/doc/html/spirit/qi/referenc...
[2] http://www.boost.org/doc/libs/1_60_0/libs/spirit/doc/html/spirit/qi/referenc...
On Fri, 1 Apr 2016 09:07:30 +0200, Daniel Hofmann wrote:
I'm parsing into a std::vector<double> since I want both
1,2;3,4.txt
as well as
1.1,2.2;3.3,4.4.txt
to succeed. With a uint_ based parser as you suggest, I get a vector of {1,1,..} for the second example, which neither represents the input nor lets me reconstruct it.
You could use the optional parser to store this information:
+(uint_ >> '.' >> -(uint_ >> '.')) >> "txt"
with attribute `vector
Looking at strict_real_policies<double> I was under the impression that the default real policy should work for both inputs above, being able to parse both inputs into a vector of {1.0, 2.0, 3.0, 4.0} and {1.1, 2.2, 3.3, 4.4} respectively.
The `double_` parser will allow inputs like "+1.1.txt" or "1e1.txt". So hopefully these numbers are actual double values and not a versioning scheme. The `ureal_policies` trait can restrict valid inputs with some additional work. Overriding the behavior of `parse_exp`, `parse_exp_n`, `parse_nan`, and `parse_inf` to reject everything AND providing a static field `allow_leading_dot = false` might be enough. There is a grammar [0] on the real parsers page describing all the inputs allowed.
Lee
[0] http://www.boost.org/doc/libs/1_60_0/libs/spirit/doc/html/spirit/qi/referenc...
On 31.03.2016 23:02, Daniel Hofmann wrote:
((double_ % ",") % ";") >> ".txt"
Quick hack to fix this:
((double_ % ",") % ";") >> -lit('.') >> "txt"
This erroneously accepts "1,2;3,4txt" as valid input, but it should behave correctly for all valid inputs and you may not care about false positives.
--
Rainer Deyke (rainerd@eldwood.com)
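For comparison, here is a hand-written sketch (standard library only, not Spirit) of a scanner that avoids both the greedy dot and the false positive mentioned above: a number takes the "." only when a digit follows it. The namespace `sketch` and all function names are hypothetical. It assumes unsigned decimal input without signs or exponents, which is narrower than what `double_` accepts.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

typedef std::vector<std::vector<double> > Rows;

inline bool is_digit(char c) {
    return std::isdigit(static_cast<unsigned char>(c)) != 0;
}

// Digits, optionally followed by '.' and more digits. The dot is consumed
// only if a digit follows, so "4.txt" yields 4 and leaves ".txt" untouched.
bool number(const std::string& in, std::size_t& pos, double& value) {
    if (pos >= in.size() || !is_digit(in[pos])) return false;
    std::size_t q = pos;
    while (q < in.size() && is_digit(in[q])) ++q;
    if (q + 1 < in.size() && in[q] == '.' && is_digit(in[q + 1])) {
        ++q;
        while (q < in.size() && is_digit(in[q])) ++q;
    }
    value = std::stod(in.substr(pos, q - pos));
    pos = q;
    return true;
}

// Model of ((number % ",") % ";") >> ".txt", requiring the whole input
// to match. `rows` is only modified on success.
bool parse_pairs(const std::string& in, Rows& rows) {
    Rows result;
    std::size_t pos = 0;
    for (;;) {
        std::vector<double> row;
        for (;;) {
            double v = 0;
            if (!number(in, pos, v)) return false;
            row.push_back(v);
            if (pos < in.size() && in[pos] == ',') ++pos; else break;
        }
        result.push_back(row);
        if (pos < in.size() && in[pos] == ';') ++pos; else break;
    }
    if (in.compare(pos, std::string::npos, ".txt") != 0) return false;
    rows.swap(result);
    return true;
}

// Convenience wrapper: the parsed rows, or an empty result on failure.
Rows parse_or_empty(const std::string& in) {
    Rows rows;
    parse_pairs(in, rows);
    return rows;
}

}  // namespace sketch
```

With this sketch, "1,2;3,4.txt", "1,2;3,4.0.txt" and "1.1,2.2;3.3,4.4.txt" are all accepted, while "1,2;3,4txt" is rejected. Whether the same lookahead is best expressed in Spirit through a custom real policy or through the grammar itself is left open here.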
participants (3)
- Daniel Hofmann
- Lee Clagett
- Rainer Deyke