boost.regex doesn't implement character set complements?
Hi:
I was trying to write "Match everything *except* Bar".
I wrote this:
bool aResult = boost::regex_match(L"Foo", boost::wregex(L"[^(Bar)]"));
This doesn't match. But if I wrote the same thing in Perl, it does work, ie:
print "match\n" if( "Foo" =~ "[^(Bar)]");
It looks a lot like sets can't contain subexpressions in boost::regex ... is that really true? Is there some other way to write the same thing?
Rob.
Robert Mathews wrote:
I was trying to write "Match everything *except* Bar".
I wrote this:
bool aResult = boost::regex_match(L"Foo", boost::wregex(L"[^(Bar)]"));
This doesn't match. But if I wrote the same thing in Perl, it does work, ie:
print "match\n" if( "Foo" =~ "[^(Bar)]");
It looks a lot like sets can't contain subexpressions in boost::regex ... is that really true? Is there some other way to write the same thing?
I guarantee that your perl code isn't doing what you think it's doing. This:
"Foo" =~ /[^(Bar)]/
will succeed because 'F' is not one of '(', 'B', 'a', 'r', or ')'. The same pattern means the same thing to Boost.Regex, but you're getting a different result because the regex_match algorithm only reports success if the pattern matches all of the input, which it doesn't in this case. Had you used regex_search, you'd have gotten the same results as perl. But it still wouldn't be doing what you want.
What you're looking for is a negative lookahead assertion. If you want to match three characters that are not "Bar", you can do it like:
"(?!Bar)..."
(?!Bar) asserts that the next three characters are not "Bar", but doesn't consume them. The ... matches the next three characters (which are not "Bar") and consumes them.
HTH,
-- Eric Niebler Boost Consulting www.boost-consulting.com
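The difference between whole-input matching and searching described above can be sketched in a few lines. This is only an illustration, not code from the thread: it uses std::regex (ECMAScript grammar) as a stand-in for boost::regex so that it compiles without Boost, on the assumption that regex_match and regex_search behave the same way for these patterns. The helper names matches_whole and matches_somewhere are made up for the example.

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex here. For the patterns
// below, regex_match (whole input) and regex_search (anywhere in the input)
// are expected to behave the same way in both libraries.

// Hypothetical helper: true only if the pattern matches the ENTIRE input.
bool matches_whole(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// Hypothetical helper: true if the pattern matches some substring of the input.
bool matches_somewhere(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(pattern));
}
```

With these helpers, matches_whole("Foo", "[^(Bar)]") is false because the set matches only one character of the three, while matches_somewhere("Foo", "[^(Bar)]") is true because 'F' is outside the set. For the lookahead pattern, matches_whole("Foo", "(?!Bar)...") is true and matches_whole("Bar", "(?!Bar)...") is false.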
Yes, a negative lookahead assertion works very nicely. And you are
quite right that my perl wasn't really doing the right thing either.
Thank you very much!
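For the original goal of accepting any input except the exact string "Bar", one possible way to apply the lookahead is sketched below. It is an assumption about what was wanted, not something stated in the thread, and it again uses std::regex in place of boost::regex so that it builds without Boost. The function name is_anything_but_bar is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true for every input except the exact string "Bar".
// (?!Bar$) rejects the input only when it is "Bar" followed by end of input;
// .* then consumes the whole string so that regex_match can succeed.
// Note: '.' does not match line terminators, so multi-line input is rejected.
bool is_anything_but_bar(const std::string& text) {
    static const std::regex pattern("(?!Bar$).*");
    return std::regex_match(text, pattern);
}
```

Here is_anything_but_bar("Foo") and is_anything_but_bar("Barn") are true, and is_anything_but_bar("Bar") is false. When the excluded text is a fixed string like this, simply negating a plain comparison (text != "Bar") is probably clearer; the lookahead form is mainly useful when the exclusion has to live inside a larger pattern.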