On Fri, Apr 25, 2003 at 05:35:06PM -0000, Dean wrote:
--- In Boost-Users@yahoogroups.com, "Joshua B. Smith" wrote:
I'm not sure what you were trying to say above, but my understanding is that the two patterns you just mentioned are equivalent. The docs say "{3}" is equivalent to "{3,3}", not "{3,}".
That is what I was trying to say, just not very clearly :)
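To make the quantifier point concrete, here is a small sketch. It uses C++11 std::regex (ECMAScript grammar) as a stand-in for Boost.Regex, and the three function names are hypothetical. Each function checks whether the *whole* string fits one spelling of the quantifier, so "{3}" and "{3,3}" agree while "{3,}" also accepts longer runs of digits.

```cpp
#include <regex>
#include <string>

// Whole-string checks for the three quantifier spellings from the docs.
// \d{3} and \d{3,3} both mean "exactly three digits".
bool whole_exact3(const std::string& s) {
    return std::regex_match(s, std::regex(R"(\d{3})"));
}

bool whole_exact3_3(const std::string& s) {
    return std::regex_match(s, std::regex(R"(\d{3,3})"));
}

// \d{3,} means "three or more digits".
bool whole_atleast3(const std::string& s) {
    return std::regex_match(s, std::regex(R"(\d{3,})"));
}
```

For example, "1234" is rejected by the first two and accepted by the third.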
I'm doing a search because I don't want to know whether the whole string matches, but whether the regex is found anywhere in the string. Specifically, I'm doing:
m_regex.Search( sampleBody, boost::match_default | boost::match_any)
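The difference between a search and a match can be sketched as follows. This uses the C++11 std::regex API rather than the boost::RegEx::Search call above; std::regex_constants::match_any is the standard counterpart of boost::match_any ("any match is acceptable") and does not change *whether* a match is found. The helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// "Search": true if the pattern occurs anywhere in the text.
bool found_anywhere(const std::string& text, const std::regex& re) {
    return std::regex_search(text, re, std::regex_constants::match_any);
}

// "Match": true only if the pattern accounts for the entire text.
bool matches_whole(const std::string& text, const std::regex& re) {
    return std::regex_match(text, re);
}
```

With re = \d{3}-, the text "abc 123- def" is found by the search but rejected by the whole-string match.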
OK. That's kinda what I figured.
While I can believe that the design intention was that "\d{3}-" should be found in "1234567-" (at the fifth character), it seems inconsistent that it is *not* also found in "123456-" and "12345678- ". I'm seeing that inconsistent behavior.
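For reference, here is what standard leftmost search semantics give for the three strings above. This is a sketch with C++11 std::regex, not the Boost.Regex release discussed in this thread, so it says nothing about what that release actually did. Under these semantics the engine tries each starting offset in turn, and "\d{3}-" is found in all three strings. The function names are hypothetical.

```cpp
#include <regex>
#include <string>

// First leftmost match of \d{3}- in text, or "" if there is none.
std::string first_hit(const std::string& text) {
    static const std::regex re(R"(\d{3}-)");
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}

// Zero-based offset of that match, or -1 if there is none.
long first_hit_offset(const std::string& text) {
    static const std::regex re(R"(\d{3}-)");
    std::smatch m;
    return std::regex_search(text, m, re) ? static_cast<long>(m.position(0)) : -1L;
}
```

"1234567-" gives "567-" at offset 4 (the fifth character), "123456-" gives "456-" at offset 3, and "12345678-" gives "678-" at offset 5.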
It is not inconsistent, because it fails to match and then keeps going. It's all about greediness. For example, searching for a{1}b in the strings

1) ab
2) aab
3) aaab

finds a match in 1 and 3 but not in 2, because:

a{1}b in "ab": found (correct).
a{1}b in "aab": fails, because it matched the two a's and then stopped because the string was done.
a{1}b in "aaab": fails on "aa", then begins to scan again and finds "ab", which fits the regex a{1}b.

Make sense?
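One way to check this kind of claim is to run the three strings through an engine. The sketch below uses C++11 std::regex (ECMAScript grammar) rather than the Boost.Regex release discussed here, so results on that release may well have differed. With standard leftmost search semantics, a{1} consumes exactly one 'a', and when the next character is not 'b' the search simply retries at the next offset. The function name is hypothetical.

```cpp
#include <regex>
#include <string>

// Offset of the first a{1}b match in text, or -1 if there is none.
// a{1} matches exactly one 'a'; on failure, regex_search moves on
// and retries from the next starting position.
long find_a1b(const std::string& text) {
    static const std::regex re("a{1}b");
    std::smatch m;
    if (std::regex_search(text, m, re)) {
        return static_cast<long>(m.position(0));
    }
    return -1;
}
```

Under these semantics "ab" is reported at offset 0 in "ab", offset 1 in "aab", and offset 2 in "aaab".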
I realize there is more than one way to do it, and I'd be interested in what you'd recommend.
FWIW, in our SSN-matching case, we'll probably just use "\b\d{3}-\d{2}-\d{4}\b".
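Here is that pattern as a sketch, again using std::regex as a stand-in for Boost.Regex (the function name contains_ssn is hypothetical). In the ECMAScript grammar \b is a word boundary and digits and letters count as word characters, so a number glued to extra digits or letters on either side is rejected, while punctuation or spaces around it are fine.

```cpp
#include <regex>
#include <string>

// True if text contains an SSN-shaped token delimited by word boundaries.
bool contains_ssn(const std::string& text) {
    static const std::regex re(R"(\b\d{3}-\d{2}-\d{4}\b)");
    return std::regex_search(text, re);
}
```

For example, "SSN: 123-45-6789." is accepted, while "1234-56-78901", "123-45-67890" and "x123-45-6789" are not.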
I too would probably use boundaries. Or you can use regex_match on the string returned by regex_search. Or do both; it depends on how much I want to test the data for correctness. I tend to do a search and then a match when I'm dealing with hairy inputs. You can also use spaces, like:

\s*\d{3}-\d{2}-\d{4}\s*

(I tend not to use \b, for no good reason), or something like this, maybe:

[\s,\.]*\d{3}-\d{2}-\d{4}[\s,\.]*

-jbs
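The search-then-match idea can be sketched like this, with std::regex standing in for Boost.Regex. The candidate pattern [\d-]+ and both function names are illustrative assumptions, not part of the original advice. regex_search pulls out each run of digits and dashes, and regex_match accepts a run only if the *entire* run is an SSN. The second function shows a caveat with the \s* form: because * also accepts zero characters, the surrounding classes do not act as delimiters in a search.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical two-step check: search for candidate runs of digits and
// dashes, then keep a run only if regex_match accepts the whole run.
std::vector<std::string> find_ssns(const std::string& text) {
    static const std::regex candidate(R"([\d-]+)");
    static const std::regex ssn(R"(\d{3}-\d{2}-\d{4})");
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), candidate), end;
         it != end; ++it) {
        const std::string run = it->str();
        if (std::regex_match(run, ssn)) {
            out.push_back(run);
        }
    }
    return out;
}

// The \s* variant: zero spaces are allowed on both sides, so a search
// still finds an SSN-shaped substring inside a longer run of digits.
std::string loose_hit(const std::string& text) {
    static const std::regex loose(R"(\s*\d{3}-\d{2}-\d{4}\s*)");
    std::smatch m;
    return std::regex_search(text, m, loose) ? m.str(0) : std::string();
}
```

For "ids 123-45-6789 and 1234-56-78901", find_ssns keeps only "123-45-6789", whereas loose_hit("1234-56-78901") returns "234-56-7890". Using + instead of * (or \b) is what makes the delimiters mandatory.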