Re: [Boost-Users] Re: Inconsistent regexp matching when using quantifiers?

25 Apr 2003

On Fri, Apr 25, 2003 at 08:27:54PM -0000, Dean wrote:
...
That's what I (eventually) guessed was happening.  Thanks for 
confirming my suspicion.  However, it still seems possible that this 
was not the original design intention.  I suppose only John Maddock 
can answer that question...
Does he read this list?
...
It seems to me that when the code finds more than 1 "a", it should 
either:
1) skip past all subsequent "a"s before starting the scan again.  
This would cause "a{1}b" to be found in "ab" but 
not "aab", "aaab", "aaaab", etc.  This would be very "greedy". :-)
Or:
2) restart the scan 1 character after where the previous scan 
started.  This would cause "a{1}b" to be found 
in "ab", "aab", "aaab", "aaaab", etc.
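The two restart strategies can be sketched in C++ as a toy scanner. This is a hand-rolled illustration only, not how Boost.Regex or any other engine is actually implemented. The helper names `match_ab_at`, `search_restart_next` and `search_skip_run` are hypothetical. The pattern is hard-coded as the literal `ab`, since `a{1}b` matches exactly that.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: does the pattern "a{1}b" (i.e. literally "ab")
// match when anchored at position pos?
bool match_ab_at(const std::string& s, std::size_t pos) {
    return pos + 1 < s.size() && s[pos] == 'a' && s[pos + 1] == 'b';
}

// Behavior #2: after a failed attempt, restart the scan one character
// after where the previous attempt started.
// Returns the match offset, or std::string::npos if nothing matches.
std::size_t search_restart_next(const std::string& s) {
    for (std::size_t pos = 0; pos < s.size(); ++pos) {
        if (match_ab_at(s, pos)) return pos;
    }
    return std::string::npos;
}

// Behavior #1: after a failed attempt that started on a run of 'a's,
// skip past the whole run before restarting the scan.
std::size_t search_skip_run(const std::string& s) {
    std::size_t pos = 0;
    while (pos < s.size()) {
        if (match_ab_at(s, pos)) return pos;
        if (s[pos] == 'a') {
            // Skip every subsequent 'a' in this run.
            while (pos < s.size() && s[pos] == 'a') ++pos;
        } else {
            ++pos;
        }
    }
    return std::string::npos;
}
```

With these, `search_restart_next("aab")` returns 1 while `search_skip_run("aab")` returns `std::string::npos`. Both find a match at offset 0 in "ab", which lines up with the two cases described in the list.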
FWIW, I'm told that the regex searcher in the .NET Framework exhibits 
behavior #1.  I mention that only as a point of reference -- I 
realize that different implementations can have somewhat different 
correct behaviors.
Perl and Python both exhibit behavior #2.  I think Emacs does too.
And it doesn't surprise me that .NET is the greediest. :P  
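For comparison, here is a small check against a standard C++ engine. It uses `std::regex` (C++11) as a stand-in. Its default ECMAScript grammar is Perl-derived, so one would expect behavior #2. The function name `find_a1b` is hypothetical. `boost::regex_search` has a similar `(string, match_results, regex)` form, so a Boost version should look much the same, but that is an assumption and I have not checked it here.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: offset at which std::regex_search finds "a{1}b"
// in s, or -1 if there is no match.
long find_a1b(const std::string& s) {
    static const std::regex re("a{1}b");  // default grammar: ECMAScript
    std::smatch m;
    if (std::regex_search(s, m, re)) {
        return static_cast<long>(m.position(0));
    }
    return -1;
}
```

`find_a1b` returns 0, 1, 2 and 3 for "ab", "aab", "aaab" and "aaaab" respectively. In each case the engine finds the single "a" that directly precedes the "b", which is behavior #2.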

I think a lot of regex engines have been converging on a Perl-ish
implementation. In fact, I'd never seriously considered that there
might be another way, but most of my regex work is done in Python/Perl
(until recently, at any rate; I like the Boost regex lib a lot).
...
Anyway, it is either a bug or a "gotcha".  I've been using regexes 
occasionally for over 10 years and it "got" me. :-)
I wouldn't say it "got" you. Regexes are still 50% voodoo, 50% trickery and 2%
butterscotch ripple.

-jbs

Joshua B. Smith