On Fri, Apr 25, 2003 at 08:27:54PM -0000, Dean wrote:
That's what I (eventually) guessed was happening. Thanks for confirming my suspicion. However, it still seems possible that this was not the original design intention. I suppose only John Maddock can answer that question...
Does he read this list?
It seems to me that when the code finds more than 1 "a", it should either:
1) skip past all subsequent "a"s before starting the scan again. This would cause "a{1}b" to be found in "ab" but not "aab", "aaab", "aaaab", etc. This would be very "greedy". :-)
Or:
2) restart the scan 1 character after where the previous scan started. This would cause "a{1}b" to be found in "ab", "aab", "aaab", "aaaab", etc.
FWIW, I'm told that the regex searcher in the .NET Framework exhibits behavior #1. I mention that only as a point of reference -- I realize that different implementations can have somewhat different correct behaviors.
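To make the two options concrete, here is a rough sketch of a toy searcher for patterns of the form "x{n}y" only. It is not how Boost.Regex or any real engine is written; the names `match_at`, `search`, and the `skip_run` flag are made up for illustration. `skip_run=True` is option 1, `skip_run=False` is option 2.

```python
def match_at(text, pos, ch, count, tail):
    """True if text, starting at pos, begins with ch repeated count times, then tail."""
    return text.startswith(ch * count + tail, pos)


def search(text, ch, count, tail, skip_run):
    """Return the index of the first match, or -1 if there is none.

    skip_run=True  -> option 1: after a failed attempt that started on ch,
                      jump past the whole run of ch before trying again.
    skip_run=False -> option 2: retry one character after the last start.
    """
    pos = 0
    while pos < len(text):
        if match_at(text, pos, ch, count, tail):
            return pos
        if skip_run and text[pos] == ch:
            # Option 1: skip every consecutive ch, so the run is never revisited.
            while pos < len(text) and text[pos] == ch:
                pos += 1
        else:
            # Option 2 (or a non-ch character): advance a single position.
            pos += 1
    return -1


for s in ["ab", "aab", "aaab"]:
    print(s, search(s, "a", 1, "b", skip_run=True), search(s, "a", 1, "b", skip_run=False))
# ab 0 0
# aab -1 1
# aaab -1 2
```

Under option 1 only "ab" matches; under option 2 the match is always found at the last "a" before the "b".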
Perl and Python both exhibit behavior #2. I think Emacs does too. And it doesn't surprise me that .NET is the greediest. :P I think a lot of regex engines have been converging on a Perl-ish implementation. In fact, I'd never seriously considered that there might be another way, but most of my regex work is done in Python/Perl (until recently at any rate; I like the Boost regex lib a lot).
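For what it's worth, behavior #2 is easy to check in Python with the standard `re` module. This is just a quick check of the four strings mentioned above, nothing more; the `results` list is only there to collect the output.

```python
import re

pattern = re.compile(r"a{1}b")

results = []
for s in ["ab", "aab", "aaab", "aaaab"]:
    m = pattern.search(s)
    # m.start() is the offset where the match begins, or None if no match.
    results.append((s, m.start() if m else None))

for s, start in results:
    print(s, start)
# ab 0
# aab 1
# aaab 2
# aaaab 3
```

Every string matches, and the match always starts at the "a" immediately before the "b".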
Anyway, it is either a bug or a "gotcha". I've been using regexes occasionally for over 10 years and it "got" me. :-)
I wouldn't say it "got" you. Regexes are still 50% voodoo, 50% trick, and 2% butterscotch ripple.
-jbs