Dave Druelinger wrote:
I'm using Boost.Regex to match a long, complicated pattern to between 30 and 50 long strings. I loop through and call regex_search until it doesn't find any more matches in the current string and then it moves on to the next string. The problem is that when it has looped through about 40 of the strings, Boost.Regex will hang on the regex_search() function, and it will just sit there processing forever. If I reduce the number of strings I match against there is no problem; 20-30 iterations works like a charm. It also seems to help if I reduce the size of the regex pattern.
Frankly I have no clue what could be causing the problem, and any help would be greatly appreciated.
I think you need to isolate the problem: figure out which specific string against which specific pattern is causing the problem. [Aside: should you happen to be using Visual Studio, wait till it hangs, and then do a "break-all" and work up the call stack to see what the string and pattern was]. Which Boost.Regex version are you using? In all recent releases there is code to detect pathological cases and throw an exception so you don't get into this situation. BTW: the issue here is that matching Perl style regexes is an NP-Hard problem in general. Usually you will never hit that kind of situation though :-) When things go wrong, it's normally nested "ambiguous" repeats combined with a particular pattern of text in the string - long sections of whitespace for example. The trick is to try and optimise your regex so that each time the state machine has to make a decision, there is only one way it can go for any given input characater. For example if you have: (\s*something-to-be-matched\s*)+ the expression may behave badly when it sees long sections of whitespace, since those two \s*'s are effectively right next to each other, but if we make it less ambiguous: \s*(something-to-be-matched\s*)+ then it will be fine, because you don't have two \s*\s* 's next to each other. Hopefully, I'm making sence, and this will trigger an idea :-) Regards, John Maddock.