On 23/07/2021 16:23, Soronel Haetir via Boost wrote:
Phil Endecott,
Match time linear to the length of the string works fine if the regex doesn't have alternations but breaks down completely if they are present.
Imagine you have the regex "auto|bool|const|continue|...|([A-Z_][A-Za-z0-9_]*)" set up to match any C token (the ... in this case are missing keywords, integers, etc) but your input is "continuation". This regex will try for each alternate, get almost all the way with "continue" but then have to backtrack and try the other alternates. It's only when you get to the very last that the string will match. And it would be possible to contrive even worse examples (regex match over a dictionary for example)
We're getting well off topic here, but in your example, backtracking is NOT required since the pattern can be converted into a DFA that matches at in linear time. However, the full Perl syntax definitely can not be converted to a DFA (and it's not just the backreferences either). So... complaining that a Perl-compatible engine isn't O(N) is like complaining that your orange doesn't taste much like an apple ;) With regard to performance testing, it's very very hard to make a general comparison between libraries, indeed I would tend to suggest that it's folly to even try - there are just too many variables in play. However, if your problem domain is more limited, then you can (and indeed should as Phil has done) be able to do so, and Phil's results are exactly what *he* needs, while others may come to other conclusions for their particular case. In short, the orange grove is happy that Phil has found a nice apple :) Best, John. -- This email has been checked for viruses by Avast antivirus software. https://www.avast.com/antivirus