Hi Peter, Peter Dimov wrote:
Phil Endecott wrote:
What I'm trying to do is to sanitise the input to an internet- exposed process, to reject malicious input'); drop table users; As an example I'll look at input that is supposed to be base-64 encoded and no more than a couple of kilobytes long.
Typical-case performance doesn't matter much as this runs once per process invocation (and hence also caching the compiled regex doesn't help), but I do want to be sure that it doesn't have bad worst-case complexity in the face of pathological input. So my first test is a quick check with a regular expression that should might trigger worst-case behaviour in a non-linear implementation:
a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaa
But that's a separate case. This is a pathological regexp, not a "pathological" input string. If your regexes don't come from an external source, the performance of a pathological regex is not a potential security issue.
There are certain pathological regexp matching algorithms that have certain pathological regexp inputs that have certain pathological input strings. How do I know if a regexp that I've written in my code is a pathological one for the particular regexp library that I'm using?
Are these results available? Apologies if you've already posted them.
I was going to post some numbers but then I spotted that they got slower as the test system warmed up! Fundamentally, I'm only interested in the complexity order, not even factors of 10X, so I'm not trying to produce very detailed numbers. Previously I made some positive comments about the compile-time regular expression library CTRE. I had looked at a couple of presentations from CppCon 2018 & 2019 which presented the compile-time conversion of the regexp string into an NFA and then - according to the 2019 presentation - into a DFA, with O(N) matching complexity. But having looked at the code, it seems that unless I'm missing something the NFA-to-DFA conversion is not present! This is yet another backtracking NFA implementation. Boo. My recommendation is therefore RE2. Regards, Phil.