Phil Endecott wrote:
What I'm trying to do is to sanitise the input to an internet-exposed process, to reject malicious input'); drop table users; As an example I'll look at input that is supposed to be base-64 encoded and no more than a couple of kilobytes long.
Typical-case performance doesn't matter much as this runs once per process invocation (and hence caching the compiled regex doesn't help either), but I do want to be sure that it doesn't have bad worst-case complexity in the face of pathological input. So my first test is a quick check with a regular expression that might trigger worst-case behaviour in a non-linear implementation:
a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaa
But that's a separate case. This is a pathological regexp, not a "pathological" input string. If your regexes don't come from an external source, the performance of a pathological regex is not a potential security issue.
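To make the distinction concrete, here is a minimal sketch in Python (chosen only for illustration; the thread does not name a language). It assumes the standard base-64 alphabet with "=" padding and a 2048-character cap standing in for "a couple of kilobytes". The names MAX_LEN, is_acceptable_base64 and pathological_pair are hypothetical. The pattern is fixed in the program, so only the input string comes from outside, and it has no nested or overlapping quantifiers that would give a backtracking matcher much room to blow up.

```python
import re

# Assumed cap: "a couple of kilobytes" taken here as 2048 characters.
MAX_LEN = 2048

# Fixed pattern, never built from external data: whole groups of four
# base-64 characters, optionally followed by one padded final group.
_BASE64_RE = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*"
    r"(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)


def is_acceptable_base64(text: str) -> bool:
    """Return True if text is within the cap and looks like padded base-64."""
    # Check the length first so the matcher never sees oversized input.
    if len(text) > MAX_LEN:
        return False
    # fullmatch requires the entire string to match; note that the empty
    # string is accepted by this pattern.
    return _BASE64_RE.fullmatch(text) is not None


def pathological_pair(n: int) -> tuple[str, str]:
    """Build a pattern of the a?a?...aa... shape and the input it is run on."""
    return "a?" * n + "a" * n, "a" * n
```

Here is_acceptable_base64("aGVsbG8=") is accepted, while "'); drop table users;" and anything longer than the cap are rejected. pathological_pair exists only to show that the worst case in the quoted test comes from the pattern itself: the input is just a short run of "a" characters.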