[regex] Support for Perl's (*SKIP)

18 Apr 2015

I'd like to see support for Perl's (*SKIP) regex verb in Boost.Regex.

There are several backtracking control verbs, but this one has an
interesting and common use case: it makes it possible to search for an
expression only outside certain contexts, such as comments or string
literals.

There is a page called "The best regex trick" that describes the
technique in detail, with a number of examples. I can't link to it
because this is my first message to the list (I tried before, but the
message was rejected). The page explains how to do it both with and
without (*SKIP). I've seen several questions on Stack Overflow asking
how to accomplish this, so the need seems to come up fairly often.

Let's say for example that we want to find the string 'foo' as an
identifier in C. This is a crude example of a Perl regex that does it (a
real one might need to be more elaborate; in particular, backslashes for
line continuation are not considered):

  (?x-s)                 (?# free spacing, dot doesn't match newline)
  (?://.*+               (?# eat single-line comment text)
    |/\*[\S\s]*?\*/      (?# eat multi-line comment text)
    |"(?:\\.|[^"\n])*+"  (?# eat string text)
  )(*SKIP)(?!)           (?# skip these)
  |\bfoo\b               (?# match this)

With (*SKIP) supported, a single call to regex_search with that
expression would match foo only where it occurs outside a string or
comment.
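To make the intended semantics concrete: when one of the skipped
alternatives matches from position p to position q, (*SKIP) followed by
the always-failing (?!) makes the engine abandon that attempt and resume
scanning at q rather than at p + 1. The sketch below emulates one such
search by hand. It is only an illustration under assumptions: it uses
std::regex in ECMAScript mode instead of Boost.Regex, replaces the
possessive quantifiers (which ECMAScript lacks) with ordinary greedy
ones, and the function name first_foo_outside is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the offset of the first "foo" that lies
// outside a // comment, a /* */ comment or a "..." string literal, or
// std::string::npos if there is none.
// ECMAScript has no possessive quantifiers, so plain greedy ones are used;
// '.' does not match '\n', so "//.*" stops at the end of the line.
std::string::size_type first_foo_outside(const std::string& text)
{
    static const std::regex re(
        R"re(//.*|/\*[\s\S]*?\*/|"(?:\\.|[^"\n])*"|(\bfoo\b))re");

    auto pos = text.cbegin();
    std::smatch m;
    while (std::regex_search(pos, text.cend(), m, re,
                             pos == text.cbegin()
                                 ? std::regex_constants::match_default
                                 : std::regex_constants::match_prev_avail)) {
        if (m[1].matched) {
            // Group 1 took part in the match: this is a "real" foo.
            return static_cast<std::string::size_type>(m[1].first - text.cbegin());
        }
        // A comment or string matched: resume right after it, which is
        // what (*SKIP)(?!) would make the engine do internally.
        pos = m[0].second;
    }
    return std::string::npos;
}
```

match_prev_avail is passed on every search after the first so that \b
can see the character just before the resume point.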

Without (*SKIP), this can only be done by calling regex_search
repeatedly, with an expression like this:

  (?-s)//.*+|/\*[\S\s]*?\*/|"(?:\\.|[^"\n])*+"|(\bfoo\b)

and discarding every match in which group 1 did not participate. That
is presumably slower, and it is certainly less convenient for the
programmer.
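A minimal sketch of that workaround, collecting every occurrence rather
than just the first. It assumes std::regex and std::sregex_iterator in
ECMAScript mode as stand-ins for the Boost.Regex equivalents, translates
the expression quoted above to ECMAScript syntax (greedy instead of
possessive quantifiers), and uses a hypothetical function name,
all_foo_outside.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: offsets of every "foo" outside comments and string
// literals. Each step of the iterator is one search; matches in which
// group 1 did not participate (comments, strings) are simply discarded.
std::vector<std::string::size_type> all_foo_outside(const std::string& text)
{
    // Only the last alternative is captured, so group 1 marks a real hit.
    static const std::regex re(
        R"re(//.*|/\*[\s\S]*?\*/|"(?:\\.|[^"\n])*"|(\bfoo\b))re");

    std::vector<std::string::size_type> found;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        if ((*it)[1].matched) {
            found.push_back(
                static_cast<std::string::size_type>((*it)[1].first - text.begin()));
        }
    }
    return found;
}
```

The filtering happens in user code here: every comment and string is
still produced as a match result before being thrown away, which is the
overhead a (*SKIP)-aware engine would avoid.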

Support for this particular use case would be a great feature to have in
the regex engine.

Sei

Sei Lisa

John Maddock
