Re: [Boost-Users] Re: regex performance

18 Apr 2002

      ...
Speaking as a satisfied user of the regular expression library, always
looking to help make it better:
I was under the impression that point (a) didn't cost anything in
Boost::Regex because it was templatized on the character type. Am I
mistaken? Case (b) is fairly rarely used, but (c) is common. In any
event, it is certainly true that after compiling the regular
expression, you know whether these are needed. So if there are faster
algorithms for these special cases, could they be incorporated into the
library without much overhead?
The point is that there is a wide range of different state machine
representations available. To make "automatic" use of these, one would
effectively have to implement several different regex state machines and
switch between them based on run-time detection (what kind of expression you
have); this is a lot of work, and it also adds code bloat. With respect to (a),
it is true that narrow-character regexes make some optimisations now, but many
more are available, mainly when combined with (b) and (c).
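To make the cost concrete, here is a minimal sketch of the kind of run-time switch described above. It assumes just two strategies: a plain substring search for patterns with no metacharacters, and `std::regex` standing in for a general engine such as Boost.Regex. The class name `dispatching_pattern` and the metacharacter test are hypothetical simplifications, not part of any library.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical illustration: choose a matching strategy once, when the
// pattern is "compiled", based on what the expression actually uses.
// std::regex stands in for a general-purpose engine such as Boost.Regex.
class dispatching_pattern {
public:
    explicit dispatching_pattern(const std::string& pattern)
        : literal_(pattern), is_literal_(is_plain_literal(pattern)) {
        if (!is_literal_) {
            general_ = std::regex(pattern);  // only built when needed
        }
    }

    // Returns true if the pattern matches anywhere in text.
    bool search(const std::string& text) const {
        if (is_literal_) {
            // Fast path: no metacharacters, so a substring search suffices.
            return text.find(literal_) != std::string::npos;
        }
        // General path: the full regex state machine.
        return std::regex_search(text, general_);
    }

    bool uses_fast_path() const { return is_literal_; }

private:
    // True if the pattern contains no ECMAScript metacharacters.
    static bool is_plain_literal(const std::string& p) {
        return p.find_first_of("\\^$.|?*+()[]{}") == std::string::npos;
    }

    std::string literal_;
    bool is_literal_;
    std::regex general_;
};
```

Even this toy has to carry two engines; every additional specialised representation is another code path to write, test and ship.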
...
...
C-based libraries can also use alloca, which generally gives at least a
2x performance increase.
I know that alloca is not 'officially' available in portable C++. But I
think most C++ compilers will handle C-like usages of this construct.
I know we use it successfully on the compilers we use (gcc, Sun CC). So
if there is someplace it would be useful, you could almost certainly
get away with it, probably #ifdef'd around for safety.
Point taken; however, it means a complete rewrite (and adds a lot to the
maintenance burden: more config options to test, etc.).
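For reference, here is a minimal sketch of the `#ifdef`-guarded `alloca` pattern suggested above, applied to a hypothetical wildcard matcher (`?` and `*` only) that needs one row of scratch state per call. The macro names `SCRATCH_ALLOCA` and `HAVE_SCRATCH_ALLOCA`, the platform tests and the 256-entry cap are assumptions. `alloca` itself is non-standard: it lives in `<alloca.h>` on Linux, macOS and Solaris, and is spelled `_alloca` in `<malloc.h>` under MSVC. It must be called in the function that uses the memory, which is why it is wrapped in a macro rather than a helper function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical portability shim: alloca is not part of standard C++.
#if defined(_MSC_VER)
#  include <malloc.h>
#  define SCRATCH_ALLOCA(n) _alloca(n)
#  define HAVE_SCRATCH_ALLOCA 1
#elif defined(__linux__) || defined(__APPLE__) || defined(__sun)
#  include <alloca.h>
#  define SCRATCH_ALLOCA(n) alloca(n)
#  define HAVE_SCRATCH_ALLOCA 1
#else
#  define HAVE_SCRATCH_ALLOCA 0
#endif

// Assumed cap on stack scratch; larger patterns use the heap instead.
const std::size_t kMaxStackScratch = 256;

// Matches text against a pattern where '?' matches any one character
// and '*' matches any run of characters (including none).
bool wildcard_match(const char* pattern, const char* text) {
    const std::size_t n = std::strlen(pattern);
    bool* dp = nullptr;            // dp[j]: pattern[0, j) matches text consumed so far
    std::unique_ptr<bool[]> heap;  // owns the fallback buffer, if used
#if HAVE_SCRATCH_ALLOCA
    if (n + 1 <= kMaxStackScratch) {
        // Released automatically when wildcard_match returns.
        dp = static_cast<bool*>(SCRATCH_ALLOCA((n + 1) * sizeof(bool)));
    }
#endif
    if (dp == nullptr) {
        heap.reset(new bool[n + 1]);
        dp = heap.get();
    }

    // Empty text: only a prefix made entirely of '*' can match.
    dp[0] = true;
    for (std::size_t j = 1; j <= n; ++j) {
        dp[j] = dp[j - 1] && pattern[j - 1] == '*';
    }

    for (const char* t = text; *t != '\0'; ++t) {
        bool diag = dp[0];  // old dp[j - 1]
        dp[0] = false;      // empty pattern cannot match non-empty text
        for (std::size_t j = 1; j <= n; ++j) {
            const bool up = dp[j];  // old dp[j]
            const char p = pattern[j - 1];
            if (p == '*') {
                // '*' matches nothing (new dp[j - 1]) or absorbs *t (old dp[j]).
                dp[j] = dp[j - 1] || up;
            } else {
                dp[j] = diag && (p == '?' || p == *t);
            }
            diag = up;
        }
    }
    return dp[n];
}
```

Capping the stack allocation matters because `alloca` reports no error on overflow. Each platform branch is also one more configuration to test.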

Personally I would rather see a separate regex type with limited usefulness,
but better performance when it can be used.
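As an illustration of that trade-off, here is a minimal sketch of a separate, deliberately limited type. It accepts only literal strings, and in return can use Boyer-Moore-Horspool skipping, which avoids examining every character of the text; a general-purpose engine can at best apply this to literal fragments of a pattern. The name `literal_searcher` is hypothetical and is not a Boost.Regex API.

```cpp
#include <array>
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical restricted "regex" type: literal strings only, searched
// with Boyer-Moore-Horspool.
class literal_searcher {
public:
    explicit literal_searcher(std::string needle) : needle_(std::move(needle)) {
        const std::size_t m = needle_.size();
        // Characters absent from the needle allow a full-length skip.
        shift_.fill(m == 0 ? 1 : m);
        // Every needle character except the last gets its distance to the end.
        for (std::size_t i = 0; i + 1 < m; ++i) {
            shift_[static_cast<unsigned char>(needle_[i])] = m - 1 - i;
        }
    }

    // Returns the offset of the first match, or std::string::npos.
    std::size_t find(const std::string& text) const {
        const std::size_t m = needle_.size();
        const std::size_t n = text.size();
        if (m == 0) return 0;
        std::size_t pos = 0;
        while (pos + m <= n) {
            // Compare the current window right to left.
            std::size_t i = m;
            while (i > 0 && text[pos + i - 1] == needle_[i - 1]) --i;
            if (i == 0) return pos;
            // Skip using the text character under the needle's last position.
            pos += shift_[static_cast<unsigned char>(text[pos + m - 1])];
        }
        return std::string::npos;
    }

private:
    std::string needle_;
    std::array<std::size_t, UCHAR_MAX + 1> shift_;
};
```

Because the restriction is visible in the type, callers opt in explicitly and the general engine stays unchanged.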

John Maddock
http://ourworld.compuserve.com/homepages/john_maddock/index.htm