Speaking as a satisfied user of the regular expression library, always looking to help make it better:
I was under the impression that point (a) didn't cost anything in Boost::Regex because it was templatized on the character type. Am I mistaken? Case (b) is fairly rarely used, but (c) is common. In any event, it is certainly true that after compiling the regular expression, you know whether these are needed. So if there are faster algorithms for these special cases, could they be incorporated into the library without much overhead?
The point is that there is a wide range of differing state machine representations available. To make "automatic" use of these, one would effectively have to implement several different regex state machines and switch between them based on run-time detection of what kind of expression you have. That is a lot of work, and it adds code bloat. With respect to (a), it is true that narrow character regexes make some optimisations now, but many more are available, mainly in combination with (b) and (c).
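To make the trade-off concrete, here is a minimal sketch of that kind of run-time switching. It is not how Boost.Regex works: it uses std::regex (C++11) as a stand-in general engine, and the only "special case" it detects is a pattern with no metacharacters, which it hands to a plain substring search. The class name sketch_pattern and its members are made up for illustration. Even with a single fast path, both code paths have to be compiled in and tested, which is the bloat and maintenance cost described above.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper: decides once, at construction time, which
// matcher to use, then dispatches on every search.
class sketch_pattern {
public:
    explicit sketch_pattern(const std::string& expr)
        // A pattern with none of these characters is a plain literal
        // under the default (ECMAScript) std::regex grammar.
        : literal_(expr.find_first_of("\\^$.|?*+()[]{}") == std::string::npos),
          text_(expr)
    {
        if (!literal_)
            general_.assign(expr);   // build the general machine only if needed
    }

    // True if the pattern matches anywhere in subject.
    bool search(const std::string& subject) const
    {
        return literal_ ? subject.find(text_) != std::string::npos
                        : std::regex_search(subject, general_);
    }

    bool is_literal() const { return literal_; }

private:
    bool literal_;        // result of the run-time detection
    std::string text_;    // used by the literal fast path
    std::regex general_;  // used by the general path
};
```

For example, sketch_pattern("abc") takes the substring path, while sketch_pattern("a+b") falls through to the general engine.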
C-based libraries can also use alloca, which generally gives at least a 2x performance increase.
I know that alloca is not 'officially' available in portable C++, but I think most C++ compilers will handle C-like usages of this construct. We use it successfully on our compilers (gcc, Sun CC). So if there is some place it would be useful, you could almost certainly get away with it, probably wrapped in an #ifdef for safety.
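Roughly what I have in mind, as a sketch only. The macro SKETCH_HAVE_ALLOCA, the size limit, and the function sketch_balanced are all invented for this example and are not part of any library; the compiler test is an assumption that gcc (outside Windows) and Sun CC provide <alloca.h>. The function uses a small scratch stack, taken from alloca when it is available and the input is small, and from the heap otherwise.

```cpp
#include <cstddef>
#include <vector>

// Assumed detection: adjust for the compilers you actually support.
#if (defined(__GNUC__) && !defined(_WIN32)) || defined(__SUNPRO_CC)
#  include <alloca.h>
#  define SKETCH_HAVE_ALLOCA 1
#endif

// Above this size, use the heap rather than risk overflowing the stack.
static const std::size_t sketch_max_stack_bytes = 1024;

// Returns true if every '(' and '[' in s[0..n) is closed in the right order.
bool sketch_balanced(const char* s, std::size_t n)
{
    std::vector<char> heap;   // fallback storage, stays empty on the fast path
    char* stack = 0;
#ifdef SKETCH_HAVE_ALLOCA
    if (n > 0 && n <= sketch_max_stack_bytes)
        stack = static_cast<char*>(alloca(n));  // released when the function returns
#endif
    if (stack == 0) {
        heap.resize(n + 1);   // +1 so &heap[0] is valid even when n == 0
        stack = &heap[0];
    }

    std::size_t top = 0;
    for (std::size_t i = 0; i < n; ++i) {
        const char c = s[i];
        if (c == '(' || c == '[') {
            stack[top++] = c;                    // at most n pushes, so no overflow
        } else if (c == ')' || c == ']') {
            if (top == 0)
                return false;                    // closer with nothing open
            const char open = stack[--top];
            if ((c == ')') != (open == '('))
                return false;                    // mismatched pair
        }
    }
    return top == 0;
}
```

Note that alloca has to be called directly in the function that uses the memory, so it cannot be hidden inside a helper that returns the pointer. That is part of why it tends to end up behind macros.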
Point taken, however it means a complete rewrite, and it adds a lot to the maintenance burden (more config options to test, etc.). Personally I would rather see a separate regex type with limited usefulness but better performance when it can be used.
John Maddock http://ourworld.compuserve.com/homepages/john_maddock/index.htm