Dear Developers,
I found a possible trap in the design of the syntax of the Regex library.
Consider the following code:

    std::string text( "blabla123xyz" );
    boost::regex expression( "\\w+(\\d+)\\w+" );
    boost::smatch matches;
    boost::regex_search( text, matches, expression );
    text = "asdfghjkl";
    std::string value = matches[1];
Although this code is not very useful, it can lead to unpredictable behaviour. As far as I know, the matches just reference positions in the original string, so when the string is changed the matches no longer fit. This may be good for performance, but it requires great care, especially if the string is only referenced in one place and the matches are handed on to somewhere else.
As you say, it's performance related - had match_results copied the string the cost would be at least 10 times the normal cost of a call to regex_search (all due to the memory allocations). You also lose positional information if you store copies rather than iterators.
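One way to live with the trade-off described above is to copy out only the pieces you need, together with their positions, before the searched string is modified. The following is a minimal sketch, not part of the library: the names Capture and capture_group are hypothetical, and it uses std::regex so that it builds without Boost, on the assumption that boost::regex, boost::smatch and boost::regex_search can be substituted with the same member calls.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Owning snapshot of one capture group (hypothetical helper type).
struct Capture {
    bool matched = false;
    std::string value;          // copy of the captured characters
    std::ptrdiff_t offset = 0;  // where the capture started in the searched text
};

// Searches `text` and copies the requested group out of the match results,
// so the returned value no longer depends on the lifetime of `text`.
Capture capture_group(const std::string& text,
                      const std::regex& expression,
                      std::size_t group) {
    Capture result;
    std::smatch matches;  // stores iterators into `text`, not characters
    if (std::regex_search(text, matches, expression) &&
        group < matches.size() && matches[group].matched) {
        result.matched = true;
        result.value = matches[group].str();      // one allocation, only for this group
        result.offset = matches.position(group);  // positional information is kept
    }
    return result;
}
```

With the pattern from the original post, capture_group(text, expression, 1) can be called before text is reassigned, and the returned value stays usable afterwards. Note that the leading \w+ is greedy and also matches digits, so group 1 of "blabla123xyz" is "3" rather than "123".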
Furthermore, when I looked at the Regex library I wondered about its interface. It seems more like a C library interface than C++ code. I also code in Ruby, where the Regexp class is much more convenient. Pattern matching is done there by a method of class Regexp, which returns the matches:

    expression = Regexp.new( '\w+(\d+)\w' )
    matches = expression.match( "blabla123xyz" )
    if ( matches ) ...
Would it be possible to implement a more object-oriented interface like this for boost::regex?
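To make the request concrete, a thin wrapper in that style can be layered on top of the existing free functions. This is only a sketch under assumptions: the class name Pattern and its match method are hypothetical, it is written against std::regex so that it builds without Boost, and it deliberately returns copies of the matched text, which costs allocations and discards the positions.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical wrapper; not part of Boost.Regex or the standard library.
class Pattern {
public:
    explicit Pattern(const std::string& source) : expression_(source) {}

    // Returns the whole match followed by each capture group as owning
    // strings. An empty vector means "no match".
    std::vector<std::string> match(const std::string& text) const {
        std::vector<std::string> groups;
        std::smatch matches;
        if (std::regex_search(text, matches, expression_)) {
            for (const auto& sub : matches) {
                groups.push_back(sub.str());
            }
        }
        return groups;
    }

private:
    std::regex expression_;
};
```

Usage would then read much like the Ruby version: construct Pattern expression("\\w+(\\d+)\\w+"), call expression.match("blabla123xyz"), and test the result with empty().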
Sigh... you mean like the deprecated RegEx class: http://www.boost.org/doc/libs/1_44_0/libs/regex/doc/html/boost_regex/ref/dep...

The current interface is closely modeled on the C++ standard library, and of course will *be part of the next C++ standard*. The idea is that objects store data, and free functions operate upon them (as with the standard library containers and algorithms for example).

One advantage of this approach is that the user can extend the range of operations available, something that is basically impossible with a "closed" OO design where everything is in the class. For example one could easily define a new variation on regex_replace that performed a customized replace operation.

HTH, John.
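As an illustration of the extension point mentioned in the reply, here is a minimal sketch of a user-defined variation on regex_replace in which a callable computes each replacement. The name replace_with is hypothetical, and the sketch uses std::regex and std::sregex_iterator so that it builds without Boost, on the assumption that the boost:: equivalents behave the same way here.

```cpp
#include <regex>
#include <string>

// Hypothetical free function: like regex_replace, except that `format` is
// called with each match and returns the text to substitute for it.
template <class Formatter>
std::string replace_with(const std::string& text,
                         const std::regex& expression,
                         Formatter format) {
    std::string result;
    std::string::const_iterator last = text.begin();  // end of the previous match
    std::sregex_iterator it(text.begin(), text.end(), expression);
    std::sregex_iterator end;
    for (; it != end; ++it) {
        const std::smatch& m = *it;
        result.append(last, m[0].first);  // unmatched text before this match
        result += format(m);              // customized replacement
        last = m[0].second;
    }
    result.append(last, text.end());      // unmatched tail
    return result;
}
```

For example, calling replace_with on "a1b22c" with the pattern "\\d+" and a lambda that wraps m.str() in angle brackets yields "a<1>b<22>c". Nothing inside the regex or match_results classes had to change to add this operation.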