Dear Developers,

I found a possible trap in the design of the syntax of the Regex library. Consider the following code:

    std::string text( "blabla123xyz" );
    boost::regex expression( "\\w+(\\d+)\\w+" );
    boost::smatch matches;
    boost::regex_search( text, matches, expression );
    text = "asdfghjkl";
    std::string value = matches[1];

Although this code is not very useful, it can lead to unpredictable behaviour. As far as I know, the matches just reference positions in the original string, so when the string is changed the matches no longer fit. This may be good for performance, but it requires great care, especially if the string is only referenced in one place and the matches are passed on to another.

Furthermore, when I looked at the Regex library I wondered about its interface. It looks more like a C library interface than C++ code. I also code in Ruby, where the Regexp class is much more convenient. There, pattern matching is done by a method of class Regexp, which returns the matches:

    expression = Regexp.new( '\w+(\d+)\w' )
    matches = expression.match( "blabla123xyz" )
    if ( matches ) ...

Would it be possible to implement a more object-oriented interface like this for boost::regex?

Greetings,
Sven
--
Dipl.-Ing. Sven Bauhan
DFS Deutsche Flugsicherung GmbH
Dear Developers,
I found a possible trap in the design of the syntax of the Regex library.
Consider the following code:

    std::string text( "blabla123xyz" );
    boost::regex expression( "\\w+(\\d+)\\w+" );
    boost::smatch matches;
    boost::regex_search( text, matches, expression );
    text = "asdfghjkl";
    std::string value = matches[1];
Although this code is not very useful, it can lead to unpredictable behaviour. As far as I know, the matches just reference positions in the original string, so when the string is changed the matches no longer fit. This may be good for performance, but it requires great care, especially if the string is only referenced in one place and the matches are passed on to another.
As you say, it's performance related - had match_results copied the string the cost would be at least 10 times the normal cost of a call to regex_search (all due to the memory allocations). You also lose positional information if you store copies rather than iterators.
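To illustrate the point about iterators, here is a minimal sketch of one way to avoid the trap: copy the sub-match into its own std::string, and record its position, while the searched string is still unchanged. It assumes std::regex from C++11, whose interface mirrors boost::regex, so that it compiles without Boost; the struct `captured` and the function `capture_then_modify` are hypothetical names made up for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical result type: an owning copy of the sub-match plus its offset.
struct captured
{
    std::string value;
    std::smatch::difference_type position;
};

// Hypothetical demo of the scenario from the original post.
captured capture_then_modify()
{
    std::string text("blabla123xyz");
    std::regex expression("\\w+(\\d+)\\w+");
    std::smatch matches;  // stores iterators into `text`, not copies of it

    captured result{ std::string(), -1 };
    if (std::regex_search(text, matches, expression)) {
        result.value = matches[1].str();        // owning copy, taken while `text` is intact
        result.position = matches.position(1);  // offset of group 1 within `text`
    }

    text = "asdfghjkl";  // from here on `matches` must not be used again
    return result;       // `result` owns its data and stays valid
}
```

Note that the greedy leading \w+ also consumes digits, so group 1 captures only the last digit, "3", at offset 8.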
Furthermore, when I looked at the Regex library I wondered about its interface. It looks more like a C library interface than C++ code. I also code in Ruby, where the Regexp class is much more convenient. There, pattern matching is done by a method of class Regexp, which returns the matches:

    expression = Regexp.new( '\w+(\d+)\w' )
    matches = expression.match( "blabla123xyz" )
    if ( matches ) ...

Would it be possible to implement a more object-oriented interface like this for boost::regex?
Sigh... you mean like the deprecated RegEx class: http://www.boost.org/doc/libs/1_44_0/libs/regex/doc/html/boost_regex/ref/dep...

The current interface is closely modeled on the C++ standard library, and of course will *be part of the next C++ standard*. The idea is that objects store data, and free functions operate upon them (as with the standard library containers and algorithms, for example). One advantage of this approach is that the user can extend the range of operations available, something that is basically impossible with a "closed" OO design where everything is in the class. For example, one could easily define a new variation on regex_replace that performed a customized replace operation.

HTH,
John.
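As a rough illustration of the extensibility argument, the sketch below adds a user-defined free function next to the library's own algorithms. `replace_with` is a hypothetical name, not part of Boost.Regex or the standard library; it uses std::regex from C++11 (assumed here as a stand-in for boost::regex) and computes each replacement by calling a user-supplied callable.

```cpp
#include <regex>
#include <string>

// Hypothetical variation on regex_replace: the replacement text for each
// match is produced by `format`, a callable taking a const std::smatch&.
template <class Formatter>
std::string replace_with(const std::string& text,
                         const std::regex& expression,
                         Formatter format)
{
    std::string result;
    std::sregex_iterator it(text.begin(), text.end(), expression), end;
    std::string::const_iterator last = text.begin();

    for (; it != end; ++it) {
        const std::smatch& m = *it;
        result.append(last, m[0].first);  // unmatched text before this match
        result += format(m);              // customized replacement
        last = m[0].second;               // continue after this match
    }
    result.append(last, text.end());      // trailing unmatched text
    return result;
}
```

Nothing inside the regex or match_results classes had to change to add this operation, which is the design point being made.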
As you say, it's performance related - had match_results copied the string the cost would be at least 10 times the normal cost of a call to regex_search (all due to the memory allocations). You also lose positional information if you store copies rather than iterators.
Ok, this sounds sensible. I read the documentation again and found that it uses regex_iterator for the matches. I had overlooked this information before.
Sigh... you mean like the deprecated RegEx class: http://www.boost.org/doc/libs/1_44_0/libs/regex/doc/html/boost_regex/ref/deprecated_interfaces/old_regex.html
It comes close, but its match and search methods also return bool, not the matches.
The current interface is closely modeled on the C++ standard library, and of course will *be part of the next C++ standard*. The idea is that objects store data, and free functions operate upon them (as with the standard library containers and algorithms for example). One advantage of this approach is that the user can extend the range of operations available, something that is basically impossible with a "closed" OO design where everything is in the class. For example one could easily define a new variation on regex_replace that performed a customized replace operation.
Ok, this explains why the RegEx class is now deprecated. I was wondering about that before.

But one point is still open: why are the matches returned through an out parameter instead of as a return value? If you want to be able to check whether the match succeeded, class match_results<> would just need an unspecified-bool-type conversion operator. This is also done in other classes, e.g. shared_ptr<>.

Greetings,
Sven
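For illustration only, the suggestion above could be approximated in user code roughly as follows. This is a sketch under assumptions, not an existing interface: `search_result` and `find_match` are hypothetical names, std::regex from C++11 stands in for boost::regex, and a C++11 explicit operator bool is used in place of the unspecified-bool-type idiom.

```cpp
#include <regex>
#include <string>

// Hypothetical wrapper: match results that can be tested like a bool.
struct search_result
{
    std::smatch matches;
    explicit operator bool() const { return !matches.empty(); }
};

// Hypothetical helper returning the matches by value.
search_result find_match(const std::string& text, const std::regex& expression)
{
    search_result result;
    std::regex_search(text, result.matches, expression);
    return result;
}

// The matches hold iterators into `text`, so refuse temporary strings
// that would be destroyed before the result can be inspected.
search_result find_match(std::string&&, const std::regex&) = delete;
```

A caller could then write `if (search_result r = find_match(text, expression)) { ... }`. The returned object still only refers to `text`, so the lifetime issue raised at the start of the thread applies here as well.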
But one point is still open: why are the matches returned through an out parameter instead of as a return value?
If you need to perform many repeated matches on the same regex then re-using the same match_results each time saves regex_search from having to perform any memory allocation: again it's a big win in performance terms. You can't do that if the structure is returned by value. John.
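The reuse pattern described above might look like the following sketch. It uses std::regex from C++11 as a stand-in for boost::regex; the function name `first_numbers` and the pattern are made up for this example, and how much allocation is actually avoided depends on the library implementation.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical example: extract the first run of digits from each line,
// reusing a single match_results object for every search.
std::vector<std::string> first_numbers(const std::vector<std::string>& lines)
{
    static const std::regex expression("(\\d+)");
    std::smatch matches;  // constructed once, passed to every regex_search call
    std::vector<std::string> numbers;

    for (const std::string& line : lines) {
        if (std::regex_search(line, matches, expression))
            numbers.push_back(matches[1].str());  // copy out before the next search
    }
    return numbers;
}
```

With a by-value return, each call would have to build a fresh match_results object instead of refilling the existing one.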
Participants (2): John Maddock, Sven Bauhan