RE: [Boost-Users] Regex++ newbie problems
-----Original Message-----
From: John Maddock [mailto:john_maddock@compuserve.com]
Sent: 20 March 2003 12:48
To: Boost-Users@yahoogroups.com
Subject: Re: [Boost-Users] Regex++ newbie problems
I've just started using Regex++ (from boost 1.29.0) and I'm experiencing some strangeness that doesn't seem to be mentioned in the FAQ. [snip]

Word_expression("([:punct::space:]*)([-:upper::lower:^[:punct::space:]]+)([:punct::space:]*)");
Is it right that 'bad' expressions should coredump?
boost::regex will throw an exception if you pass it an invalid expression. You need to catch it, or else yes, your program will core dump.
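A minimal sketch of catching the exception described above. It assumes std::regex from the C++11 standard library as a stand-in so that it builds without Boost; std::regex throws std::regex_error, and as far as I know the Boost.Regex exception is likewise derived from std::runtime_error, so catching that base class should cover both. The helper name try_compile and the unbalanced test pattern are illustrative and do not come from the thread.

```cpp
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical helper: tries to compile `pattern` and reports failure
// instead of letting the exception escape and terminate the program.
// On failure, `why` (if non-null) receives the library's message.
bool try_compile(const std::string& pattern, std::string* why = nullptr) {
    try {
        std::regex compiled(pattern);  // throws on an invalid expression
        (void)compiled;                // only the compile step matters here
        return true;
    } catch (const std::runtime_error& e) {  // std::regex_error derives from this
        if (why) *why = e.what();
        return false;
    }
}
```

Calling try_compile("([[:punct:][:space:]]*", &msg) with the closing parenthesis missing should return false and leave the library's description in msg. How detailed that message is depends on the library and version.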
Hi,

Thanks. I knew it was a newbie error. I guess I was working too late and reading only every other line in the manual. I can tell from my bad grammar. I'll try to save face by attempting to make a useful contribution. :-)

Is there a debug variant or third-party program that can be used to generate a useful syntax error? Note: I haven't looked at the exception generated yet, but I would assume (incorrectly?) that it does not give much detail about the error. Though if you write and use expressions cleanly, it should be trivial to find the errors by inspection.
It's an invalid expression because:
[:punct::space:]* should be [[:punct:][:space:]]*
and
[-:upper::lower:^[:punct::space:]] you can't nest character classes like that (in any regular expression language that I know of).
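The corrected bracket syntax can be checked with a short sketch. It assumes std::regex as a stand-in for boost::regex, and it assumes the second group was meant to accept letters or a hyphen, written here as [[:alpha:]-] with the hyphen last so it is taken literally. Token and split_token are hypothetical names.

```cpp
#include <regex>
#include <string>

// Result of splitting one token into leading punctuation/space,
// the word itself, and trailing punctuation/space.
struct Token {
    bool ok;
    std::string lead, word, trail;
};

Token split_token(const std::string& text) {
    // Each POSIX class name sits inside its own [: :] pair, and the
    // pairs are grouped by one enclosing [ ] bracket expression.
    static const std::regex word_expression(
        "([[:punct:][:space:]]*)"
        "([[:alpha:]-]+)"
        "([[:punct:][:space:]]*)");
    std::smatch m;
    if (!std::regex_match(text, m, word_expression))
        return Token{false, "", "", ""};
    return Token{true, m[1].str(), m[2].str(), m[3].str()};
}
```

Note that a hyphen is itself punctuation, so a leading hyphen would be consumed by the first group rather than the word group.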
Bah. Newbie error. I only tried it in "straw clutching mode" because of my earlier error.
(as an aside, maybe we could catch bad ones better by replacing regex strings with overloaded operators, the way streams have superseded printf)
Apologies for being slightly off topic for the users group. Has there been any work in this direction? We want to compile the expression for efficiency reasons. Building them up using operator<< might sacrifice this without an additional "reduction" phase to compile down to the most efficient automaton. I guess this would be the regexp equivalent of endl. Still, I like the idea as a debugging tool and I don't think the efficiency lost would be prohibitive.

Regular expressions being more like trees than streams, the << syntax might get a bit ugly with all the brackets required. E.g.

    char_class Punct = regex::PUNCTUATION + regex::SPACE;
    reg_exp WordExpression = Kleene_closure(Punct)
        + Positive_closure(char_class("-") + regex::ALPHA)
        + Kleene_closure(Punct);

I guess that is pretty ugly compared to the conventional syntax, despite the improved checkability. I meant to write something like that years ago but never found the time and did have a working regexp library around.

How about the equivalent using some hidden template metaprogramming (for use when the expression is fixed at compile time)? I have a feeling that the complexity added, relative to the minor inconvenience of setting up the expression on start-up, outweighs the benefits. Still, I would be interested to read about research in this area (i.e. tree syntax and compile-time compilation of regular expressions in C++ or other languages).

With even the most elegant design there's usually a way it can be improved if you look hard enough. (Perhaps that "improved" should be in quotes :-) I feel quite sincere about both interpretations.)
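The operator-based idea above could look roughly like the following. This is only a sketch under several assumptions: every type and function here (CharSet, Expr, named, dash, any_of, kleene_closure, positive_closure, capture, to_regex) is hypothetical, it simply assembles a conventional pattern string, and it hands that string to std::regex rather than to Boost. There is no "reduction" phase beyond the library's normal compile step.

```cpp
#include <regex>
#include <string>

// Contents of a [...] set: POSIX class items plus an optional literal hyphen.
struct CharSet { std::string classes; bool has_dash; };
// A complete sub-expression in conventional regex syntax.
struct Expr { std::string src; };

CharSet named(const std::string& name) { return CharSet{"[:" + name + ":]", false}; }
CharSet dash() { return CharSet{"", true}; }
CharSet operator+(const CharSet& a, const CharSet& b) {
    return CharSet{a.classes + b.classes, a.has_dash || b.has_dash};
}

// Emits the brackets itself, placing any hyphen last so it stays literal.
Expr any_of(const CharSet& s) {
    return Expr{"[" + s.classes + (s.has_dash ? "-" : "") + "]"};
}
// Non-capturing groups keep the repetition bound to the whole operand.
Expr kleene_closure(const Expr& e)   { return Expr{"(?:" + e.src + ")*"}; }
Expr positive_closure(const Expr& e) { return Expr{"(?:" + e.src + ")+"}; }
Expr capture(const Expr& e)          { return Expr{"(" + e.src + ")"}; }
Expr operator+(const Expr& a, const Expr& b) { return Expr{a.src + b.src}; }

std::regex to_regex(const Expr& e) { return std::regex(e.src); }

// The word expression from the thread, built without hand-written brackets.
Expr word_expression_sketch() {
    const Expr punct = any_of(named("punct") + named("space"));
    const Expr word  = any_of(dash() + named("alpha"));
    return capture(kleene_closure(punct))
         + capture(positive_closure(word))
         + capture(kleene_closure(punct));
}
```

Because the helpers emit every bracket and parenthesis themselves, an unbalanced or mis-nested set cannot be typed by hand, which is the checkability benefit discussed above. A misspelled class name passed to named() would still only be caught when the pattern is compiled.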
I found I still get rogue matches on punctuation and spaces when I use the manually expanded form below:
You are using the member first of boost::match_results as a null-terminated string. It is *not* a copy of the string matched, and it is not null terminated; it is an iterator into your text. Either use the sequence (first to second), or call match_results::str() to get a std::string object.
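The difference can be seen in a small sketch, again assuming std::regex and std::cmatch as stand-ins for the Boost types; the struct and function names are hypothetical. With a const char* input, first is a pointer into the original text, so treating it as a C string yields everything from the start of the match to the end of the input.

```cpp
#include <regex>
#include <string>

struct FirstWord {
    std::string via_first_pointer;  // `first` misused as a C string
    std::string via_iterators;      // the range from first to second
    std::string via_str;            // the sub-match's str() member
};

FirstWord first_word(const char* text) {
    static const std::regex word("([[:alpha:]]+)");
    std::cmatch m;
    FirstWord out;
    if (std::regex_search(text, m, word)) {
        // Wrong: reads on past the match until the input's terminator.
        out.via_first_pointer = m[1].first;
        // Right: copy exactly the matched range.
        out.via_iterators = std::string(m[1].first, m[1].second);
        // Right: let the library build the string.
        out.via_str = m[1].str();
    }
    return out;
}
```

For the input "  hello, world" the first field holds "hello, world", including the trailing punctuation and spaces, while the other two hold just "hello". That would explain rogue matches of the kind reported.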
John.
Bah, I even remember reading that (and doing it the first time). Sorry for time wasting and thanks again.

Regards,
Bruce A.
(as an aside, maybe we could catch bad ones better by replacing regex strings with overloaded operators, the way streams have superseded printf)
Apologies for being slightly off topic for the users group. Has there been any work in this direction? We want to compile the expression for efficiency reasons. Building them up using operator<< might sacrifice this without an additional "reduction" phase to compile down to the most efficient automata. I guess this would be the regexp equivalent to endl. Still I like the idea as a debugging tool and I don't think the efficiency lost would be prohibitive. Regular expressions being more like trees rather than streams the << syntax might get a bit ugly with all the brackets required. E.g.
I think that's not all that easy - as you say dealing with nested sub-expressions and the like is pretty nasty.
How about the equivalent using some hidden template metaprogramming (for use when the expression is fixed at compile time)? I have a feeling that the complexity added, relative to the minor inconvenience of setting up the expression on start-up, outweighs the benefits.
Have you looked at Spirit? OK, it's a parser, not a regex engine, but there is some overlap.

John
participants (2)
- Bruce Adams [TSP Sunbury]
- John Maddock