[regex] Is there any function to know where the input fails to match a regex?
Hello,

Maybe this is a silly question, but I have a recurring problem and maybe there is an easy solution. I need to write a lot of regular expressions to match the output of several Unix commands. Usually my first attempts are wrong and do not match, so when a match fails I would like to know (more or less) where the problem is in order to fix the regular expression.

For example:

Input text: "123 hello abc"
My first, wrong regex: "(\d)+\s+(\w)+\s+(\d)+"

If I match this input against the regex using regex_search with the boost::match_continuous flag, all I get is the false value returned by the function. That does not tell me that "123 hello" matched the regex and that the problem starts at "abc", which fails to match "(\d)+".

I can simulate what I need by checking subexpressions: on every iteration I add a new "sub-regex", and I stop when the match fails (I do this manually, by changing the regex).

With my hypothetical function I would obtain:

- Input text: "123 hello abc"
- Last correct regular expression: "(\d)+\s+(\w)+\s+" Matches: "123 hello "
- First wrong regex: "(\d)+\s+(\w)+\s+(\d)+" does not match.
-> So, "(\d)+" does not match "abc"

I understand that this is usually not as easy as in my simple example, because of the power of regular expressions, but I think a function like that would greatly improve the error reporting capabilities. Is there any way to do what I need, or must I write my own check function?

Best regards,
Jordi
That would indeed be useful. Actually, what we really need is a regular expression debugger, but both would need a fully instrumented regex engine that associates states in the machine with characters in the original expression (so you know where the failure occurred).

I'm afraid there's no way to do what you want at present. It's always been on my TO-DO list, but never seems to have surfaced!

John.
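The manual procedure described in the question (add one sub-expression at a time and stop at the first failure) can be automated to some extent. The following is only a minimal sketch under stated assumptions, not a feature of Boost.Regex: it uses std::regex instead of Boost.Regex to stay self-contained, it assumes the caller has already split the expression into pieces that each remain a valid regex when concatenated, and the names PrefixProbe and probe_prefixes are hypothetical. Because a prefix can match while the full expression would backtrack differently, the result is a hint rather than an exact failure position.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper type: how far the piece-by-piece probing got.
struct PrefixProbe {
    std::size_t pieces_matched;  // number of leading pieces that matched together
    std::string matched_text;    // input consumed by the longest matching prefix
};

// Tries pieces[0], then pieces[0]+pieces[1], and so on, each anchored at the
// start of the input, and stops at the first concatenation that fails.
PrefixProbe probe_prefixes(const std::vector<std::string>& pieces,
                           const std::string& input) {
    PrefixProbe result{0, ""};
    std::string pattern;
    for (const std::string& piece : pieces) {
        pattern += piece;
        const std::regex re(pattern);  // throws std::regex_error if invalid
        std::smatch m;
        // match_continuous: the match must begin at the first character.
        if (!std::regex_search(input, m, re,
                               std::regex_constants::match_continuous)) {
            break;  // this piece is the first one that cannot match
        }
        ++result.pieces_matched;
        result.matched_text = m.str(0);
    }
    return result;
}
```

With the example from the question split into the five pieces "(\d)+", "\s+", "(\w)+", "\s+" and "(\d)+", probing "123 hello abc" reports four matched pieces and the consumed text "123 hello ", so the fifth piece, "(\d)+", is the one to look at.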
Does anyone know of any application or utility that includes this feature (or a regular expression debugger)? At least such a utility would make it easier to define (and check) a regular expression against a difficult input.

Thanks anyway,
Jordi
On Mon, 14 Feb 2005 12:48:48 +0100, jordi wrote:
<snipped>
Does anyone know any application/utility that includes this feature (or a regular expression debugger)? At least this utility would be easier to define (and check) a regular expression against a difficult input.
Thanks anyway,
Jordi
Try 'The Regex Coach' @ http://weitz.de/regex-coach/ - that's my favourite. Built on a Lisp port of PCRE, it uses Lisp's macro feature to build instrumentation into the library. The syntax is different in places to Boost.Regex, but it's close enough for development work.

HTH
Stuart Dootson
-----Original Message-----
From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Stuart Dootson
Sent: Monday, February 14, 2005 5:41 AM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] Re: [regex] Is there any function to know where the input fails to match a regex?
Try 'The Regex Coach' @ http://weitz.de/regex-coach/ - that's my favourite. Built on a Lisp port of PCRE, it uses Lisp's macro feature to build instrumentation into the library. The syntax is different in places to Boost.Regex, but it's close enough for development work.
That one's good. You might also look at RxToolkit (can't turn up the URL right now, but it's not the one in Komodo) or RegexBuddy (http://www.regular-expressions.info/regexbuddy.html). I personally like RegexBuddy better because of the way it displays the match, but then, it's shareware at $30, which bugs some folk. There's an interesting example of it on the blog http://www.codinghorror.com/blog/archives/000027.html. http://en.wikipedia.org/wiki/Regular_expression is interesting, as well.

Reid
Thanks for your replies. I will try one of them. I do not really need them, because my regexes are not that difficult and I can use trial and error, but it's always better to use a debugger!

Jordi
I didn't find a simple download for RxToolkit, and RegexBuddy doesn't have a demo download, so I finally tried Regex Coach and it works great! It has accepted all my regexes (not too complex, but not too easy either) and now I can easily find where the expression does not match in just 30 seconds!

Thanks again,
Jordi
participants (4)
- John Maddock
- jordi
- Reid Sweatman
- Stuart Dootson