Boost:regex and C++ Parsing - Clarification
Hello,

Thank you Caleb and Hartmut for your replies. You both seem to think regex is a bad way to go, so I will explain better what I want to write, just to be clear.

I want to write a tool (cli probably) where I can say, here you go, here is a large folder full of code, go and parse it. I will store the results in XML format somewhere, then, I can do say, "<myapp> class someclass" and the program will go and find where that class is declared/defined using its database, saving me headache.

So I thought I could use one of the C++ expat wrappers, and boost regex looked powerful enough to do the parsing if only I were handy enough with regular expression syntax.

Anyway, I don't know if that better explanation will make any difference to your recommendations. I look forward to reading your opinions.

Oh, and the example I looked at is here, Hartmut: http://boost.org/libs/regex/example/snippets/regex_search_example.cpp - that is what got me thinking I might actually be able to take on this challenge.

Okay lads, thanks again, cheers
Gaz

-----Original Message-----
From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Hartmut Kaiser
Sent: 29 September 2004 18:41
To: boost-users@lists.boost.org
Subject: RE: [Boost-users] Boost:regex and C++ Parsing

Foster, Gareth wrote:
I have a couple of questions. Firstly, how might I extend the example for parsing C++ code for class names so that it records the line number on which each class is defined? I thought maybe I could extend the regular expression so that it has "|(\n)" at the end, or maybe there is another way. In any case, I am not sure that is the correct way to extend the regex, and I am unsure how to check the regex_match result to see whether it was a newline character or a class name that I encountered.
Which example are you referring to?
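For reference, the "|(\n)" idea from the question can be sketched roughly as follows. This is a minimal sketch, not code from the thread: it uses std::regex as a stand-in for Boost.Regex, the function name find_classes is hypothetical, and the pattern is deliberately naive (it will also pick up things like "class T" in template parameter lists or text inside comments). The point it shows is that each sub-expression has a matched flag, which tells you whether the newline alternative or the class-name alternative fired.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (class name, 1-based line number) pairs.
std::vector<std::pair<std::string, int>> find_classes(const std::string& text) {
    // Group 1 matches a newline; group 2 captures the identifier after "class".
    // [ \t]+ is used instead of \s+ so the pattern never swallows a newline,
    // which would throw the line count off.
    static const std::regex re(R"re((\n)|\bclass[ \t]+(\w+))re");

    std::vector<std::pair<std::string, int>> found;
    int line = 1;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        if (m[1].matched) {
            ++line;                                // newline alternative matched
        } else if (m[2].matched) {
            found.emplace_back(m[2].str(), line);  // class-name alternative matched
        }
    }
    return found;
}
```

Calling find_classes on the contents of a file gives one entry per "class <name>" occurrence together with the line it was seen on.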
Secondly, are there any efforts anywhere to parse C++ for other keywords by this approach?
I don't know of any efforts going on regarding C++ parsing with the help of regex (non-authoritative answer). But there is the Wave library (its Boost review is due shortly), which is a C/C++ preprocessor containing different C++ lexing components that may be helpful to you when writing a class name extraction tool.

Regards
Hartmut
Thank you Caleb and Hartmut for your replies. You both seem to think regex is a bad way to go, I will explain better what I want to write just to be clear.
I want to write a tool (cli probably) where I can say, here you go, here is a large folder full of code, go and parse it. I will store the results in XML format somewhere, then, I can do say, "<myapp> class someclass" and the program will go and find where that class is declared/defined using its database, saving me headache.
So I thought I could use one of the C++ expat wrappers, and boost regex looked powerful enough to do the parsing if only I were handy enough with regular expression syntax.
Anyway, I don't know if that better explanation will make any difference to your recommendations. I look forward to reading your opinions.
Oh, and the example I looked at is here Hartmut: http://boost.org/libs/regex/example/snippets/regex_search_example.cpp - that is what got me thinking I might actually be able to take on this challenge.
It depends what you want to do. If you want to use a "real" C++ parser, then you will also have to preprocess the code (including the includes) and then parse the result. In theory this gives you a "perfect" result, but only if you know what include paths to use and what predefined macros should be set (think about conditional code blocks). Regexes, on the other hand, don't require you to preprocess the code, but can get confused by macros and the like. So you have to choose the way that best meets your expectations, and live with the defects either way ;-)

To solve your problem, BTW, why not scan through the file for line starts (keeping count, obviously!), and at each line start see if it's also the start of the regex you are interested in (one that matches a class definition, for example)? If you do this, don't forget to either:

- prefix your expression with \A, or
- pass the match_continuous flag to regex_search.

Either will anchor the search at the start of the line you are checking, and prevent the whole text being searched.

John.
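The line-start scan described above might look something like this. It is only a sketch under some assumptions: it uses std::regex rather than Boost.Regex (std::regex_constants::match_continuous plays the role of the match_continuous flag mentioned above; the \A prefix is not used because the default std::regex grammar does not support it), the function name scan_line_starts is hypothetical, and the pattern is a naive one that also accepts forward declarations such as "class Foo;".

```cpp
#include <algorithm>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (class name, 1-based line number) pairs for
// lines that begin (after optional indentation) with "class <name>".
std::vector<std::pair<std::string, int>> scan_line_starts(const std::string& text) {
    static const std::regex re(R"re([ \t]*class[ \t]+(\w+))re");

    std::vector<std::pair<std::string, int>> found;
    int line_no = 1;
    auto pos = text.cbegin();
    while (pos != text.cend()) {
        std::smatch m;
        // match_continuous: the match must begin exactly at `pos`, so the
        // rest of the buffer is never scanned for a later match.
        if (std::regex_search(pos, text.cend(), m, re,
                              std::regex_constants::match_continuous)) {
            found.emplace_back(m[1].str(), line_no);
        }
        // Advance to the start of the next line, keeping count.
        pos = std::find(pos, text.cend(), '\n');
        if (pos != text.cend()) {
            ++pos;
            ++line_no;
        }
    }
    return found;
}
```

Because each search is anchored, a "class" that appears later on a line (after other code) is ignored; whether that is desirable depends on how the code being indexed is formatted.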
On Thu, 30 Sep 2004 09:38:05 +0100, Foster, Gareth wrote:
I want to write a tool (cli probably) where I can say, here you go, here is a large folder full of code, go and parse it. I will store the results in XML format somewhere, then, I can do say, "<myapp> class someclass" and the program will go and find where that class is declared/defined using its database, saving me headache.
If this is really all you're looking for, the Emacs "ctags" program is probably a much simpler solution to the problem.

--
Caleb Epstein
caleb.epstein@gmail.com
Participants (3): Caleb Epstein; Foster, Gareth; John Maddock