c++ parser with boost::spirit?

Has anyone created a c++ parser with boost::spirit?

Thanks,
Noel

"Noel Yap" wrote:
Has anyone created a c++ parser with boost::spirit?
I don't know if anyone has (though I believe Hartmut Kaiser created a C parser), but AFAIK the problem with C++ parsers in general is that they don't tell the whole story, because in C++ there is a huge amount of context-specific information: e.g. in a template or non-template, function argument scope, class scope, etc., each of which has its own rules that are beyond a simple parser's grasp. Nevertheless it would be a very worthwhile project... ;-) You could also try asking on the Spirit developers list: https://lists.sourceforge.net/lists/listinfo/spirit-devel

regards
Andy Little
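A minimal C++ illustration of the context problem described above (not from the original thread): the same tokens "a * b" are a pointer declaration when "a" names a type and a multiplication when "a" names a variable, so a parser cannot choose the right grammar rule without knowing how "a" was declared. The namespace and function names are hypothetical.

```cpp
#include <cassert>

namespace when_a_is_a_type {
    struct a {};

    bool run() {
        a * b;          // 'a' is a type: this declares b as "pointer to a"
        b = nullptr;    // b really is a pointer variable
        return b == nullptr;
    }
}

namespace when_a_is_a_variable {
    int a = 6;
    int b = 7;

    int run() {
        return a * b;   // 'a' is a variable: the same tokens multiply
    }
}
```

Both namespaces compile; only the earlier declaration of "a" decides which reading applies.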

Andy Little wrote:
Has anyone created a c++ parser with boost::spirit?
I don't know if anyone has (though I believe Hartmut Kaiser created a C parser), but AFAIK the problem with C++ parsers in general is that they don't tell the whole story, because in C++ there is a huge amount of context-specific information.
Isn't this the problem with /all/ parsers, and in general, isn't it an AST treewalker that actually determines the context? As in, if you wanted context, you would parse, build an AST, and then walk over the AST to determine the context... just as with any language?

Regards,
Leon Mergen
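A toy sketch of the two-pass approach suggested here, assuming a made-up statement language rather than real C++: the first pass records "struct X;" as a TypeDecl node and keeps every "X * Y;" statement as an unresolved Ambiguous node; a walk over the statements afterwards resolves each one using the type declarations seen so far. Node, Meaning and resolve() are hypothetical names; names not known to be types are simply treated as variables, and only a single scope is modelled.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy AST produced by the first pass (hypothetical, not C++'s real grammar).
struct Node {
    enum Kind { TypeDecl, Ambiguous };
    Kind kind;
    std::string left;   // X in "struct X;" or "X * Y;"
    std::string right;  // Y in "X * Y;" (unused for TypeDecl)
};

enum class Meaning { PointerDeclaration, Multiplication };

// Second pass: walk the statements in source order, remembering which names
// have been declared as types so far, and resolve each Ambiguous node.
std::vector<Meaning> resolve(const std::vector<Node>& stmts) {
    std::set<std::string> types;
    std::vector<Meaning> resolved;
    for (const Node& n : stmts) {
        if (n.kind == Node::TypeDecl) {
            types.insert(n.left);
        } else {
            resolved.push_back(types.count(n.left) ? Meaning::PointerDeclaration
                                                   : Meaning::Multiplication);
        }
    }
    return resolved;
}
```

This only works because each ambiguity fits inside one node; in real C++ the ambiguity can change how the rest of the statement is split up, which is part of why the thread goes on to describe parsers that consult name information while parsing.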

"Leon Mergen"
Andy Little wrote:
Has anyone created a c++ parser with boost::spirit?
I don't know if anyone has (though I believe Hartmut Kaiser created a C parser), but AFAIK the problem with C++ parsers in general is that they don't tell the whole story, because in C++ there is a huge amount of context-specific information.
Isn't this the problem with /all/ parsers, and in general, isn't it an AST treewalker that actually determines the context?
As in, if you wanted context, you would parse, build an AST, and then walk over the AST to determine the context... just as with any language?
I guess so, except that high-altitude mountain climbing over the AST using copious supplies of oxygen might be nearer the mark than walking. As I understand it, C++ is not beloved by compiler writers for this reason! I guess it all comes down to the exact definition of "a C++ parser" and what you want to do with its output.

regards
Andy Little

On 3/7/06, Andy Little wrote:
I guess so, except that high-altitude mountain climbing over the AST using copious supplies of oxygen might be nearer the mark than walking. As I understand it, C++ is not beloved by compiler writers for this reason! I guess it all comes down to the exact definition of "a C++ parser" and what you want to do with its output.
Some things I had in mind were:
- code beautification
- static dependency analysis for link- or compile-compatibility
- other static code analysis
- creation of refactoring tools

Noel

"Noel Yap" wrote:
On 3/7/06, Andy Little wrote:
I guess so, except that high-altitude mountain climbing over the AST using copious supplies of oxygen might be nearer the mark than walking. As I understand it, C++ is not beloved by compiler writers for this reason! I guess it all comes down to the exact definition of "a C++ parser" and what you want to do with its output.
Some things I had in mind were:
- code beautification
- static dependency analysis for link- or compile-compatibility
- other static code analysis
- creation of refactoring tools
I think you would have to go quite deep to do those things. One simple example is that you need to find out whether a name is a type or an object in order to continue parsing an expression; then you have all the leftovers from C, such as the weird C usage of typedef, that must be catered for. IOW, C++ parsing is not cleanly split into syntax and semantics as theorists like to have it. I read somewhere that the most recent GCC C++ parser was ultimately written by hand as a recursive descent parser, as was Bjarne Stroustrup's original. He mentions it somewhere in D&E, I think. Anyway, I'm sure it would be an interesting topic on the Spirit developers list!

regards
Andy Little
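A toy hand-written recursive descent parser showing the consequence of the point above: the parser consults its symbol table in the middle of parsing to decide whether "X * Y;" is a pointer declaration or a multiplication. This is a sketch under stated assumptions, not C++: it uses a made-up two-statement language, and the Parser class, the "type" keyword and the output strings are all hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Toy language (hypothetical):
//   stmt := "type" ID ";"   declares ID as a type name
//         | ID "*" ID ";"   pointer declaration if the first ID is a type,
//                           otherwise a multiplication expression
class Parser {
public:
    explicit Parser(const std::string& src) : toks_(tokenize(src)) {}

    // Parses every statement; returns one description per statement.
    std::vector<std::string> parse() {
        std::vector<std::string> out;
        while (pos_ < toks_.size()) out.push_back(statement());
        return out;
    }

private:
    std::vector<std::string> toks_;
    std::size_t pos_ = 0;
    std::set<std::string> types_;  // symbol table consulted *during* parsing

    // Splits input into identifiers and single-character punctuation.
    static std::vector<std::string> tokenize(const std::string& s) {
        std::vector<std::string> t;
        for (std::size_t i = 0; i < s.size();) {
            unsigned char c = s[i];
            if (std::isspace(c)) {
                ++i;
            } else if (std::isalpha(c) || c == '_') {
                std::size_t j = i;
                while (j < s.size() &&
                       (std::isalnum(static_cast<unsigned char>(s[j])) || s[j] == '_'))
                    ++j;
                t.push_back(s.substr(i, j - i));
                i = j;
            } else {
                t.push_back(std::string(1, s[i]));
                ++i;
            }
        }
        return t;
    }

    std::string next() {
        if (pos_ >= toks_.size()) throw std::runtime_error("unexpected end of input");
        return toks_[pos_++];
    }

    void expect(const std::string& s) {
        if (next() != s) throw std::runtime_error("expected " + s);
    }

    std::string statement() {
        std::string first = next();
        if (first == "type") {
            std::string name = next();
            expect(";");
            types_.insert(name);  // visible to all later statements
            return "typedecl " + name;
        }
        expect("*");
        std::string second = next();
        expect(";");
        // The grammar alone cannot decide here; the symbol table can.
        if (types_.count(first)) return "declare " + second + " as pointer to " + first;
        return "multiply " + first + " by " + second;
    }
};
```

Writing the parser by hand makes it easy to put the symbol-table check exactly at the point where the grammar becomes ambiguous.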

"Andy Little"
"Noel Yap" wrote
On 3/7/06, Andy Little wrote:
I guess so, except that high-altitude mountain climbing over the AST using copious supplies of oxygen might be nearer the mark than walking. As I understand it, C++ is not beloved by compiler writers for this reason! I guess it all comes down to the exact definition of "a C++ parser" and what you want to do with its output.
Some things I had in mind were:
- code beautification
- static dependency analysis for link- or compile-compatibility
- other static code analysis
- creation of refactoring tools
I forgot to say that though this would be a very difficult project, I think it would also be a very worthwhile one. My analysis of the difficulties comes from having naively started to write a C++ parser some years ago. The main issue I found was that I just didn't know the language well enough to proceed, so I was constantly consulting the standard to learn huge chunks of the language itself. That meant progress was slow, though it did help me understand C++ a bit better! I was also using VC6, which didn't help!

I think it was when I came to parsing templates that I really hit difficulties, as I had very little experience with templates at the time. I know a bit more about templates now, so I might do better, but then there are all the details of overloaded functions, finding the best match, partial specialisation, etc. This is the big problem with a C++ parser: it's just incredibly complicated. Nevertheless, judging by the interest in the Wave preprocessor, I have no doubt there would be huge interest in such a beast from Boost and C++ developers in general, for all the tasks you mention and more.

regards
Andy Little
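A small standard C++ example of why templates make the problem worse: inside a template, a name such as Container::value_type depends on the template argument, so when the template is parsed nobody knows yet whether it names a type or a static data member. The language resolves this by assuming such a name is not a type unless it is prefixed with typename. first_or_default is a hypothetical helper written for illustration.

```cpp
#include <cassert>
#include <vector>

// Returns the first element of a container, or a value-initialized element
// if the container is empty.
template <class Container>
typename Container::value_type first_or_default(const Container& c) {
    // Container::value_type depends on the template parameter. Without the
    // 'typename' keyword it would be assumed to name a value, not a type,
    // and this declaration of p would fail to parse.
    const typename Container::value_type* p = c.empty() ? nullptr : &c.front();
    return p ? *p : typename Container::value_type();
}
```

A parser that does not follow this rule will misread template code even when the surrounding syntax looks ordinary.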

Parsing C++ is only possible (unambiguously) after
preprocessing it first, and when at any moment a full
list of every identifier is known: the parser needs
to know which types have been declared, which variables
exist etc, in the scope that it is parsing at that
moment. As a result, any C++ parser has to be almost
a compiler before it can work.
--
Carlo Wood
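A minimal sketch, assuming a simple stack-of-maps design, of the per-scope information a parser must keep up to date as described above: names are declared in the innermost scope, and lookup searches outward so that an inner variable hides an outer type of the same name. SymbolTable and Kind are hypothetical names; a real C++ front end also needs classes, namespaces, using-declarations and argument-dependent lookup.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum class Kind { NotFound, Type, Variable };

class SymbolTable {
public:
    SymbolTable() { enter(); }                  // the global scope

    void enter() { scopes_.emplace_back(); }    // e.g. on '{'

    void leave() {                              // e.g. on '}'
        assert(scopes_.size() > 1 && "cannot leave the global scope");
        scopes_.pop_back();
    }

    void declare(const std::string& name, Kind k) { scopes_.back()[name] = k; }

    // Searches innermost to outermost, so the nearest declaration wins.
    Kind lookup(const std::string& name) const {
        for (auto it = scopes_.rbegin(); it != scopes_.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return found->second;
        }
        return Kind::NotFound;
    }

private:
    std::vector<std::map<std::string, Kind>> scopes_;
};
```

A parser would call lookup() each time it meets an identifier in a position like "a * b;" and pick the declaration or expression rule based on the answer.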

"Carlo Wood" wrote
Parsing C++ is only possible (unambiguously) after preprocessing it first, and when at any moment a full list of every identifier is known: the parser needs to know which types have been declared, which variables exist etc, in the scope that it is parsing at that moment. As a result, any C++ parser has to be almost a compiler before it can work.
When I attempted this I didn't find looking up names a major problem. The whole program was modelled as a tree. At the root was an abstract base class called Scope (which would be realised as namespaces, classes, etc.), with each derived class having separate member symbol tables for each type of entity it could contain. A Scope declaration looked like the one below; looking back at it, its main job seems to have been returning names and turning them into particular types of entities.

The h_str type there (for example in the Find function) is basically a handle to a string. Whenever the lexer found a name, it would immediately request an h_str for it. The idea was simply to use integer ids for speed rather than passing strings around directly; it was only occasionally necessary to turn an id back into a string for the user's benefit. The E_ref class held a pointer to something, together with information about what kind of entity it actually referred to. If an attempt was made to convert the entity to the wrong type, it would throw an exception.

Overall, looking back on the work I did, I think I would take the same approach again. I have no doubt the result would be very slow, but importantly it was relatively easy to understand and work on. Maybe I should put the whole work in the vault, though some of it is a little cringe-making for Boost, and I guess it has little to do with Spirit...

As I said before, the main issue with writing a C++ parser is that you need to know the language really well; otherwise you end up, as I did, not only trying to figure out how to write a parser, which is hard enough, but also learning the higher echelons of C++ at the same time. I also found it easier to write a recursive descent parser directly by hand than to try to automate it, because it was easier to wiggle the code that way.
IIRC Bjarne Stroustrup says something similar with regard to Cfront.

regards
Andy Little

    // Placeholder declarations (not part of the original post), added only so
    // that the fragment compiles on its own; the real definitions were not shown.
    typedef int h_str;        // handle (integer id) for an interned name
    struct BadE_ref {};       // thrown by E_ref on a wrong-type conversion
    namespace Token { struct ClassKey { enum Type { /* not shown */ }; }; }
    namespace Input { class TokenStream; }

    // various entities which can be members of a scope, some are scopes, some aren't
    class Class; class Union; class Enum; class Namespace;
    class Typedef; class ClassTemplate; class FncLst; class Object;

    // scope abstract base class
    class Scope {
        Scope* parent;
    protected:
        virtual ~Scope() {}
        Scope(Scope* Parent) : parent(Parent) {}
    public:
        Scope* getParent() const { return parent; }

        // result of a name lookup: which kind of entity was found, plus a pointer to it
        struct E_ref {
        public:
            enum Entity {
                NOTFOUND, OBJECT, CLASS, UNION, ENUM,
                FNC_LST, TYPEDEF, NAMESPACE, CLASS_TEMPLATE
            };
        private:
            Entity entity;
            union {
                void*          m_notfound;
                Object*        m_object;
                Class*         m_class;
                Union*         m_union;
                Enum*          m_enum;
                Typedef*       m_typedef;
                FncLst*        m_fncLst;
                Namespace*     m_namespace;
                ClassTemplate* m_classTemplate;
            };
            // throws if the entity is not of the kind being asked for
            void Assert(Entity e) { if (entity != e) throw BadE_ref(); }
            void chk_null_ptr() { if (!m_notfound) entity = NOTFOUND; }
        public:
            bool operator==(Entity e) const { return entity == e; }
            bool operator!=(Entity e) const { return entity != e; }
            Entity operator()() const { return entity; }

            // checked conversions to the concrete entity types
            operator Class&()         { Assert(CLASS);          return *m_class; }
            operator Union&()         { Assert(UNION);          return *m_union; }
            operator Enum&()          { Assert(ENUM);           return *m_enum; }
            operator Namespace&()     { Assert(NAMESPACE);      return *m_namespace; }
            operator Object&()        { Assert(OBJECT);         return *m_object; } // was Objects::Object& in the post
            operator Typedef&()       { Assert(TYPEDEF);        return *m_typedef; }
            operator FncLst&()        { Assert(FNC_LST);        return *m_fncLst; }
            operator ClassTemplate&() { Assert(CLASS_TEMPLATE); return *m_classTemplate; }

            E_ref()                  : entity(NOTFOUND),       m_notfound(0) {}
            E_ref(Object& ob)        : entity(OBJECT),         m_object(&ob) {}
            E_ref(Class& c)          : entity(CLASS),          m_class(&c) {}
            E_ref(Union& u)          : entity(UNION),          m_union(&u) {}
            E_ref(Enum& e)           : entity(ENUM),           m_enum(&e) {}
            E_ref(FncLst& fl)        : entity(FNC_LST),        m_fncLst(&fl) {}
            E_ref(Namespace& n)      : entity(NAMESPACE),      m_namespace(&n) {}
            E_ref(ClassTemplate& ct) : entity(CLASS_TEMPLATE), m_classTemplate(&ct) {}
            E_ref(Typedef& t)        : entity(TYPEDEF),        m_typedef(&t) {}
        };

        virtual E_ref Find(h_str name) const = 0;
        virtual E_ref FindMember(h_str name) const = 0;
        virtual E_ref FindType(h_str name) const = 0;
        virtual E_ref FindMemberType(h_str name) const = 0;
        Namespace& FindNearestEnclosingNamespace() const;
        Scope& FindNearestNonClassNonProto() const;
        virtual bool AddForwardDecl(Token::ClassKey::Type t, h_str identifier,
                                    Input::TokenStream& tstream) = 0;
        bool hasParent(const Scope*) const;
    };
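The h_str handle described above (every name the lexer sees becomes an integer id, turned back into a string only occasionally) is the technique usually called string interning. A minimal sketch, assuming a hash map from spelling to id plus a vector for the reverse mapping; Interner is a hypothetical name, not the code from the post.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

class Interner {
public:
    using Id = std::size_t;

    // Returns the existing id for a spelling, or assigns the next free one.
    Id intern(const std::string& s) {
        auto it = ids_.find(s);
        if (it != ids_.end()) return it->second;
        Id id = names_.size();
        names_.push_back(s);
        ids_.emplace(s, id);
        return id;
    }

    // The rare reverse lookup, e.g. for error messages.
    const std::string& spelling(Id id) const {
        assert(id < names_.size());
        return names_[id];
    }

private:
    std::unordered_map<std::string, Id> ids_;
    std::vector<std::string> names_;
};
```

Symbol tables keyed on Id can then compare and hash plain integers instead of strings.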

-----Original Message----- From: boost-users-bounces@lists.boost.org [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Noel Yap
On 3/7/06, Andy Little wrote:
I guess so, except that high-altitude mountain climbing over the AST using copious supplies of oxygen might be nearer the mark than walking. As I understand it, C++ is not beloved by compiler writers for this reason! I guess it all comes down to the exact definition of "a C++ parser" and what you want to do with its output.
Some things I had in mind were:
- code beautification
- static dependency analysis for link- or compile-compatibility
- other static code analysis
- creation of refactoring tools
Noel
Have you looked at SilverCity (http://silvercity.sourceforge.net/)?

Oliver

On 3/6/06 2:39 PM, "Noel Yap" wrote:
Has anyone created a c++ parser with boost::spirit?
http://www.codeproject.com/cpp/wave_preprocessor.asp

--
Jon Kalb
Kalb@LibertySoft.com

Jon Kalb wrote:
Has anyone created a c++ parser with boost::spirit?
That's not a C++ parser, but only a full C/C++ preprocessor library. FYI, Wave is now part of Boost (since V1.33.0).

Regards
Hartmut
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users

On 3/6/06, Noel Yap wrote:
Has anyone created a c++ parser with boost::spirit?
Try cpp_to_html and quickdoc, located here: http://spirit.sourceforge.net/repository/applications/show_contents.php

The parsers included in each work, but do not handle a lot of the context-specific stuff. Feel free to use them as a jumping-off point.
participants (8)
- Aaron Griffin
- Andy Little
- Carlo Wood
- Hartmut Kaiser
- Jon Kalb
- Leon Mergen
- Noel Yap
- Oliver Schoenborn