
"Carlo Wood" wrote
Parsing C++ is only possible (unambiguously) after preprocessing it first, and when at any moment a full list of every identifier is known: the parser needs to know which types have been declared, which variables exist, etc., in the scope that it is parsing at that moment. As a result, any C++ parser has to be almost a compiler before it can work.
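A small illustration of the point quoted above, as a sketch rather than anything taken from either poster's code: the statement "a * b;" is a declaration when a names a type and an expression when a names a variable. The namespace and function names here are made up for the example.

```cpp
#include <cassert>

namespace a_is_type {
    struct a {};
    bool demo() {
        a * b;          // declaration: b is a pointer to a
        b = nullptr;
        return b == nullptr;
    }
}

namespace a_is_value {
    int demo() {
        int a = 6, b = 7;
        a * b;          // expression statement: multiply, discard the result
        return a * b;
    }
}

// Runs both readings of the same token sequence.
void check_both() {
    assert(a_is_type::demo());
    assert(a_is_value::demo() == 42);
}
```

The token sequence is identical in both functions; only the symbol table tells the parser which grammar rule applies. A compiler may warn about the discarded result in the second case.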
When I attempted this I didn't find looking up names a major problem. The whole program was modelled as a tree. At the root was an abstract base class called Scope (which would be realised as namespaces, classes, etc.), each derived class having separate member symbol tables for each type of entity that it could contain. A Scope declaration looked like the one below, and looking back at it, its main job seems to have been looking up names and turning them into particular types of entities.

The h_str type there (for example in the Find function) is basically a handle to a string. Whenever the lexer found a name it would immediately request an h_str for it. The idea here was simply to use integer ids for speed rather than passing strings around directly. It was only occasionally necessary to turn the id back into a string for the user's benefit.

The E_ref class wrapped a pointer to an entity together with a tag recording which kind of entity it actually held. If an attempt was made to convert the entity to the wrong type it would throw an exception.

Overall, looking back on the work I did, I think I would take the same approach again. I have no doubt the result would be very slow, but importantly it was relatively easy to understand and work on. Maybe I should put the whole work in the vault, though some of it is a little cringe-making for Boost, and I guess it has little to do with Spirit though ...

As I said before, the main issue with writing a C++ parser is that you need to know the language really well; otherwise you get into the position I did, of not only trying to figure out how to write a parser, which is hard enough, but also learning the higher echelons of C++ at the same time. I also found it easier to write a recursive descent parser directly by hand than to try to automate it, because it was easier to wiggle the code that way.
IIRC Bjarne Stroustrup says something similar with regard to Cfront.

regards
Andy Little

// h_str, BadE_ref, Token::ClassKey::Type and Input::TokenStream
// are declared elsewhere in the project.

// various entities which can be members of a scope,
// some are scopes, some aren't
class Class;
class Union;
class Enum;
class Namespace;
class Typedef;
class ClassTemplate;
class FncLst;
class Object;

// scope abstract base class
class Scope {
    Scope* parent;
protected:
    virtual ~Scope(){}
    Scope(Scope* Parent):parent(Parent){}
public:
    Scope* getParent()const{return parent;}

    // tagged reference to whatever entity a lookup found
    struct E_ref{
    public:
        enum Entity{
            NOTFOUND, OBJECT, CLASS, UNION, ENUM, FNC_LST,
            TYPEDEF, NAMESPACE, CLASS_TEMPLATE
        };
    private:
        Entity entity;
        union{
            void* m_notfound;
            Object* m_object;
            Class* m_class;
            Union* m_union;
            Enum* m_enum;
            Typedef* m_typedef;
            FncLst* m_fncLst;
            Namespace* m_namespace;
            ClassTemplate* m_classTemplate;
        };
        // throw if the held entity is not of kind e
        void Assert(Entity e){if( entity != e)throw BadE_ref();}
        void chk_null_ptr(){if (!m_notfound) entity = NOTFOUND;}
    public:
        bool operator==(Entity e)const{return entity == e;}
        bool operator!=(Entity e)const{return entity != e;}
        Entity operator()()const{return entity;}

        // checked conversions to the concrete entity type
        operator Class&() {Assert(CLASS);return *m_class;}
        operator Union&() {Assert(UNION);return *m_union;}
        operator Enum&() {Assert(ENUM);return *m_enum;}
        operator Namespace&(){Assert(NAMESPACE);return *m_namespace;}
        operator Objects::Object&(){Assert(OBJECT);return *m_object;}
        operator Typedef&(){Assert(TYPEDEF);return *m_typedef;}
        operator FncLst&(){Assert(FNC_LST);return *m_fncLst;}
        operator ClassTemplate&(){Assert(CLASS_TEMPLATE);return *m_classTemplate;}

        E_ref(): entity(NOTFOUND),m_notfound(0){}
        E_ref(Object& ob):entity(OBJECT),m_object(&ob){}
        E_ref(Class& c): entity(CLASS),m_class(&c){}
        E_ref(Union& u): entity(UNION),m_union(&u){}
        E_ref(Enum& e): entity(ENUM),m_enum(&e){}
        E_ref(FncLst& fl):entity(FNC_LST),m_fncLst(&fl){}
        E_ref(Namespace& n): entity(NAMESPACE),m_namespace(&n){}
        E_ref(ClassTemplate& ct): entity(CLASS_TEMPLATE),m_classTemplate(&ct){}
        E_ref(Typedef& t):entity(TYPEDEF),m_typedef(&t){}
    };

    virtual E_ref Find(h_str name)const=0;
    virtual E_ref FindMember(h_str name)const=0;
    virtual E_ref FindType(h_str name)const=0;
    virtual E_ref FindMemberType(h_str name)const=0;
    Namespace& FindNearestEnclosingNamespace()const;
    Scope& FindNearestNonClassNonProto()const;
    virtual bool AddForwardDecl(
        Token::ClassKey::Type t,
        h_str identifier,
        Input::TokenStream& tstream)=0;
    bool hasParent(const Scope*)const;
};
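
For anyone who wants to see the h_str idea and the scope tree in action, here is a minimal, self-contained sketch. It is not the original code. StringTable, SimpleScope and the cut-down Entity enum are hypothetical names invented for this example, and it assumes that Find searches outward through enclosing scopes while FindMember looks only in the current one, which the declaration above suggests but does not state. Real C++ lookup (base classes, using-directives, argument-dependent lookup) is far more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for h_str: an integer id for an interned string.
using h_str = std::size_t;

class StringTable {
    std::unordered_map<std::string, h_str> ids_;
    std::vector<std::string> names_;
public:
    // Return the existing id for s, or assign the next free one.
    h_str intern(const std::string& s) {
        auto it = ids_.find(s);
        if (it != ids_.end()) return it->second;
        h_str id = names_.size();
        names_.push_back(s);
        ids_.emplace(s, id);
        return id;
    }
    // Turn an id back into text, e.g. for a diagnostic message.
    const std::string& text(h_str id) const { return names_.at(id); }
};

enum class Entity { NotFound, Object, Class, Namespace };

// Simplified scope: a single symbol table rather than one per entity kind.
class SimpleScope {
    const SimpleScope* parent_;
    std::unordered_map<h_str, Entity> members_;
public:
    explicit SimpleScope(const SimpleScope* parent = nullptr)
        : parent_(parent) {}

    void add(h_str name, Entity e) { members_[name] = e; }

    // Look only in this scope.
    Entity findMember(h_str name) const {
        auto it = members_.find(name);
        return it == members_.end() ? Entity::NotFound : it->second;
    }

    // Look in this scope, then outward through the enclosing scopes.
    Entity find(h_str name) const {
        for (const SimpleScope* s = this; s; s = s->parent_) {
            Entity e = s->findMember(name);
            if (e != Entity::NotFound) return e;
        }
        return Entity::NotFound;
    }
};
```

With this, a lexer would call intern once per identifier, and every later lookup compares integers instead of strings. A name declared in the global scope is found from a nested scope by find but not by findMember.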