On Mon, Sep 23, 2019 at 6:12 PM Glen Fernandes via Boost <boost@lists.boost.org> wrote:

> Dominique explained some of the pull (StAX) / push (SAX) terminology to me off-list, and I agree. This does appear to be the more appealing underlying facility.

I didn't realize it was off-list; usually a plain Reply goes to the list. But it doesn't matter, Bjorn explained it better than I did anyway.

On Mon, Sep 23, 2019 at 6:11 PM Vinnie Falco via Boost <boost@lists.boost.org> wrote:

> On Mon, Sep 23, 2019 at 8:58 AM Bjorn Reese via Boost wrote:
>> ...online parser...
>> A push parser (SAX)...
>> A tree parser (DOM)
>
> I have no experience with these terms other than occasionally coming across them in my Google searching adventures. The parsers that I have written take as input one or more buffers of contiguous characters, and produce as "output" a series of calls to abstract member functions which are implemented in the derived class. These calls represent tokens or events, such as "key string", "object begin", "array end". So what would we call this in the taxonomy above?
That's a PUSH parser IMHO. The documentation of Qt's XML PULL parser, QXmlStreamReader, may make that clearer: https://doc.qt.io/qt-5/qxmlstreamreader.html#details

Many of these terms originated in the XML world, and many (like SAX) in the Java world too. To give you a feel for it, here's my PUSH parser API:

```cpp
class JSONHandler {
public:
    ...
    virtual bool handle_object_begin();
    virtual bool handle_object_key(const std::string& key);
    virtual bool handle_object_end();
    virtual bool handle_array_begin();
    virtual bool handle_array_end();
    virtual bool handle_number(int);
    virtual bool handle_number(int64_t);
    virtual bool handle_number(uint64_t);
    virtual bool handle_number(double value);
    virtual bool handle_string(const std::string& value);
    virtual bool handle_boolean(bool value);
    virtual bool handle_null();
    ...
};

bool json_parse(const char* json_utf8_text, size_t len, JSONHandler& handler);
```

And here's my PULL parser API:

```cpp
enum JSONParsingEventType {
    //! Special end-of-document token.
    JSON_END = 0,

    // Value tokens.
    JSON_NULL,
    JSON_TRUE,
    JSON_FALSE,
    JSON_STRING,
    JSON_NUMBER,
    JSON_OBJECT_BEGIN,
    JSON_OBJECT_KEY,
    JSON_OBJECT_END,
    JSON_ARRAY_BEGIN,
    JSON_ARRAY_END,
    ...
};

class JSONReader {
public:
    JSONReader(
        const char* json_utf8_text, size_t len,
        const JSONParserOptions& options = JSONParserOptions()
    );
    ~JSONReader();

    JSONParsingEventType peek() const;
    JSONParsingEventType next();
    JSONParsingEventType current() const;

    size_t skip_next();
    size_t skip_current();

    JSONToken token();
    size_t depth();
    size_t count();

    bool is_integral();
    int get_int();
    int64_t get_int64_t();
    uint64_t get_uint64_t();
    float get_float();
    double get_double();
    std::string get_string();
    std::string get_string_or_null();
    bool get_boolean();

    std::string get_key();
    bool is_key(const char* key);
    bool is_key(const char* key, size_t len);
    ...
};
```

where JSONToken is basically a std::string_view-like object pointing into the raw JSON document bytes, plus low-level information for finer control: whether a number has a sign, a fractional part, or an exponent, and whether a string contains escape sequences (including Unicode ones, \uXXXX), i.e. whether it can't be used as-is and must first be decoded according to the JSON rules to get back UTF-8 text.

The former parser "pushes" information at you, the client code: the parser does the looping. With the latter, the client code is in the driver's seat: it runs the loop and controls the parser, pulling information out of it. A PULL parser also requires no inheritance, neither virtual nor static (CRTP).

As Bjorn wrote, a PULL parser is the lowest-level building block, and the most convenient one to use. A PULL parser is typically passed around to the code that decodes the various data structures, which instantiates them and their "children/descendants" from the infoset in the JSON doc. To make that safe against misbehaving code, I added concepts like "scopes" and "savepoints": the function you pass the reader to cannot step out of the current object, and the calling code can recover by "rewinding" the document to before the misbehaving reader, skipping that object, and trying the next one. This means I basically support incremental parsing too, even though, as is obvious from the API above, I don't have an API for it.

Many parsers also have safeguards, i.e. "limits" on the nesting depth of the stack or on the maximum size allowed for strings; in my parser these are configured via the JSONParserOptions struct.

Anyway, I'm just showing this to illustrate the differences between parsers. There are much better and faster parsers than mine; mine is comparable to nlohmann in terms of performance, i.e. not that fast :). I learned a lot building them though, and it was fun.

--DD
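To make the "the parser does the looping" point concrete, here is a minimal C++17 sketch of a push parser. All names in it (Handler, PushParser, Recorder and their members) are hypothetical and only loosely modelled on the JSONHandler interface above; it handles just a JSON subset (no string escapes, integer numbers only) and is an illustration, not a real implementation.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical push-style interface (illustrative names, not DD's JSONHandler).
// The parser calls these; returning false from any callback aborts the parse.
struct Handler {
    virtual ~Handler() = default;
    virtual bool on_object_begin() = 0;
    virtual bool on_object_end() = 0;
    virtual bool on_array_begin() = 0;
    virtual bool on_array_end() = 0;
    virtual bool on_key(std::string_view key) = 0;
    virtual bool on_string(std::string_view value) = 0;
    virtual bool on_int(long long value) = 0;
    virtual bool on_bool(bool value) = 0;
    virtual bool on_null() = 0;
};

// Toy recursive-descent push parser for a JSON subset (no string escapes,
// integers only). It owns the loop over the input and pushes events at the
// handler; client code only runs inside the callbacks.
class PushParser {
public:
    PushParser(std::string_view text, Handler& h) : s_(text), h_(h) {}
    bool parse() {  // true if the input is exactly one valid value
        if (!value()) return false;
        ws();
        return pos_ == s_.size();
    }

private:
    std::string_view s_;
    std::size_t pos_ = 0;
    Handler& h_;

    void ws() { while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_; }
    bool eat(char c) { ws(); if (pos_ < s_.size() && s_[pos_] == c) { ++pos_; return true; } return false; }
    bool lit(std::string_view w) { return s_.substr(pos_, w.size()) == w && (pos_ += w.size(), true); }
    bool str(std::string_view& out) {  // the bytes between the quotes
        const std::size_t end = eat('"') ? s_.find('"', pos_) : std::string_view::npos;
        if (end == std::string_view::npos) return false;
        out = s_.substr(pos_, end - pos_);
        pos_ = end + 1;
        return true;
    }
    bool value() {
        std::string_view sv;
        if (eat('{')) {
            if (!h_.on_object_begin()) return false;
            if (!eat('}')) {
                do { if (!str(sv) || !h_.on_key(sv) || !eat(':') || !value()) return false; } while (eat(','));
                if (!eat('}')) return false;
            }
            return h_.on_object_end();
        }
        if (eat('[')) {
            if (!h_.on_array_begin()) return false;
            if (!eat(']')) {
                do { if (!value()) return false; } while (eat(','));
                if (!eat(']')) return false;
            }
            return h_.on_array_end();
        }
        if (pos_ >= s_.size()) return false;  // eat() already skipped whitespace
        if (s_[pos_] == '"') return str(sv) && h_.on_string(sv);
        if (lit("true")) return h_.on_bool(true);
        if (lit("false")) return h_.on_bool(false);
        if (lit("null")) return h_.on_null();
        const bool neg = s_[pos_] == '-';
        if (neg) ++pos_;
        const std::size_t digits = pos_;
        long long v = 0;
        while (pos_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[pos_])))
            v = v * 10 + (s_[pos_++] - '0');
        return pos_ != digits && h_.on_int(neg ? -v : v);
    }
};

// Example client: a handler that records the event stream as text.
struct Recorder : Handler {
    std::string out;
    bool add(std::string_view s) { out += s; out += ' '; return true; }
    bool on_object_begin() override { return add("{"); }
    bool on_object_end() override { return add("}"); }
    bool on_array_begin() override { return add("["); }
    bool on_array_end() override { return add("]"); }
    bool on_key(std::string_view k) override { out += "key:"; return add(k); }
    bool on_string(std::string_view s) override { out += "str:"; return add(s); }
    bool on_int(long long v) override { return add("int:" + std::to_string(v)); }
    bool on_bool(bool b) override { return add(b ? "true" : "false"); }
    bool on_null() override { return add("null"); }
};
```

The client never writes a loop: PushParser::parse() recurses through the whole document and the Recorder only reacts to callbacks. Parsing {"a": [1, -2, true], "b": null} leaves Recorder::out holding "{ key:a [ int:1 int:-2 true ] key:b null }", each event followed by a space.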
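The pull side of the same comparison can be sketched the same way. This is a hypothetical C++17 toy, not DD's JSONReader: Event and Reader are illustrative names, next(), current(), depth(), skip_current() and get_int() only loosely echo the API above with simplified, assumed semantics, the JSON subset is the same as before, and max_depth stands in for the kind of nesting limit a JSONParserOptions struct would carry. Because the reader tracks nesting in an explicit stack instead of recursing, its state is an ordinary value, so in this sketch a "savepoint" is simply a copy of the reader. That is an assumption made for illustration, not a description of how DD's scopes and savepoints are implemented.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical event set, loosely modelled on JSONParsingEventType above.
enum class Event { End, Null, True, False, String, Number,
                   ObjectBegin, Key, ObjectEnd, ArrayBegin, ArrayEnd, Error };
// Toy pull reader for a JSON subset (no string escapes, integers only).
// The caller owns the loop and calls next(); nesting is tracked in an explicit
// stack instead of by recursion, so the whole reader state is a copyable value.
class Reader {
public:
    explicit Reader(std::string_view text, std::size_t max_depth = 64) : s_(text), max_depth_(max_depth) {}
    Event next() { return cur_ = step(); }
    Event current() const { return cur_; }
    std::string_view text() const { return tok_; }  // raw bytes of the last key/scalar
    std::size_t depth() const { return stack_.size(); }  // number of open containers
    long long get_int() const {
        assert(cur_ == Event::Number);
        long long v = 0;
        for (char c : tok_) if (c != '-') v = v * 10 + (c - '0');
        return tok_.front() == '-' ? -v : v;
    }
    // Right after ObjectBegin/ArrayBegin: consume events through the matching end.
    void skip_current() {
        if (cur_ != Event::ObjectBegin && cur_ != Event::ArrayBegin) return;
        const std::size_t target = depth() - 1;
        while (depth() > target && next() != Event::Error) {}
    }
private:
    enum State { Value, ValueOrEnd, KeyOrEnd, CommaOrEnd, Done, Failed };
    std::string_view s_, tok_;
    std::size_t pos_ = 0, max_depth_;
    std::vector<char> stack_;  // '{' or '[' for each open container
    State state_ = Value;
    Event cur_ = Event::End;
    void ws() { while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_; }
    bool eat(char c) { ws(); if (pos_ < s_.size() && s_[pos_] == c) { ++pos_; return true; } return false; }
    bool lit(std::string_view w) {  // match a literal such as "null"
        if (s_.substr(pos_, w.size()) != w) return false;
        tok_ = s_.substr(pos_, w.size()); pos_ += w.size(); return true;
    }
    bool str() {  // sets tok_ to the bytes between the quotes
        const std::size_t end = eat('"') ? s_.find('"', pos_) : std::string_view::npos;
        if (end == std::string_view::npos) return false;
        tok_ = s_.substr(pos_, end - pos_); pos_ = end + 1;
        return true;
    }
    Event fail() { state_ = Failed; return Event::Error; }
    Event done(Event e) { state_ = stack_.empty() ? Done : CommaOrEnd; return e; }  // after a complete value
    Event close() {  // pop the innermost container and report its end
        const char open = stack_.back(); stack_.pop_back();
        return done(open == '{' ? Event::ObjectEnd : Event::ArrayEnd);
    }
    Event key() {
        if (!str() || !eat(':')) return fail();
        state_ = Value;  // a value must follow the key
        return Event::Key;
    }
    Event value() {
        ws();
        if (pos_ >= s_.size()) return fail();
        const char c = s_[pos_];
        if (c == '{' || c == '[') {
            if (stack_.size() >= max_depth_) return fail();  // nesting limit
            stack_.push_back(c); ++pos_;
            state_ = c == '{' ? KeyOrEnd : ValueOrEnd;
            return c == '{' ? Event::ObjectBegin : Event::ArrayBegin;
        }
        if (c == '"') return str() ? done(Event::String) : fail();
        if (lit("null")) return done(Event::Null);
        if (lit("true")) return done(Event::True);
        if (lit("false")) return done(Event::False);
        const std::size_t start = pos_;
        if (c == '-') ++pos_;
        const std::size_t digits = pos_;
        while (pos_ < s_.size() && std::isdigit(static_cast<unsigned char>(s_[pos_]))) ++pos_;
        if (pos_ == digits) return fail();
        tok_ = s_.substr(start, pos_ - start);
        return done(Event::Number);
    }
    Event step() {
        switch (state_) {
            case Value: return value();
            case ValueOrEnd: return eat(']') ? close() : value();
            case KeyOrEnd: return eat('}') ? close() : key();
            case CommaOrEnd:
                if (eat(stack_.back() == '{' ? '}' : ']')) return close();
                if (!eat(',')) return fail();
                return stack_.back() == '{' ? key() : value();
            case Done: ws(); return pos_ == s_.size() ? Event::End : fail();
            case Failed: break;
        }
        return Event::Error;
    }
};
```

A caller might use it like this: call next() until it returns Event::ArrayBegin, copy the reader as a savepoint, hand the original to a decoding function, and if that function misbehaves, assign the copy back and call skip_current() to move past the whole array before trying the next value.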
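The JSONToken idea can also be sketched in a few lines. The Token struct and the number_token/string_token helpers below are hypothetical; they only show how a view into the raw bytes plus a few flags lets the caller decide cheaply whether a number needs floating-point conversion, or whether a string needs JSON unescaping before it can be used as UTF-8 text.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical token type: a view into the raw document bytes plus flags
// telling the caller how much work is needed before the value can be used.
struct Token {
    std::string_view raw;   // bytes exactly as they appear in the document
    bool negative = false;  // number: leading '-'
    bool fraction = false;  // number: contains '.'
    bool exponent = false;  // number: contains 'e' or 'E'
    bool escaped = false;   // string: contains a backslash escape such as \n or \u00e9

    // Numbers without a fraction or exponent can go straight to an integer conversion.
    bool is_integral() const { return !fraction && !exponent; }
    // Strings without escapes can be used as-is (assuming a valid UTF-8 document).
    bool usable_as_is() const { return !escaped; }
};

// `raw` is the number's text, e.g. "-12.5e3".
Token number_token(std::string_view raw) {
    Token t{raw};
    t.negative = !raw.empty() && raw.front() == '-';
    t.fraction = raw.find('.') != std::string_view::npos;
    t.exponent = raw.find_first_of("eE") != std::string_view::npos;
    return t;
}

// `raw` is the string's contents between the quotes, still JSON-escaped.
Token string_token(std::string_view raw) {
    Token t{raw};
    t.escaped = raw.find('\\') != std::string_view::npos;
    return t;
}
```

A real tokenizer would set these flags while scanning, rather than re-scanning the token afterwards as these helpers do.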