On 9/23/19 5:16 PM, Phil Endecott via Boost wrote:
> I am reminded of the various discussions of alternative styles of XML parsers that have happened on this list over the years. People have a surprising variety of often-conflicting requirements or preferences. I think it's unlikely that any one solution will suit everyone - but maybe there are common bits of functionality that can be shared?
As a former developer of one of said XML parsers, we learned the proper abstractions the hard way. If you start with a pull parser (what Vinnie refers to as an online parser, and what you refer to as an iterating parser), such as the XmlTextReader, then all the other interfaces flow naturally from it. Although the pull parser is mainly used as the basic building block for the other abstractions, it can also be used directly, e.g. for quick scanning of large JSON documents without memory allocation.

A push parser (SAX) can easily be created by calling the pull parser in a loop and firing off events. Serialization is done by using a pull parser incrementally inside a serialization input archive, and likewise a similar interface for generating the layout (e.g. XmlTextWriter) can be used for output archives. A tree parser (DOM) is simply a push parser that generates nodes as events are fired off. Those are the design principles behind this JSON parser: http://breese.github.io/trial/protocol/
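To make the layering concrete, here is a minimal sketch (not the trial.protocol API, just the pattern): a toy pull parser over a pre-tokenized stream, and a push adapter that drives it in a loop and fires callbacks. The token/event names are invented for illustration.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy token stream standing in for a real pull parser's state machine.
enum class event { begin_object, key, value, end_object };

struct token { event type; std::string text; };

// Pull interface: the *caller* decides when to advance.
class pull_parser {
public:
    explicit pull_parser(std::vector<token> toks) : toks_(std::move(toks)) {}
    const token& current() const { return toks_[pos_]; }
    bool next() { return ++pos_ < toks_.size(); }
private:
    std::vector<token> toks_;
    std::size_t pos_ = 0;
};

// Push (SAX-style) callbacks.
struct handler {
    std::function<void(const std::string&)> on_key;
    std::function<void(const std::string&)> on_value;
};

// The push parser is just the pull parser driven in a loop.
void push_parse(pull_parser& p, const handler& h) {
    do {
        const token& t = p.current();
        if (t.type == event::key)   h.on_key(t.text);
        if (t.type == event::value) h.on_value(t.text);
    } while (p.next());
}
```

A DOM builder would be one more handler whose callbacks append child nodes instead of printing, and a serialization archive would call current()/next() directly, pulling exactly the tokens it expects.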
My preference has always been for parsing by memory-mapping the entire file, or equivalently reading the entire document into memory as a blob of text, and then providing iterators that advance through the text looking for the next element, attribute, character, etc. I think one of the first XML parsers to work this way was RapidXML. Their aim was to be as fast as possible by parsing in situ, modifying the buffer in place rather than copying strings out of it.
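The idea can be sketched in a few lines (this is not RapidXML's actual API, just an illustration): the whole document lives in one buffer, and the iterator holds nothing but a cursor into it, returning names as views into the original text so no allocation happens while scanning.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Scan a document blob for element names, in the blob-plus-iterator style:
// the scanner owns no data, only a cursor into the caller's buffer.
class element_iterator {
public:
    explicit element_iterator(std::string_view doc) : doc_(doc) {}

    // Advance to the next opening tag "<name ...>" and return the name as a
    // view into the original buffer; an empty view means end of document.
    std::string_view next() {
        std::size_t open = doc_.find('<', pos_);
        // Skip closing tags "</...>".
        while (open != std::string_view::npos &&
               open + 1 < doc_.size() && doc_[open + 1] == '/')
            open = doc_.find('<', open + 1);
        if (open == std::string_view::npos) {
            pos_ = doc_.size();
            return {};
        }
        std::size_t start = open + 1;
        std::size_t end = doc_.find_first_of(" >/", start);
        pos_ = end;
        return doc_.substr(start, end - start);
    }

private:
    std::string_view doc_;
    std::size_t pos_ = 0;
};
```

Because every returned name aliases the mapped buffer, the buffer must outlive the views, which is exactly the trade-off in-situ parsers make for speed.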
The Microsoft XML parser came first.