PropertyTree's XML parsers survey

From: "Robert Ramey"
Subject: Re: [Boost-users] PropertyTree's XML parsers survey
The serialization library has been using the Spirit library to parse XML for years. This covers both UTF-8 and wide characters.
Robert Ramey
I just played with the demo_xml_load example from Boost 1.39. Since I
mainly use XML for configuration files, I found that some features I
would really like are missing:
- I often need to edit the XML file manually with third-party tools
(gedit, kate, notepad++, etc.) rather than through the application
itself. When saving, those editors may add a BOM (bytes FF FE for
UTF-16LE, FE FF for UTF-16BE, EF BB BF for UTF-8) and convert the line
breaks (0x0a, 0x0d0a) to the system default (e.g. 0x0d0a -> 0x0a). In
such cases, I wish Boost.Serialization would detect and ignore these
changes, since the data the XML actually carries does NOT change.
- I often embed comment nodes in the XML to facilitate manual editing,
e.g. an enumeration of the accepted values for some fields. I wish
Boost.Serialization would ignore them when parsing and re-embed them
when saving (if that is possible).
- I wish only the hierarchy, not the order of nodes, mattered. E.g.
<a>
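For the first wish, a minimal sketch of a pre-pass an application could run itself before handing an edited file to an XML archive: it strips a UTF-8 BOM and normalizes CRLF and lone CR line endings to LF. The function name normalize_xml_text is hypothetical and not part of Boost.Serialization; UTF-16 input would have to be decoded first, which this sketch does not attempt. (The XML 1.0 specification already requires conforming parsers to perform this line-ending normalization, which is why the edit does not change the data.)

```cpp
#include <cstddef>
#include <string>

// Hypothetical pre-pass (not part of Boost.Serialization): strip a UTF-8 BOM
// and normalize CRLF and lone CR line endings to LF. UTF-16 input is not
// handled; it would have to be decoded to UTF-8 first.
std::string normalize_xml_text(const std::string& in) {
    static const std::string utf8_bom = "\xEF\xBB\xBF";  // bytes EF BB BF
    std::size_t i = 0;
    if (in.compare(0, utf8_bom.size(), utf8_bom) == 0) {
        i = utf8_bom.size();  // skip the BOM; it carries no document data
    }
    std::string out;
    out.reserve(in.size());
    for (; i < in.size(); ++i) {
        if (in[i] == '\r') {
            out += '\n';  // a lone CR, or the CR of a CRLF pair, becomes LF
            if (i + 1 < in.size() && in[i + 1] == '\n') {
                ++i;  // swallow the LF that completes a CRLF pair
            }
        } else {
            out += in[i];
        }
    }
    return out;
}
```

An application would read the edited file into a string, pass it through this function, and construct the archive from a std::istringstream over the result.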

Tan, Tom (Shanghai) wrote:
From: "Robert Ramey"
Subject: Re: [Boost-users] PropertyTree's XML parsers survey
The serialization library has been using the Spirit library to parse XML for years. This covers both UTF-8 and wide characters.
Since I mainly use XML for configuration files, I found that some features I would really like are missing:
Heh. Just a guess, but I expect Robert will say he intentionally parses the smallest possible subset of XML needed to support the syntax he knows the XML archive writes. Instead of adding features to his parser, I bet he'll simply suggest you implement an alternative XML archive type based on an existing general-purpose XML parser library.
- I often embed comment nodes in the XML to facilitate manual editing, e.g. an enumeration of the accepted values for some fields. I wish Boost.Serialization would ignore them when parsing,
That seems plausible to me...
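Ignoring comments is the simpler half of the request. A minimal sketch, assuming the application can pre-process the file before a minimal parser sees it: strip_xml_comments is a hypothetical helper that drops every <!-- ... --> span. It does not understand CDATA sections, so it is an illustration rather than a conforming XML preprocessor.

```cpp
#include <cstddef>
#include <string>

// Hypothetical pre-pass: remove every <!-- ... --> comment from an XML
// document before it reaches a minimal parser. Simplification: comment-like
// text inside a CDATA section would also be removed.
std::string strip_xml_comments(const std::string& xml) {
    std::string out;
    std::size_t pos = 0;
    while (true) {
        std::size_t start = xml.find("<!--", pos);
        if (start == std::string::npos) {
            out.append(xml, pos, std::string::npos);  // copy the tail unchanged
            break;
        }
        out.append(xml, pos, start - pos);  // copy text before the comment
        // Search after "<!--" so that "<!-->" is not taken as a closed comment.
        std::size_t end = xml.find("-->", start + 4);
        if (end == std::string::npos) {
            // Unterminated comment: drop the rest (a real parser would report an error).
            break;
        }
        pos = end + 3;  // resume right after "-->"
    }
    return out;
}
```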
and re-embed them when saving (if that is possible).
This request feels more difficult. Boost.Serialization is designed to read from a file into /your/ data structures, and/or write those data structures to a file. XML is only one of several possible archive formats for such a file. Where in your own data structures would you recommend the Serialization library store any XML comments it may encounter? Supposing there were an appropriate place to collect such comments, how would you write them again in something like their original file location?
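The question of where comments would live can be made concrete with a hypothetical document model in which each element carries the comments that preceded it, so a writer can re-emit them near their original position. Element and write_element are illustrative names, not part of any Boost library; text escaping is omitted, and a std::vector of the still-incomplete Element type relies on C++17.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical document model: each element keeps the comments that
// appeared just before it, so they can be written back in roughly place.
struct Element {
    std::vector<std::string> leading_comments;  // comments preceding this element
    std::string name;
    std::string text;                            // text content of a leaf element
    std::vector<Element> children;               // C++17: vector of incomplete type
};

// Writes the element, indented two spaces per level. No escaping of text.
void write_element(std::ostream& os, const Element& e, int depth = 0) {
    const std::string indent(static_cast<std::size_t>(depth) * 2, ' ');
    for (const auto& c : e.leading_comments)
        os << indent << "<!--" << c << "-->\n";
    os << indent << '<' << e.name << '>';
    if (e.children.empty()) {
        os << e.text << "</" << e.name << ">\n";  // leaf: text on one line
        return;
    }
    os << '\n';
    for (const auto& child : e.children)
        write_element(os, child, depth + 1);
    os << indent << "</" << e.name << ">\n";
}
```

The point of the sketch is that the comments need a slot in the data structure itself; a type serialized through an archive normally has no such slot, which is what makes the request hard for Boost.Serialization.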
On the whole, my impression is that Boost.Serialization imposes some restrictions (element order, for instance) that make sense for plain text files but limit the flexibility that comes built into XML.
Instead of incrementally enhancing a minimal XML parser, wouldn't it make more sense to jump right to an existing full-blown parser implementation?

Nat Goodspeed wrote:
Instead of incrementally enhancing a minimal XML parser, wouldn't it make more sense to jump right to an existing full-blown parser implementation?
In the case of Serialization, that adds a Boost-external dependency. In the case of PropertyTree, this adds a compile-time dependency to a header-only library. Unless that parser is pugxml, but I don't really trust it.
Sebastian

Sebastian Redl wrote:
Nat Goodspeed wrote:
Instead of incrementally enhancing a minimal XML parser, wouldn't it make more sense to jump right to an existing full-blown parser implementation?
In the case of Serialization, that adds a Boost-external dependency. In the case of PropertyTree, this adds a compile-time dependency to a header-only library. Unless that parser is pugxml, but I don't really trust it.
Sorry for being unclear. I was suggesting that the OP use a full-blown XML parser for his own purposes -- comments and arbitrary order and such. I was not suggesting that anyone replace the Serialization library's minimal XML parser for all other users.
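To make the "arbitrary order" point concrete: a Serialization archive reads fields back in the sequence they were written, whereas code built on a general-purpose parser can look fields up by name. A minimal sketch, assuming a parser has already produced each element's children as (name, text) pairs; Children, find_child, Config and load_config are hypothetical names.

```cpp
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flattened view of one element's children, as a DOM-style
// parser might produce them: (name, text) pairs in document order.
using Children = std::vector<std::pair<std::string, std::string>>;

// Look a field up by name, so its position in the file does not matter.
std::optional<std::string> find_child(const Children& kids, const std::string& name) {
    for (const auto& kv : kids)
        if (kv.first == name) return kv.second;
    return std::nullopt;
}

struct Config {
    int width = 0;   // default used when the field is missing
    int height = 0;
};

Config load_config(const Children& kids) {
    Config c;
    if (auto w = find_child(kids, "width")) c.width = std::stoi(*w);
    if (auto h = find_child(kids, "height")) c.height = std::stoi(*h);
    return c;
}
```

Both orderings of the same fields load to the same Config, and a missing field falls back to its default instead of failing.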
participants (3)
- Nat Goodspeed
- Sebastian Redl
- Tan, Tom (Shanghai)