On Wed, Mar 31, 2021 at 10:11 PM Stefan Seefeld via Boost <boost@lists.boost.org> wrote:
Allow me to jump into this discussion with some thoughts.
On 2021-03-10 2:16 p.m., Vinícius dos Santos Oliveira via Boost wrote:
XML is an old, overengineered and hated format (and rightfully so), but industry adoption basically forces us to use it for interoperability with a few services to this day. So that's the value for XML here, interoperability with legacy software. It's not a value to be neglected.
I'll give very similar advice to what I shared on the FFT proposals: please consider not re-implementing a full XML library (which is quite a daunting task), but rather focus on the C++ API as an *interface* that can be layered on top of existing XML libraries.
While normally I'd agree with you, by this train of thought, we wouldn't have Boost.JSON accepted in Boost right now.
The world already has way too many incomplete and buggy XML libraries.
True. But different people make different trade-offs. libxml2, Xerces, and Expat may be complete, and as close to bug-free as it gets in C/C++ XML, but they are certainly not modern C++, often do not parse incrementally, and certainly don't allow the kind of allocator support Boost.JSON introduced. Nor are they the fastest.

So a non-wrapper, Boost.JSON-like Boost.XML would be very interesting. Perhaps even, like Boost.JSON and controversially, forgoing SAX and doing only DOM.

The main issue with XML is all the little things to get right: character entities, entity includes inherited from DTDs, DTDs themselves (for validation and default values), whitespace normalization, namespace support, and related technologies like XSD, XPath, XLink, XInclude, XQuery, etc. A proper PSVI (Post-Schema-Validation Infoset) is also often problematic, but that assumes a validating parser (via DTD or XSD) in the first place.

There's definitely space to explore a Boost.JSON-like, low-level, modern parser that builds only a DOM, with value semantics, allocator support, and a modern API. Much could be built on such a foundation, and that's an interesting GSoC project, even if it never "graduates".

In any case, besides the three mentioned above, there's also rapidxml and pugixml, the latter still actively maintained. Perhaps they are not as complete, but they are definitely quite a bit faster than the "old" ones. --DD
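
To make "a DOM with value semantics and allocator support" a bit more concrete, here is a minimal C++17 sketch. It is only an illustration built on the standard `std::pmr` facilities: the `element` type, its members, and `make_sample` are hypothetical names, not part of any existing or proposed Boost library, and Boost.JSON's own storage model differs in its details.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical DOM node, for illustration only; not an existing Boost type.
struct element {
    using allocator_type = std::pmr::polymorphic_allocator<std::byte>;

    std::pmr::string name;
    std::pmr::string text;
    std::pmr::vector<element> children;

    explicit element(std::string_view n, allocator_type a = {})
        : name(n, a), text(a), children(a) {}

    // Allocator-extended copy and move, so that containers can place
    // copies of a node into their own memory resource.
    element(const element& o, allocator_type a)
        : name(o.name, a), text(o.text, a), children(o.children, a) {}
    element(element&& o, allocator_type a)
        : name(std::move(o.name), a),
          text(std::move(o.text), a),
          children(std::move(o.children), a) {}

    element(const element&) = default;
    element(element&&) = default;
    element& operator=(const element&) = default;
    element& operator=(element&&) = default;

    allocator_type get_allocator() const { return name.get_allocator(); }

    // The new child picks up the parent's memory resource through
    // uses-allocator construction inside the pmr vector.
    element& append(std::string_view child_name) {
        return children.emplace_back(child_name);
    }
};

// Builds <config><item>hello</item></config> in the given resource.
element make_sample(std::pmr::memory_resource* mr) {
    element root("config", mr);
    element& item = root.append("item");
    item.text = "hello";
    return root;
}
```

With this shape, a whole tree can live in a caller-supplied `std::pmr::monotonic_buffer_resource`, while a plain copy (`element copy = root;`) is an independent deep copy that falls back to the default resource, which is the usual `std::pmr` behaviour for copy construction.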