Re: [boost] Idea Suggestion for GsOC'21

1 Apr 2021

On Wed, Mar 31, 2021 at 10:11 PM Stefan Seefeld via Boost <boost@lists.boost.org> wrote:
...
allow me to jump into this discussion with some thoughts.
On 2021-03-10 2:16 p.m., Vinícius dos Santos Oliveira via Boost wrote:
...
XML is an old, overengineered and hated format (and rightfully so),
but industry adoption basically forces us to use it for
interoperability with a few services to this day. So that's the value
for XML here, interoperability with legacy software. It's not a value
to be neglected.
I'll give a very similar advice I shared with FFT proposals: Please
consider not to re-implement a full XML library (which is quite a
daunting task), but rather, focus on the C++ API as an *interface* that
can be layered on top of existing XML libraries.
While normally I'd agree with you, by this train of thought,
we wouldn't have Boost.JSON accepted in Boost right now.
...
The world already has way too many incomplete and buggy XML libraries.
True. But different people have different tradeoffs. libxml2, Xerces, and
Expat may be complete, and as close to bug-free as it gets for C/C++ XML,
but they are certainly not modern C++, often lack incremental parsing, and
certainly don't allow the kind of allocator support Boost.JSON introduced.
Nor are they the fastest.

So a non-wrapper, Boost.JSON-like Boost.XML would be very interesting.
Perhaps even, like Boost.JSON and controversially, forgoing SAX and only
doing DOM.

The main issue with XML is all the little things to get right, like
character entities, entity includes inherited from DTDs, DTDs themselves
(for validation and default values), whitespace normalization, namespace
support, and related technologies like XSD, XPath, XLink, XInclude, XQuery,
etc. Proper PSVI (post-schema-validation infoset) is also often
problematic, but that assumes a validating parser (via DTD or XSD) in the
first place.
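
To give a feel for one of those "little things", here is a minimal C++17
sketch of character and entity reference decoding for a non-validating
parser. The names decode_references and append_utf8 are made up for this
illustration and come from no existing library. It only knows the five
predefined entities plus numeric references, and assumes UTF-8 output;
DTD-declared entities, which are where the real pain starts, are not handled.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Append the Unicode code point cp to out, encoded as UTF-8.
inline void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: expand &lt; &gt; &amp; &apos; &quot; and numeric
// references (&#NN; and &#xHH;) in character data.
// Returns std::nullopt for an unterminated, unknown, or malformed reference.
inline std::optional<std::string> decode_references(std::string_view in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size();) {
        if (in[i] != '&') { out += in[i++]; continue; }
        const std::size_t end = in.find(';', i + 1);
        if (end == std::string_view::npos) return std::nullopt;
        const std::string_view name = in.substr(i + 1, end - i - 1);
        if (name == "lt") out += '<';
        else if (name == "gt") out += '>';
        else if (name == "amp") out += '&';
        else if (name == "apos") out += '\'';
        else if (name == "quot") out += '"';
        else if (name.size() >= 2 && name[0] == '#') {
            const bool hex = name[1] == 'x';  // XML only allows lowercase 'x'
            const std::string_view digits = name.substr(hex ? 2 : 1);
            if (digits.empty()) return std::nullopt;
            std::uint32_t cp = 0;
            for (char c : digits) {
                std::uint32_t d;
                if (c >= '0' && c <= '9') d = c - '0';
                else if (hex && c >= 'a' && c <= 'f') d = c - 'a' + 10;
                else if (hex && c >= 'A' && c <= 'F') d = c - 'A' + 10;
                else return std::nullopt;
                cp = cp * (hex ? 16 : 10) + d;
                if (cp > 0x10FFFF) return std::nullopt;
            }
            // Simplified validity check: rejects NUL and surrogates only.
            // The XML 1.0 Char production is stricter than this.
            if (cp == 0 || (cp >= 0xD800 && cp <= 0xDFFF)) return std::nullopt;
            append_utf8(out, cp);
        } else {
            return std::nullopt;  // e.g. &nbsp; would need a DTD declaration
        }
        i = end + 1;
    }
    return out;
}
```

Returning an empty optional rather than passing unknown references through
is a choice made for this sketch; a real parser would report a
well-formedness error with a position.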

There's definitely space to explore a Boost.JSON-like low-level modern
parser building only a DOM with value semantics and allocator support, with
a modern API. Much could be built on such a foundation, and that's an
interesting GSOC project, even if it never "graduates".
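
To make "a DOM with value semantics and allocator support" a bit more
concrete, here is a rough C++17 sketch built on std::pmr. The element type
and count_elements are hypothetical names for illustration only; this is not
Boost.JSON's design nor a proposed Boost.XML API, and attributes, namespaces
and mixed content are left out.

```cpp
#include <cstddef>
#include <memory_resource>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical DOM node: a regular value type whose storage comes from a
// std::pmr memory resource shared by the whole tree.
struct element {
    using allocator_type = std::pmr::polymorphic_allocator<char>;

    std::pmr::string name;
    std::pmr::string text;
    std::pmr::vector<element> children;

    explicit element(std::string_view n, allocator_type a = {})
        : name(n, a), text(a), children(a) {}

    // Ordinary copy/move: a copy is a deep copy (value semantics).
    element(const element&) = default;
    element(element&&) = default;
    element& operator=(const element&) = default;
    element& operator=(element&&) = default;

    // Allocator-extended copy/move, required so that pmr containers can
    // construct nested elements with the parent's memory resource.
    element(const element& o, allocator_type a)
        : name(o.name, a), text(o.text, a), children(o.children, a) {}
    element(element&& o, allocator_type a)
        : name(std::move(o.name), a), text(std::move(o.text), a),
          children(std::move(o.children), a) {}
};

// Number of elements in the subtree rooted at e, including e itself.
inline std::size_t count_elements(const element& e) {
    std::size_t n = 1;
    for (const element& c : e.children) n += count_elements(c);
    return n;
}
```

A tree built as element root("catalog", &arena) with a
std::pmr::monotonic_buffer_resource arena then puts every child created via
root.children.emplace_back("book") into that arena, while element copy = root
makes an independent deep copy using the default resource.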

In any case, besides the three mentioned above, there are also rapidxml and
pugixml, the latter still actively maintained. Perhaps they are not as
complete, but they are definitely quite a bit faster than the "old" ones. --DD

Dominique Devienne