Re: [boost] [ANN] libstudxml - modern XML API for C++

21 May 2014

      On 2014-05-21 13:57, Boris Kolpackov wrote:
...
Hi Stefan,
In gmane.comp.lib.boost.devel you write:
...
Does it support a DOM-like API, i.e. an in-memory representation of the
document ?
No, it does not. I spent quite a bit of time on the in-memory vs
streaming debate in my talk. How I wish the video was already
available...
Let me know when it is, I'm looking forward to hearing your arguments. :-)
...
Until then, to summarize the key points:
* Most people think they need DOM. I believe it is not because in-memory
  is conceptually better but because of the really awful and inconvenient
  streaming APIs (like SAX). So I tried to convince the audience that a
  well designed streaming pull API is actually sufficient for the majority
  of cases. I didn't hear many objections.
Take a look at the API Introduction[1], it shows how to handle everything
  from converters/filters that don't care about the data, to applications
  that process the data without creating any kind of in-memory object
  model, to C++ classes that know how to persist themselves in XML.
* On that last point (C++ class persistence) a lot of applications
  extract XML data into some kind of object model (C++ classes that
  correspond to the XML vocabulary). Creating an intermediate
  representation of XML (DOM) just to throw it away moments later
  seems kind of pointless.
* Of course there will always be applications that need to revisit
  the bulk of raw XML data and for them in-memory would probably
  always be a better choice.
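
As a rough illustration of the pull style described in the points above, and of a C++ type being loaded straight from the stream with no DOM in between, here is a minimal sketch. It does not use libstudxml: `pull_parser`, `event`, `read_simple`, and `parse_person` are hypothetical names, and the parser is a stand-in that replays pre-tokenized events instead of reading real XML.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class event_type { start_element, end_element, characters, eof };

struct event {
  event_type type;
  std::string value;  // element name or character data
};

// Hypothetical pull source: replays a fixed list of events.
class pull_parser {
public:
  explicit pull_parser(std::vector<event> events) : events_(std::move(events)) {}

  // The caller asks for the next event; there are no callbacks.
  event next() {
    if (pos_ < events_.size()) return events_[pos_++];
    return {event_type::eof, ""};
  }

  // Consume the next event and insist on its type.
  std::string next_expect(event_type t) {
    event e = next();
    if (e.type != t) throw std::runtime_error("unexpected XML event");
    return e.value;
  }

private:
  std::vector<event> events_;
  std::size_t pos_ = 0;
};

// Read <name>text</name> and return the text.
static std::string read_simple(pull_parser& p, const std::string& name) {
  if (p.next_expect(event_type::start_element) != name)
    throw std::runtime_error("expected <" + name + ">");
  std::string text = p.next_expect(event_type::characters);
  p.next_expect(event_type::end_element);
  return text;
}

struct person {
  std::string name;
  int age = 0;
};

// The object model is filled directly from the event stream.
person parse_person(pull_parser& p) {
  if (p.next_expect(event_type::start_element) != "person")
    throw std::runtime_error("expected <person>");
  person r;
  r.name = read_simple(p, "name");
  r.age = std::stoi(read_simple(p, "age"));
  p.next_expect(event_type::end_element);
  return r;
}
```

The control flow follows the shape of the vocabulary (person, then name, then age), which is the part that is hard to express with SAX callbacks.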
Right. I can agree with you that a good API over SAX (or reader) could
be better than DOM in certain cases, but not all. Just think of someone
wanting to write an XML editor (e.g., to edit XHTML or DocBook
documents), with support for standard XML features such as xinclude,
xpath-based search, perhaps even xslt-based transformations.

Again, I'm definitely not suggesting everyone needs those features, but
there has to be a place where these can be added in boost.xml.
...
* Which brings us to this point: it is easy to go from streaming to
  in-memory but not the other way around.
Yes, of course, a DOM API can be implemented on top of a streaming API.
But you are pushing down the road of yet another implementation of XML,
which I strongly object to. I'm not against anyone re-implementing an
XML library. But as I said, I don't think Boost.XML should mandate a new
implementation with so many existing choices. There just is no point in
such an exercise, other than self-education.
...
* In fact, an even better approach would be to support hybrid, partially
  streaming/partially in-memory parsing and serialization (also discussed
  in the talk). Then, the fully in-memory would simply be a special case.
* libstudxml has the ‘hybrid’ example which shows how to implement this
  hybrid approach. You would be shocked how short and simple the code
  is (I know I was once I wrote it ;-)).
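
To make the hybrid idea a bit more concrete, here is a minimal sketch under stated assumptions. It is not the libstudxml 'hybrid' example: `pull_parser`, `node`, `build_subtree`, and `for_each_record` are hypothetical names, the parser replays pre-tokenized events, and the input is assumed to be well-formed. The outer loop streams over the children of the root, and each child is materialized as a small in-memory subtree only for as long as the handler needs it.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class event_type { start_element, end_element, characters, eof };

struct event {
  event_type type;
  std::string value;  // element name or character data
};

// Hypothetical pull source: replays a fixed list of events.
class pull_parser {
public:
  explicit pull_parser(std::vector<event> events) : events_(std::move(events)) {}
  event next() {
    if (pos_ < events_.size()) return events_[pos_++];
    return {event_type::eof, ""};
  }
private:
  std::vector<event> events_;
  std::size_t pos_ = 0;
};

// Minimal in-memory node (text of mixed content is simply concatenated).
struct node {
  std::string name;
  std::string text;
  std::vector<std::unique_ptr<node>> children;
};

// Streaming to in-memory: build the subtree of an element whose
// start event has just been consumed by the caller.
std::unique_ptr<node> build_subtree(pull_parser& p, const std::string& name) {
  auto n = std::make_unique<node>();
  n->name = name;
  for (event e = p.next();
       e.type != event_type::end_element && e.type != event_type::eof;
       e = p.next()) {
    if (e.type == event_type::start_element)
      n->children.push_back(build_subtree(p, e.value));
    else
      n->text += e.value;  // character data
  }
  return n;
}

// Hybrid: stream over the root's children, holding one subtree at a time.
template <typename F>
std::size_t for_each_record(pull_parser& p, F handle) {
  std::size_t count = 0;
  p.next();  // consume the root's start event
  for (event e = p.next(); e.type == event_type::start_element; e = p.next()) {
    handle(*build_subtree(p, e.value));  // subtree is freed after the call
    ++count;
  }
  return count;
}
```

Calling `build_subtree` on the root element instead of on each child gives the fully in-memory case, so in this sketch it really is just a special case of the same code.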
[1] http://www.codesynthesis.com/projects/libstudxml/doc/intro.xhtml#2
Again, I'm resisting getting dragged into a discussion about
implementation A vs. implementation B. I don't want to argue about that.
I'm arguing for a Boost.XML API that supports multiple choices of
backends. This is mostly a maintainability question. XML is a complex
standard, with occasional updates and new feature additions. Just adding
a few new wrappers around existing implementations is far easier than
having to re-implement things just because of a bad design decision when
Boost.XML first came into being...
...
...
I have always strongly argued against the idea that an "XML API" was
only about parsing XML data, as there are many useful features that
involve manipulation of XML data (including transformations between
documents, xpath-based search, etc.).
You need to start somewhere. And support for (relatively) low-level XML
parsing and serialization seems like a good place.
...
...
In fact, I believe such an API should be robust enough to be able to
wrap different backends, rather than depending on a particular
implementation choice.
I don't think it will be robust. I think it will be awful and inconvenient.
Try to adapt straight SAX API to anything other than callback-based with
inversion of control (i.e., SAX again).
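
One way to see the difficulty is to try the adaptation in the simplest possible form. The sketch below is an assumption-laden illustration, not code from either library: `sax_handler`, `push_parse_sample`, and `buffering_adapter` are hypothetical names, and the "push parser" just emits three fixed callbacks. Without threads or coroutines, a callback-driven parser runs to completion before the caller regains control, so a pull interface layered on top has to record every event first.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical SAX-style interface: the parser calls the handler.
struct sax_handler {
  virtual ~sax_handler() = default;
  virtual void start_element(const std::string& name) = 0;
  virtual void end_element(const std::string& name) = 0;
  virtual void characters(const std::string& text) = 0;
};

// Stand-in for a push parser: returns only after the whole
// "document" has been delivered through callbacks.
void push_parse_sample(sax_handler& h) {
  h.start_element("greeting");
  h.characters("hello");
  h.end_element("greeting");
}

enum class event_type { start_element, end_element, characters, eof };

struct event {
  event_type type;
  std::string value;  // element name or character data
};

// Adapter: records callbacks so that they can be pulled afterwards.
class buffering_adapter : public sax_handler {
public:
  void start_element(const std::string& name) override {
    queue_.push_back({event_type::start_element, name});
  }
  void end_element(const std::string& name) override {
    queue_.push_back({event_type::end_element, name});
  }
  void characters(const std::string& text) override {
    queue_.push_back({event_type::characters, text});
  }

  // Pull side: hand out recorded events in document order.
  event next() {
    if (queue_.empty()) return {event_type::eof, ""};
    event e = queue_.front();
    queue_.pop_front();
    return e;
  }

  std::size_t buffered() const { return queue_.size(); }

private:
  std::deque<event> queue_;
};
```

By the time the first `next()` call can be made, the entire document is already sitting in the queue, so the result is pull-shaped but no longer streaming. Avoiding that requires running the push parser on a separate thread or coroutine.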
Have you looked at existing XML libraries before you started libstudxml?
Did you know about Boost.XML, or Arabica?
Anyhow, I'm not trying to convince you that you should change anything.
I'm trying to show you what a thin wrapper can look like.

Stefan

-- 

      ...ich hab' noch einen Koffer in Berlin...

Stefan Seefeld