On 2014-05-21 13:57, Boris Kolpackov wrote:
Hi Stefan,
In gmane.comp.lib.boost.devel you write:
Does it support a DOM-like API, i.e. an in-memory representation of the document?
No, it does not. I spent quite a bit of time on the in-memory vs streaming debate in my talk. How I wish the video was already available...
Let me know when it is, I'm looking forward to hearing your arguments. :-)
Until then, to summarize the key points:
* Most people think they need DOM. I believe that is not because in-memory is conceptually better but because of the really awful and inconvenient streaming APIs (like SAX). So I tried to convince the audience that a well-designed streaming pull API is actually sufficient for the majority of cases. I didn't hear many objections.
Take a look at the API Introduction[1], it shows how to handle everything from converters/filters that don't care about the data, to applications that process the data without creating any kind of in-memory object model, to C++ classes that know how to persist themselves in XML.
* On that last point (C++ class persistence): a lot of applications extract XML data into some kind of object model (C++ classes that correspond to the XML vocabulary). Creating an intermediate representation of XML (DOM) just to throw it away moments later seems kind of pointless.
* Of course there will always be applications that need to revisit the bulk of raw XML data and for them in-memory would probably always be a better choice.
Right. I can agree with you that a good API over SAX (or reader) could be better than DOM in certain cases, but not all. Just think of someone wanting to write an XML editor (e.g., to edit XHTML or DocBook documents), with support for standard XML features such as XInclude, XPath-based search, perhaps even XSLT-based transformations. Again, I'm definitely not suggesting everyone needs those features, but there has to be a place where these can be added in Boost.XML.
* Which brings us to this point: it is easy to go from streaming to in-memory but not the other way around.
Yes, of course, a DOM API can be implemented on top of a streaming API. But you are heading down the road of yet another XML implementation, which I strongly object to. I'm not against anyone re-implementing an XML library. But as I said, I don't think Boost.XML should mandate a new implementation when so many choices already exist. There just is no point in such an exercise, other than self-education.
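To make the "streaming to in-memory" direction concrete, here is a minimal sketch of building a tree from pull-style events. It is not libstudxml code and not a proposed Boost.XML interface: `Event`, `EventSource`, `Node`, `build_subtree` and `build_tree` are all hypothetical names, and the event source is just a canned vector standing in for a real parser.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical pull-parser event; not any real library's type.
enum class EventKind { StartElement, EndElement, Characters };

struct Event {
  EventKind kind;
  std::string value;  // element name, or text for Characters
};

// Minimal in-memory node, standing in for a DOM element.
struct Node {
  std::string name;
  std::string text;
  std::vector<std::unique_ptr<Node>> children;
};

// Hypothetical pull source: the caller asks for the next event.
class EventSource {
public:
  explicit EventSource(std::vector<Event> events)
      : events_(std::move(events)) {}

  // Returns false once the input is exhausted.
  bool next(Event& out) {
    if (pos_ == events_.size()) return false;
    out = events_[pos_++];
    return true;
  }

private:
  std::vector<Event> events_;
  std::size_t pos_ = 0;
};

// Called after the start tag of `name` has been read; consumes events
// up to and including the matching end tag.
std::unique_ptr<Node> build_subtree(EventSource& src, const std::string& name) {
  auto node = std::make_unique<Node>();
  node->name = name;
  Event e;
  while (src.next(e)) {
    switch (e.kind) {
      case EventKind::StartElement:
        node->children.push_back(build_subtree(src, e.value));
        break;
      case EventKind::Characters:
        node->text += e.value;
        break;
      case EventKind::EndElement:
        assert(e.value == name);  // sketch assumes well-formed input
        return node;
    }
  }
  return node;  // input ended early; return what was collected
}

// Builds the whole document; returns nullptr if there is no root start tag.
std::unique_ptr<Node> build_tree(EventSource& src) {
  Event e;
  if (!src.next(e) || e.kind != EventKind::StartElement) return nullptr;
  return build_subtree(src, e.value);
}
```

Nothing forces `build_subtree` to be applied to the root: a caller could stream through most of a document and materialize only the elements it wants to revisit.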
* In fact, an even better approach would be to support hybrid, partially streaming/partially in-memory parsing and serialization (also discussed in the talk). Then, the fully in-memory would simply be a special case.
* libstudxml has the ‘hybrid’ example which shows how to implement this hybrid approach. You would be shocked how short and simple the code is (I know I was once I wrote it ;-)).
[1] http://www.codesynthesis.com/projects/libstudxml/doc/intro.xhtml#2
Again, I'm resisting getting dragged into a discussion about implementation A vs. implementation B. I don't want to argue about that. I'm arguing for a Boost.XML API that supports multiple choices of backends. This is mostly a maintainability question. XML is a complex standard, with occasional updates and new feature additions. Just adding a few new wrappers around existing implementations is far easier than having to re-implement things just because of a bad design decision when Boost.XML first came into being...
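As a rough illustration of what "an API that supports multiple choices of backends" could mean, here is a minimal sketch, assuming a small abstract interface that each wrapper implements. Every name in it (`DocumentBackend`, `Document`, `FakeBackend`) is hypothetical; it is not the Boost.XML proposal, and the fake backend only stands in for a wrapper around an existing library.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical interface a wrapper around an existing XML library
// would implement. Two operations are enough to show the shape.
class DocumentBackend {
public:
  virtual ~DocumentBackend() = default;
  virtual std::string root_name() const = 0;
  // Value of an attribute on the root element, or "" if absent.
  virtual std::string root_attribute(const std::string& name) const = 0;
};

// User-facing type: forwards to whichever backend it was given and
// exposes no backend-specific types.
class Document {
public:
  explicit Document(std::unique_ptr<DocumentBackend> backend)
      : backend_(std::move(backend)) {}

  std::string root_name() const { return backend_->root_name(); }

  std::string root_attribute(const std::string& name) const {
    return backend_->root_attribute(name);
  }

private:
  std::unique_ptr<DocumentBackend> backend_;
};

// Stand-in backend for illustration only; a real one would delegate
// to an existing parser instead of holding canned data.
class FakeBackend : public DocumentBackend {
public:
  FakeBackend(std::string root, std::map<std::string, std::string> attrs)
      : root_(std::move(root)), attrs_(std::move(attrs)) {}

  std::string root_name() const override { return root_; }

  std::string root_attribute(const std::string& name) const override {
    auto it = attrs_.find(name);
    return it == attrs_.end() ? std::string() : it->second;
  }

private:
  std::string root_;
  std::map<std::string, std::string> attrs_;
};
```

Code written against `Document` does not change when a different backend is plugged in; only the class deriving from `DocumentBackend` does. Whether such an interface stays thin once XPath or XSLT support is added is exactly the open question in this thread.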
I have always strongly argued against the idea that an "XML API" was only about parsing XML data, as there are many useful features that involve manipulation of XML data (including transformations between documents, xpath-based search, etc.).
You need to start somewhere. And support for (relatively) low-level XML parsing and serialization seems like a good place.
In fact, I believe such an API should be robust enough to be able to wrap different backends, rather than depending on a particular implementation choice.
I don't think it will be robust. I think it will be awful and inconvenient. Try to adapt a straight SAX API to anything other than callback-based with inversion of control (i.e., SAX again).
Have you looked at existing XML libraries before you started libstudxml? Did you know about Boost.XML, or Arabica? Anyhow, I'm not trying to convince you that you should change anything. I'm trying to show you what a thin wrapper can look like.
Stefan
--
...ich hab' noch einen Koffer in Berlin...