Re: [boost] Standard c++ XML parser API (Boost.XML)

20 Mar 2014

On 03/20/2014 04:34 AM, Bjorn Reese wrote:
...
On 03/18/2014 04:46 PM, Stefan Seefeld wrote:
...
I don't see any reason why such an XML API wouldn't be usable by other
Boost libraries.
It should be part of the GSoC project to verify this for the most
common use cases (XML serialization is the most obvious one.)
I don't entirely understand your point. The goal is to define an XML
API, and implement it, which complies with all related standards. As
long as the existing Boost components (e.g. Boost.Serialization) work
with standard XML tools, we should be compatible.
I don't think, however, that we should be constrained to be
API-compatible with existing tools, as otherwise the whole exercise to
define a new API would be pointless. On the other hand, making minor
adjustments to those libraries to work with Boost.XML would be fine. I
just don't think we should make this part of the proposal, as it isn't
even clear what existing Boost components would be affected, whether
they are actively maintained / developed, etc.
...
...
...
What is the purpose of the S template argument?
To keep the concern for Unicode or any other string type orthogonal to
the XML library, i.e. to allow Boost.XML to interact with different
Unicode implementations. In the existing demos I'm restricting
content to ASCII, so I can get away with using std::string. This
is a good example of the "modularity" design goal I mentioned
above: Don't force anything on users they don't actually need.
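
For illustration, here is a minimal sketch of what parameterizing on the
string type could look like. The names basic_element, element, u16element
and usage_example are hypothetical and not taken from the actual Boost.XML
code; the sketch only shows that the class itself never needs to know which
encoding S carries:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: a tiny element type parameterized on the string
// type S. This is NOT the actual Boost.XML interface.
template <typename S>
class basic_element
{
public:
  explicit basic_element(S name) : name_(std::move(name)) {}

  S const &name() const { return name_; }

  void set_attribute(S key, S value)
  {
    attributes_.emplace_back(std::move(key), std::move(value));
  }

  // Returns a pointer to the attribute value, or nullptr if it is absent.
  S const *attribute(S const &key) const
  {
    for (auto const &a : attributes_)
      if (a.first == key) return &a.second;
    return nullptr;
  }

private:
  S name_;
  std::vector<std::pair<S, S>> attributes_;
};

// ASCII-only users can get away with std::string...
using element = basic_element<std::string>;
// ...while others could plug in a different string class.
using u16element = basic_element<std::u16string>;

// Usage: the same code works for both instantiations.
void usage_example()
{
  element e("book");
  e.set_attribute("lang", "en");
  assert(*e.attribute("lang") == "en");

  u16element w(u"livre");
  assert(w.name() == u"livre");
}
```

Note that nothing in this sketch says what encoding the bytes in S are in,
which is the open question raised in the replies.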
I agree with the goal, but I am not sure that the S type solves the
problem. I must admit that I am having difficulty understanding exactly
how you envision it should work for other encodings, because std::string
is orthogonal to encoding (locale is usually attached to the I/O
stream.)
You are right, encoding and string type are (mostly) orthogonal. I have
never said anything else. :-)
...
What encoding is used for std::string? ASCII, UTF-8, or "whatever the
XML library gives me"? This should be documented as part of the API
regardless of the answer.
Yes.
...
Should I define a new string type if I want to use Latin-1 or another
encoding in my application? What if the rest of my application uses
std::string for Latin-1 encodings? (I am wondering how this will work with
the current convert trait specialization for std::string.)
How does the convert trait know the XML document encoding so that it
is able to convert between this and the application encoding?
I suggest that you adopt the libxml2 design decision to always use
UTF-8 for std::string (and UTF-16 for std::wstring if needed.) See
the design rationale here:
http://xmlsoft.org/encoding.html
Any backend that does not provide UTF-8 will have to be wrapped.
With such a design decision, the S template parameter becomes
superfluous (or should be changed to CharT if you wish to support
both std::string and std::wstring.)
Conversion between UTF-8 and application encodings would have to
be done explicitly in the application.
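
As a rough sketch of what "explicitly in the application" could mean for a
Latin-1 application: the helper below (latin1_to_utf8 is a hypothetical
name, not part of any proposed API) converts a Latin-1 std::string to UTF-8
before it is handed to the XML layer. It assumes the input really is
ISO-8859-1:

```cpp
#include <cassert>
#include <string>

// Convert an ISO-8859-1 (Latin-1) byte string to UTF-8.
// Latin-1 bytes map 1:1 onto U+0000..U+00FF, so every byte becomes either
// one UTF-8 byte (below 0x80) or two UTF-8 bytes (0x80..0xFF).
std::string latin1_to_utf8(std::string const &in)
{
  std::string out;
  out.reserve(in.size());
  for (unsigned char c : in)
  {
    if (c < 0x80)
    {
      out.push_back(static_cast<char>(c));
    }
    else
    {
      out.push_back(static_cast<char>(0xC0 | (c >> 6)));   // lead byte
      out.push_back(static_cast<char>(0x80 | (c & 0x3F))); // continuation
    }
  }
  return out;
}

// Usage: "café" in Latin-1 is 4 bytes, in UTF-8 it is 5 bytes.
void usage_example()
{
  std::string utf8 = latin1_to_utf8("caf\xE9");
  assert(utf8 == "caf\xC3\xA9");
}
```

Other encodings would need a real conversion library; this only covers the
Latin-1 case mentioned above.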
At any rate, encoding should be addressed in the GSoC project.
I agree, and this is in fact part of the proposal. To be specific, one
of the first steps is to add tests that instantiate the XML classes with
existing Unicode string classes (such as Glib::ustring or Qt's QString),
and demonstrate how to use them.
...
...
...
What is the purpose of the convert trait?
To allow conversion between the backend's own string representation and
the string type that is used with Boost.XML.
Ok. You should, however, make sure that the strings are converted
correctly:
http://xmlsoft.org/html/libxml-xmlstring.html
For instance, convert::in() does not take libxml2 custom allocators into
account:
http://xmlsoft.org/html/libxml-xmlmemory.html
Good point. As I said, the existing Boost.XML was meant to be a
proof-of-concept.
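
To make the allocator concern concrete, here is a sketch of a converter
trait that pairs every backend buffer with the backend's own deallocator.
Everything in it is an assumption for illustration: backend_char,
backend_strdup and backend_free are stand-ins (with libxml2 the
corresponding pieces would be xmlChar and xmlFree), and the in()/out()
signatures are not those of the existing Boost.XML convert trait:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <string>

// Stand-ins for a backend's string API (hypothetical, NOT libxml2 itself).
using backend_char = unsigned char;

// The backend allocates with its own allocator...
backend_char *backend_strdup(char const *s)
{
  std::size_t n = std::strlen(s) + 1;
  auto *p = static_cast<backend_char *>(std::malloc(n));
  if (p) std::memcpy(p, s, n);
  return p;
}

// ...so its buffers must be released with the matching deallocator.
void backend_free(backend_char *p) { std::free(p); }

struct backend_deleter
{
  void operator()(backend_char *p) const { backend_free(p); }
};
using backend_string = std::unique_ptr<backend_char, backend_deleter>;

// Hypothetical converter trait, specialized per application string type.
template <typename S> struct converter;

template <> struct converter<std::string>
{
  // backend -> application: copy the bytes, the caller keeps ownership.
  static std::string in(backend_char const *s)
  {
    return s ? std::string(reinterpret_cast<char const *>(s)) : std::string();
  }

  // application -> backend: the result is freed via backend_free().
  static backend_string out(std::string const &s)
  {
    return backend_string(backend_strdup(s.c_str()));
  }
};

// Usage: round-trip a name through the backend representation.
void usage_example()
{
  backend_string b = converter<std::string>::out("node");
  assert(converter<std::string>::in(b.get()) == "node");
}
```

The point of the unique_ptr with a custom deleter is that the application
never calls plain free() or delete on memory the backend handed out.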

Thanks for your feedback,

        Stefan

-- 

      ...ich hab' noch einen Koffer in Berlin...