An invalid XML character (Unicode: 0x8) problem because of property_tree::xml_parser::write_xml

Rohan Shetty

2 Mar 2015

4:59 a.m.

Hi,

I have used the following C++ code to generate the XML:

    boost::property_tree::ptree ptResponse;
    // Populate the tree from the Microsoft Outlook contacts
    std::stringstream buf;
    const std::string enc("utf-8");
    boost::property_tree::xml_writer_settings<char> settings(' ', 0, enc);
    boost::property_tree::xml_parser::write_xml(buf, ptResponse, settings);

This works fine. But on one of the customer's machines, when reading this XML content in a Java program, I get the following error:

    An invalid XML character (Unicode: 0x8) was found in the element content of the document.

Any help in solving this is appreciated.

Regards,
Rohan

Mathias Gaunard

2 Mar

4:40 p.m.

On 02/03/2015 05:59, Rohan Shetty wrote:

> Hi, I have used the following C++ code to generate the XML:
>
>     boost::property_tree::ptree ptResponse;
>     // Populate the tree from the Microsoft Outlook contacts
>     std::stringstream buf;
>     const std::string enc("utf-8");
>     boost::property_tree::xml_writer_settings<char> settings(' ', 0, enc);
>     boost::property_tree::xml_parser::write_xml(buf, ptResponse, settings);
>
> This works fine. But on one of the customer's machines, when reading this XML content in a Java program, I get the following error:
>
>     An invalid XML character (Unicode: 0x8) was found in the element content of the document.
>
> Any help in solving this is appreciated.

Rohan Shetty

3 Mar

3:11 a.m.

Hi Mathias,

Thanks for your response. I was expecting write_xml (with "utf-8") to do the escaping (e.g. < replaced with &lt;) or to strip any invalid characters (e.g. anything other than #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]). Is this part of write_xml()? Do let me know if this is not clear.

Regards,
Rohan

From: Mathias Gaunard <mathias.gaunard@ens-lyon.org>
To: boost@lists.boost.org
Sent: Monday, March 2, 2015 10:10 PM
Subject: Re: [boost] An invalid XML character (Unicode: 0x8) problem because of property_tree::xml_parser::write_xml

On 02/03/2015 05:59, Rohan Shetty wrote:

> Hi, I have used the following C++ code to generate the XML:
>
>     boost::property_tree::ptree ptResponse;
>     // Populate the tree from the Microsoft Outlook contacts
>     std::stringstream buf;
>     const std::string enc("utf-8");
>     boost::property_tree::xml_writer_settings<char> settings(' ', 0, enc);
>     boost::property_tree::xml_parser::write_xml(buf, ptResponse, settings);
>
> This works fine. But on one of the customer's machines, when reading this XML content in a Java program, I get the following error:
>
>     An invalid XML character (Unicode: 0x8) was found in the element content of the document.
>
> Any help in solving this is appreciated.

I don't understand, the error message is quite explicit: your data isn't utf-8 even though you said it was. What were you expecting to happen?

Also this would probably be more suited to the boost-users mailing list.

Mathias Gaunard

11:58 a.m.

> Hi Mathias, Thanks for your response. I was expecting write_xml (with "utf-8") to do the escaping (e.g. < replaced with &lt;) or strip any invalid characters (e.g. anything other
>
> ...
>
> Is this part of write_xml()? Do let me know if this is not clear. Regards, Rohan

It is not reasonable to expect that the write_xml function would silently drop data by default. If you want invalid data to be removed, you'll have to do this yourself prior to calling the function.

This signature of write_xml doesn't actually do anything encoding-wise: it outputs your data as-is, and marks the data as being in the encoding you specified.

It might be more sensible to set up the encoding correctly though, or to convert your data to the right encoding. There is another overload of write_xml that can imbue a locale when writing the data, which can be used for transparent transcoding.
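A minimal sketch of the "do this yourself prior to calling the function" step, assuming the strings are already UTF-8 and the only problem is stray control characters such as the 0x8 named in the error message. The helper name strip_invalid_xml_controls is hypothetical and not part of Boost. It only filters the C0 control range and does not check the other code points that XML 1.0 excludes (surrogates, U+FFFE, U+FFFF).

```cpp
#include <string>

// Hypothetical helper (not a Boost API): removes the control bytes that
// XML 1.0 forbids in document content, keeping tab, line feed and
// carriage return. In UTF-8, bytes below 0x80 only ever encode ASCII
// characters, so filtering them bytewise cannot split a multi-byte sequence.
std::string strip_invalid_xml_controls(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (char ch : in) {
        const unsigned char c = static_cast<unsigned char>(ch);
        const bool allowed = c >= 0x20 || c == 0x09 || c == 0x0A || c == 0x0D;
        if (allowed) {
            out.push_back(ch);
        }
    }
    return out;
}
```

Each value would be passed through the helper before it is stored in the tree, for example ptResponse.put(key, strip_invalid_xml_controls(value)), where key and value stand for whatever the Outlook import produces. Whether dropping the characters is acceptable, or whether they should be reported instead, depends on the application.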

Rohan Shetty

4 Mar

3:06 a.m.

On 03/03/2015 5:28 PM, Mathias Gaunard wrote:
> This mailing-list uses bottom- and inline-posting, please lay out your
> responses accordingly.

...

> It is not reasonable to expect that the write_xml function would
> silently drop data by default.
> If you want invalid data to be removed, you'll have to do this yourself
> prior to calling the function.
>
> This signature of write_xml doesn't actually do anything encoding-wise,
> it outputs your data as-is, and marks the data as being the encoding you
> specified.
>
> It might be more sensible to set up the encoding correctly though, or to
> convert your data to the right encoding.
> There is another overload of write_xml that can imbue a locale when
> writing the data, which can be used for transparent transcoding.

Thanks Mathias.

Bjorn Reese

3 Mar

12:18 p.m.

On 03/03/2015 04:11 AM, Rohan Shetty wrote:

...

> I was expecting write_xml (with "utf-8") to do the escaping (e.g. < replaced with &lt;) or strip any invalid characters (e.g. anything other than #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]). Is this part of write_xml()?

Please read the documentation:

"RapidXML does not fully support the XML standard; it is not capable of parsing DTDs and therefore cannot do full entity substitution.

[...]

Please note that RapidXML does not understand the encoding specification. If you pass it a character buffer, it assumes the data is already correctly encoded; if you pass it a filename, it will read the file using the character conversion of the locale you give it (or the global locale if you give it none). This means that, in order to parse a UTF-8-encoded XML file into a wptree, you have to supply an alternate locale, either directly or by replacing the global one."

http://www.boost.org/doc/html/boost_propertytree/parsers.html
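Because the data is assumed to be already correctly encoded, strings that arrive in a single-byte code page need converting before they are stored in the tree. The following is a minimal sketch under an assumption the thread does not confirm: that the input is ISO-8859-1 (Latin-1). The helper name latin1_to_utf8 is hypothetical and not part of Boost; other encodings, such as Windows code pages, would need a real conversion library. Note that this would not by itself remove a 0x8 byte, which is a control character in either encoding.

```cpp
#include <string>

// Hypothetical helper (not a Boost API): re-encodes ISO-8859-1 bytes as UTF-8.
// Assumes every input byte is one Latin-1 character (U+0000..U+00FF).
std::string latin1_to_utf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size() * 2);
    for (char ch : in) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            // ASCII is encoded identically in Latin-1 and UTF-8.
            out.push_back(ch);
        } else {
            // U+0080..U+00FF become a two-byte UTF-8 sequence.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```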

Rohan Shetty

4 Mar

3:08 a.m.

On 03/03/2015 5:48 PM, Bjorn Reese wrote:
> Please read the documentation:
>
> "RapidXML does not fully support the XML standard; it is not capable
> of parsing DTDs and therefore cannot do full entity substitution.
>
> [...]
>
> Please note that RapidXML does not understand the encoding
> specification. If you pass it a character buffer, it assumes the data
> is already correctly encoded; if you pass it a filename, it will read
> the file using the character conversion of the locale you give it (or
> the global locale if you give it none). This means that, in order to
> parse a UTF-8-encoded XML file into a wptree, you have to supply an
> alternate locale, either directly or by replacing the global one."
>
> http://www.boost.org/doc/html/boost_propertytree/parsers.html

Thanks Bjorn.
