Thanks for the quick response.

The BOM is mostly a Windows thing. In my opinion, the BOM is not really related to how you encode the text (and thus not related to utf8_codecvt_facet), but to how you mark what your encoding is, so that any text editor gets a hint about how to handle the text. It is implemented by inserting a few bytes at the very beginning of the file, which are never used in the chosen encoding of the text that follows. In the case of UTF-8, "EF BB BF" is used -- in the encoding table of UTF-8, "EF BB BF" should correspond to no character (I did not check, this is just a guess).

Since this concerns text files in general, not XML files specifically, basic_text_iarchive might be a better place to address the issue. I think just detecting "EF BB BF" and discarding those bytes if they exist would solve the issue. But I am not sure which method needs to be overridden; can you please advise?

Thanks, tom
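For reference, the "detect EF BB BF and discard" idea can be sketched outside the library, on the stream itself, before any archive is constructed. This is only a minimal sketch under stated assumptions: skip_utf8_bom is a hypothetical helper, not part of Boost.Serialization, and it assumes a seekable narrow stream (a file or string stream). As a side note, EF BB BF is the UTF-8 encoding of the code point U+FEFF.

```cpp
#include <cassert>   // only needed for the small usage check below
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not a Boost API): if the stream starts with the
// UTF-8 BOM bytes EF BB BF, consume them and return true. Otherwise
// rewind to the starting position and return false.
// Assumes the stream supports tellg/seekg.
inline bool skip_utf8_bom(std::istream& is) {
    const std::istream::pos_type start = is.tellg();
    char buf[3] = {0, 0, 0};
    is.read(buf, 3);
    const bool has_bom = is.gcount() == 3 &&
        static_cast<unsigned char>(buf[0]) == 0xEF &&
        static_cast<unsigned char>(buf[1]) == 0xBB &&
        static_cast<unsigned char>(buf[2]) == 0xBF;
    if (!has_bom) {
        is.clear();       // a short read sets eofbit/failbit; reset them
        is.seekg(start);  // put back whatever was read
    }
    return has_bom;
}

// Usage check: the BOM is dropped and the remaining text is left intact.
inline void skip_utf8_bom_example() {
    std::istringstream in(std::string("\xEF\xBB\xBF<doc/>"));
    const bool skipped = skip_utf8_bom(in);
    std::string rest;
    std::getline(in, rest);
    assert(skipped);
    assert(rest == "<doc/>");
}
```

With a file, the assumption is that one would call skip_utf8_bom on the std::ifstream first and then hand the same stream to the archive constructor. That has not been checked against the archive code discussed in this thread.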
This is news to me.
The wide character text/xml archives use UTF-8. They do this by creating a stream with the utf8_codecvt_facet. I used this facet, it worked great and I moved on. So you're way ahead of me on this.
This would probably be easy to address in the xml_iarchive code or perhaps the xml_grammar - but, as I said, I don't know anything about it.
Robert Ramey
Tan, Tom (Shanghai) wrote:
what is BOM?
Probably "Byte Order Mark", see http://en.wikipedia.org/wiki/Byte-order_mark
Yes, that's what I meant.
I was testing demo_xml_load.cpp and demo_xml_save.cpp from the Boost.Serialization examples. After simply opening the demo_save.xml produced by demo_xml_save.exe with XML Copy Editor (http://xml-copy-editor.sourceforge.net/) and saving it back, demo_xml_load.exe would crash. I compared the two files with WinMerge, and it reported them as identical.
By studying the hex view, I later found that this was because the 3-byte UTF-8 BOM had been inserted at the beginning of the file. It does not change the data, and in many cases text editors ignore it.
I think Boost.Serialization should also handle this for all text files, including XML.
Tom