[Serialization] XML: float format is scientific instead of human-readable since Boost 1.57

Frank Stähr

28 Jan 2015

1:18 p.m.

Hello Boost guys,

when I switched from Boost 1.55 to 1.57 there was a change in the formatting of floats and doubles for XML archives. Look at this example:

    #include <fstream>
    #include <boost/archive/xml_oarchive.hpp>
    #include <boost/serialization/nvp.hpp>

    class MyClass {
        friend class boost::serialization::access;

        template<class Archive>
        void serialize(Archive & ar, const unsigned int version) {
            ar & boost::serialization::make_nvp("accuracy", accuracy);
        }

        double accuracy = 0.03;
    };

    int main() {
        MyClass obj;
        std::ofstream ofs("sample.xml");
        boost::archive::xml_oarchive oa(ofs);
        oa & boost::serialization::make_nvp("myclass", obj);
    }

I compiled it with VS 2013 for 32 bit. The output file has the line:

    <m_accuracy>2.99999999999999990e-002</m_accuracy>

In Boost 1.55 it was much more human-readable: 0.03.

Is this behaviour (this change) intended? In any case, what is the best way to format floats and doubles (i.e. a manipulator for the ofstream)?

Thanks,
Frank

--
Frank Stähr
Technische Universität Berlin
Communication Systems Group
Sekr. EN1
Einsteinufer 17
10587 Berlin, Germany

Robert Ramey

28 Jan

3:16 p.m.

Frank Stähr wrote

...

I compiled it with VS 2013 for 32 bit. The output file has the line: <m_accuracy>2.99999999999999990e-002</m_accuracy> In Boost 1.55 it was much more human-readable: 0.03.

Is this behaviour (this change) intended? In any case, what is the best way to format floats and doubles (i.e. a manipulator for the ofstream)?

This is surely an unintended side effect of changes to guarantee correct "round tripping" of floating point numbers, that is, that the floating point number read in is bit for bit the same as the floating point number written out. This was seen as a good thing. I'm not sure if this creates some unintended problems.

Robert Ramey

--
View this message in context: http://boost.2283326.n4.nabble.com/Serialization-XML-float-format-is-scienti...
Sent from the Boost - Users mailing list archive at Nabble.com.

Paul A. Bristow

4:55 p.m.

...

-----Original Message----- From: Boost-users [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Robert Ramey Sent: 28 January 2015 15:17 To: boost-users@lists.boost.org Subject: Re: [Boost-users] [Serialization] XML: float format is scientific instead of human-readable since Boost 1.57

Frank Stähr wrote

...
I compiled it with VS 2013 for 32 bit. The output file has the line: <m_accuracy>2.99999999999999990e-002</m_accuracy> In Boost 1.55 it was much more human-readable: 0.03.

Is this behaviour (this change) intended? In any case, what is the best way to format floats and doubles (i.e. a manipulator for the ofstream)?

This is surely an unintended side effect of changes to guarantee correct "round tripping" of floating point numbers, that is, that the floating point number read in is bit for bit the same as the floating point number written out.

This was seen as a good thing.

Short answer: If you want to be sure that what you get back from de-serialization is bit-for-bit identical to what you serialized, then it is definitely a Good Thing.

For some MS compilers at least, it is also necessary to use scientific format for this to avoid some very intermittent failures to 'round-trip' correctly. (This was discovered by a real-time user, and was only pinned down by random testing!)

Long answer: For more than you will want to know see: Exploring Binary <exploringbinary@gmail.com>

And from a previous post:

...

But the other thing is that by setting precision to 17 lexical_cast is bloating the string representations of the doubles with lots of 9s in both Visual Studio 2010 and Visual Studio 2013. Setting precision to 15 instead prevents this, and makes the original test run faster even with Visual Studio 2013 (about 4 seconds rather than 10).

In order to be sure of 'round-tripping' one needs to output std::numeric_limits<FPT>::max_digits10 decimal digits.

max_digits10 is 17 for double, enough to ensure that all *possibly* significant digits are used.

digits10 is 15 for double, and using this will work for *your* example, but will fail to 'round-trip' exactly for *some* values of double.

The reason for a rewrite *might* be that for VS <=11, there was a slight 'feature' ('feature' according to Microsoft, 'bug' according to many, though the C++ Standard does NOT require round-tripping to be exact. Recent GCC and Clang achieve exact round-tripping.)

// The original value causing trouble using serialization was 0.00019075645054089487;
// wrote 0.0019075645054089487
// read 0.0019075645054089489
// an increase of just 1 bit.
// Although this test uses a std::stringstream, it is possible that
// the same behaviour will be found with ALL streams, including cout and cin?
// The wrong inputs are only found in a very narrow range of values:
// approximately 0.0001 to 0.004, with exponent values of 3f2 to 3f6
// and probably every third value of significand (tested using nextafter).

However, a re-test reveals that this 'feature' is still present using VS2013 (version 12.0). (This test uses random double values to find round-trip or loopback failures.)

So the price of accuracy is lots of digits (and time to output and re-digest them) :-(

Paul

---
Paul A. Bristow
Prizet Farmhouse
Kendal UK LA8 8AB
+44 (0) 1539 561830
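The difference between digits10 and max_digits10 described above can be checked with a small round-trip helper. This is only a sketch using the standard library: the function names `to_text` and `round_trips` are made up for illustration and are not part of Boost.Serialization, and the result for 17 digits assumes a standard library whose stream input is exact (as reported above for recent GCC and Clang).

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: format `value` in scientific notation with
// `digits` significant decimal digits.
std::string to_text(double value, int digits) {
    assert(digits > 0);
    std::ostringstream out;
    // In scientific format, precision counts the digits after the decimal
    // point, so digits - 1 gives `digits` significant digits in total.
    out << std::scientific << std::setprecision(digits - 1) << value;
    return out.str();
}

// Hypothetical helper: write the value as text, read it back, and report
// whether the restored value compares equal to the original.
bool round_trips(double value, int digits) {
    std::istringstream in(to_text(value, digits));
    double restored = 0.0;
    in >> restored;
    return restored == value;
}
```

With these helpers, `round_trips(0.03, 15)` succeeds, which matches the remark that 15 digits work for the original example. `round_trips(0.1 + 0.2, 15)` fails, because the sum is written as 3.00000000000000e-01 and read back as 0.3, while `round_trips(0.1 + 0.2, std::numeric_limits<double>::max_digits10)` is expected to succeed.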

Frank Stähr

5:07 p.m.

Am 28.01.2015 um 16:16 schrieb Robert Ramey:

...

Frank Stähr wrote

...
I compiled it with VS 2013 for 32 bit. The output file has the line: <m_accuracy>2.99999999999999990e-002</m_accuracy> In Boost 1.55 it was much more human-readable: 0.03.

This is surely an unintended side effect of changes to guarantee correct "round tripping" of floating point numbers, that is, that the floating point number read in is bit for bit the same as the floating point number written out.

Yes, I read about "round tripping" in some mail archives, thanks for the clarification. In my opinion it would be more sensible to use the format options from the ofstream. That would fit the std much more elegantly … but anyway, I am not familiar with round tripping …

...

This was seen as a good thing. I'm not sure if this creates some unintended problems.

Not really, this just makes it harder to read or edit XMLs now. Is it possible for us to write the “old” XML format? Frank

Paul A. Bristow

29 Jan

10:30 a.m.

...

-----Original Message----- From: Boost-users [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Frank Stähr Sent: 28 January 2015 17:07 To: boost-users@lists.boost.org Subject: Re: [Boost-users] [Serialization] XML: float format is scientific instead of human-readable since Boost 1.57

Am 28.01.2015 um 16:16 schrieb Robert Ramey:

...
Frank Stähr wrote

...
I compiled it with VS 2013 for 32 bit. The output file has the line: <m_accuracy>2.99999999999999990e-002</m_accuracy> In Boost 1.55 it was much more human-readable: 0.03.

This is surely an unintended side effect of changes to guarantee correct "round tripping" of floating point numbers, that is, that the floating point number read in is bit for bit the same as the floating point number written out.

Yes, I read about "round tripping" in some mail archives, thanks for the clarification. In my opinion it would be more sensible to use the format options from the ofstream. That would fit the std much more elegantly … but anyway, I am not familiar with round tripping …

As long as you don't mind the risk (small but not zero) that some values in your archive will be slightly different after de-serialization. I'd worry about that personally. Most would assume that it won't happen? Paul --- Paul A. Bristow Prizet Farmhouse Kendal UK LA8 8AB +44 (0) 1539 561830

Thorsten Ottosen

1:03 p.m.

On 29-01-2015 11:30, Paul A. Bristow wrote:

...

...
-----Original Message----- From: Boost-users [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Frank Stähr

...

...
Yes, I read about "round tripping" in some mail archives, thanks for the clarification. In my opinion it would be more sensible to use the format options from the ofstream. That would fit the std much more elegantly … but anyway, I am not familiar with round tripping …

As long as you don't mind the risk (small but not zero) that some values in your archive will be slightly different after de-serialization.

I'd worry about that personally. Most would assume that it won't happen?

Yes, we really don't want to lose information. It's so hard to track down errors originating from these kinds of silent changes.

-Thorsten

Frank Stähr

30 Jan

9:04 a.m.

Here is a reminder of my most important question: On 28.01.2015 at 18:07, Frank Stähr wrote:

...

Not really, this just makes it harder to read or edit XMLs now. Is it possible for us to write the “old” XML format?

So I guess no?

Clark Cianfarini

2:45 p.m.

On Fri, Jan 30, 2015 at 4:04 AM, Frank Stähr <staehr@nue.tu-berlin.de> wrote:

...

Here is a reminder of my most important question:

On 28.01.2015 at 18:07, Frank Stähr wrote:

...
Not really, this just makes it harder to read or edit XMLs now. Is it possible for us to write the “old” XML format?

So I guess no?

The way our company chose to create human-readable XML is to serialize strings instead of doubles or floats. We used a stringstream with std::setprecision to get a certain number of decimal places. The number has a chance of being slightly different after de-serialization and pulling it out of the stringstream, just like the old serialization method. This is okay for us, as we need human-readable XML more than exact round tripping. Hope this helps. _______________________________________________

...

Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users
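The string-based workaround described above might look roughly like the following sketch. The helper names `to_readable` and `from_readable` are hypothetical, and the use of significant digits in the default notation (rather than a fixed number of decimal places) is an assumption; the original poster's code is not shown in the thread. The resulting std::string would then be serialized in place of the double.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format a double with a limited number of
// significant digits in the stream's default (general) notation.
std::string to_readable(double value, int significant_digits) {
    assert(significant_digits > 0);
    std::ostringstream out;
    out << std::setprecision(significant_digits) << value;
    return out.str();
}

// Hypothetical helper: parse the text back into a double. The result may
// differ slightly from the value that was originally formatted.
double from_readable(const std::string& text) {
    std::istringstream in(text);
    double value = 0.0;
    in >> value;
    return value;
}
```

Here `to_readable(0.03, 6)` yields the text 0.03, which is what ends up in the XML. The cost is the one named above: `from_readable(to_readable(0.1 + 0.2, 6))` gives 0.3, which is not the same double as 0.1 + 0.2.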

Paul A. Bristow

2:56 p.m.

From: Boost-users [mailto:boost-users-bounces@lists.boost.org] On Behalf Of Clark Cianfarini
Sent: 30 January 2015 14:45
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] [Serialization] XML: float format is scientific instead of human-readable since Boost 1.57

On Fri, Jan 30, 2015 at 4:04 AM, Frank Stähr <staehr@nue.tu-berlin.de> wrote:

Here is a reminder of my most important question:

On 28.01.2015 at 18:07, Frank Stähr wrote:

Not really, this just makes it harder to read or edit XMLs now. Is it possible for us to write the “old” XML format?

So I guess no?

The way our company chose to create human-readable XML is to serialize strings instead of doubles or floats. We used a stringstream with std::setprecision to get a certain number of decimal places. The number has a chance of being slightly different after de-serialization and pulling it out of the stringstream, just like the old serialization method. This is okay for us, as we need human-readable XML more than exact round tripping.

Frank Stähr might also consider using 32-bit floats instead of double, if a precision of 6 decimal digits is enough.

This would reduce the maximum number of decimal digits from 17 to 9:

for 32-bit float, max_digits10 = 9 and the guaranteed digits10 is 6
for 64-bit double, max_digits10 = 17 and the guaranteed digits10 is 15

HTH too.

Paul

---
Paul A. Bristow
Prizet Farmhouse
Kendal UK LA8 8AB
+44 (0) 1539 561830
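The digit counts for float and double can be read directly from std::numeric_limits. The sketch below assumes IEEE 754 binary32 and binary64 types, which is the case on common desktop platforms; the function names are made up for illustration.

```cpp
#include <limits>

// Digits that are guaranteed to survive a text -> FPT -> text round trip.
template <typename FPT>
constexpr int guaranteed_digits() {
    return std::numeric_limits<FPT>::digits10;
}

// Digits needed so that FPT -> text -> FPT restores the exact value.
template <typename FPT>
constexpr int round_trip_digits() {
    return std::numeric_limits<FPT>::max_digits10;
}

// These values hold for IEEE 754 single and double precision (assumption).
static_assert(guaranteed_digits<float>() == 6, "float digits10");
static_assert(round_trip_digits<float>() == 9, "float max_digits10");
static_assert(guaranteed_digits<double>() == 15, "double digits10");
static_assert(round_trip_digits<double>() == 17, "double max_digits10");
```

Switching a member from double to float therefore trades about 9 guaranteed digits for a shorter textual form, so it only helps when roughly 6 decimal digits are enough for the data.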

Robert Ramey

4:27 p.m.

Frank Stähr wrote

...

Here is a reminder of my most important question:

On 28.01.2015 at 18:07, Frank Stähr wrote:

...
Not really, this just makes it harder to read or edit XMLs now. Is it possible for us to write the “old” XML format?

So I guess no?

Not necessarily. The archive classes have provision for options to be set upon opening. One way would be to add a new option - no_roundtripping or whatever. Of course that means tweaking the implementation of the library. These days I only do this in response to a cogently formulated request entered into the trac system. The problem is that these things always turn out to have unintended consequences. Since we test serialization pretty thoroughly (though not as thoroughly as I would like), this always ends up as more work than anticipated, so I tend to drag my feet in these matters.

Speaking from memory, I believe that all the text-based archives use set_precision on the underlying stream (and reset it back to what it was upon leaving). So maybe the best thing would be to create an option - no_stream_modification - which would suppress this behavior (speaking from memory, there might even be such an option specified already!).

So if it's really important to you and you want to do a little work, feel free to delve into this aspect of the library and propose an enhancement/modification. If you're really confident you can fork the serialization library on github and test out your updates (don't forget to update the documentation!!!!). Then you can issue a pull request and start bugging me to include it. I've generally (though not always) been willing to merge these into the library.

So I'm willing to consider accommodating this request. The real question is: is it sufficiently important for you to do the work or only important enough for me to do the work?

Robert Ramey
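The "set the precision, then reset it on leaving" pattern mentioned above can be sketched as a small RAII guard. This is an illustration only: `precision_guard` and `write_full_precision` are hypothetical names, and the code is not taken from the Boost.Serialization sources.

```cpp
#include <ios>
#include <limits>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical RAII guard: sets a temporary precision on a stream and
// restores the previous precision when the guard goes out of scope.
class precision_guard {
public:
    precision_guard(std::ios_base& stream, std::streamsize temporary)
        : stream_(stream), saved_(stream.precision(temporary)) {}
    ~precision_guard() { stream_.precision(saved_); }

    precision_guard(const precision_guard&) = delete;
    precision_guard& operator=(const precision_guard&) = delete;

private:
    std::ios_base& stream_;
    std::streamsize saved_;  // precision that was in effect before the guard
};

// Hypothetical writer: emits a double with max_digits10 significant digits
// and leaves the caller's stream precision untouched afterwards.
void write_full_precision(std::ostream& os, double value) {
    precision_guard guard(os, std::numeric_limits<double>::max_digits10);
    os << value;
}
```

An option such as the one proposed above would presumably skip the guard and let the caller's own stream settings decide the output format, at the cost of exact round-tripping.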

Frank Stähr

9 Feb

9:33 a.m.

On 30.01.2015 at 17:27, Robert Ramey wrote:

...

So I'm willing to consider accommodating this request. The real question is: is it sufficiently important for you to do the work or only important enough for me to do the work?

No.
