[serialization] How are floating point values handled?
data:image/s3,"s3://crabby-images/39fcf/39fcfc187412ebdb0bd6271af149c9a83d2cb117" alt=""
I can't find anything in the docs about how floating point types are handled by the serialization lib. It seems like text archives print however many digits are specified by the ostream's precision; is that correct? If so, I think this needs to be raised as a big red flag, because it means that floating point values, uniquely among the primitives, do not round trip by default. Except that they do when using a binary archive. This raises a real problem when using text archives with floating point data: if you want to be able to round trip the values, then how many digits of precision should you set the stream to? Particularly if you're saving a complex structure containing different floating point types of differing precisions? I would have thought it would be better for the serialization lib to set the stream precision before outputting a primitive type - to std::numeric_limits<T>::max_digits10 (or std::numeric_limits<T>::digits10+2 if max_digits10 is not available). However, I recognize that this is a difficult issue! Cheers, John.
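The concern can be made concrete with a small example. The C++ standard gives a freshly constructed stream a precision of 6 significant digits. Writing a `double` with default settings and reading it back therefore generally does not reproduce the original value. The sketch below uses a helper, `round_trip_with_default_precision`, which is hypothetical and not part of Boost.Serialization:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: format `value` using whatever settings a freshly
// constructed stream has (precision() == 6, no floatfield flag set),
// then parse the text back into a T.
template <class T>
T round_trip_with_default_precision(T value, std::string* text = nullptr) {
    std::ostringstream out;      // default precision: 6 significant digits
    out << value;
    if (text != nullptr) {
        *text = out.str();       // e.g. "0.333333" for 1.0 / 3.0
    }

    std::istringstream in(out.str());
    T restored{};
    in >> restored;
    return restored;
}
```

For `1.0 / 3.0`, the text is `0.333333` and the value read back compares unequal to the original.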
data:image/s3,"s3://crabby-images/39fcf/39fcfc187412ebdb0bd6271af149c9a83d2cb117" alt=""
This actually raises a problem when using text archives with floating point data - if you want to be able to round trip the values, then how many digits precision should you set the stream to? Particularly if you're saving a complex structure containing different floating point types of differing precisions?
Let me rephrase this somewhat: if I'm the author of a complex data structure that contains floating point data and I make it serializable, I have no way to ensure serialization proceeds correctly: instead that burden falls on the person creating the archive. For sure in small projects these will be the same person, but clearly not when "library writing". And yes, I realize that many compilers don't round trip floating point values even when you print "enough" digits, but at least they get close. The current default situation is to print just 5 digits, and that certainly doesn't get close! But, rant over for now ;-) Cheers, John.
data:image/s3,"s3://crabby-images/3e82c/3e82ccc202ec258b0b6ee3d319246dddb1f0ae3c" alt=""
John Maddock wrote:
The current default situation is to print just 5 digits, and that certainly doesn't get close!
Hmmm - that surprises me. I look at the code in the file basic_text_oprimitive.hpp and I find:

    void save(const float t)
    {
        // must be a user mistake - can't serialize un-initialized data
        if(os.fail())
            boost::serialization::throw_exception(
                archive_exception(archive_exception::output_stream_error)
            );
        os << std::setprecision(std::numeric_limits<float>::digits10 + 2);
        os << t;
    }

Is this code not correct? Note that I added in 2 extra digits. Or is this code not getting invoked? Let me know. Robert Ramey
data:image/s3,"s3://crabby-images/39fcf/39fcfc187412ebdb0bd6271af149c9a83d2cb117" alt=""
The current default situation is to print just 5 digits, and that certainly doesn't get close!
Is this code not correct? Note that I added in 2 extra digits. Or is this code not getting invoked? Let me know.
Ah... my problem was with a UDT that was marked as a primitive: then it just calls the << operator and, as far as I can tell, doesn't attempt to set the stream precision? The issue would presumably also surface if someone tried to non-intrusively support non-standard native floating point types such as GCC's __float128 or Intel's _Quad data types. You can write a "serialize" function instead, but as you already pointed out, that involves more typing once you split the method into load/save and binary/text variants. John.
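The sketch below shows why a primitive UDT is exposed to this problem. It assumes a hypothetical `my_real` type whose `operator<<` simply honours the stream's precision. On a fresh stream it prints 6 significant digits. After a `double` has been written with an explicit precision, it silently inherits that leftover precision. The helper `write_double_then_udt` is also hypothetical:

```cpp
#include <cassert>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user-defined floating point type that would be marked as a
// primitive for serialization. Like any well-behaved streaming operator,
// it honours whatever precision the stream currently has.
struct my_real {
    double value;
};

std::ostream& operator<<(std::ostream& os, const my_real& r) {
    return os << r.value;
}

// Writes a double with an explicit precision (as a text archive does for
// built-in float/double), then a my_real without touching the precision.
std::string write_double_then_udt(double d, my_real r, int precision) {
    std::ostringstream os;
    os << std::setprecision(precision) << d << ' ' << r;
    return os.str();
}
```

On its own, `my_real{1.0 / 3.0}` prints as `0.333333`. Following a double written at precision 17, it prints as `0.33333333333333331`.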
data:image/s3,"s3://crabby-images/39fcf/39fcfc187412ebdb0bd6271af149c9a83d2cb117" alt=""
Is this code not correct? Note that I added in 2 extra digits. Or is this code not getting invoked? Let me know.
Ah, I think there are still 2 bugs:
* There's no special handling for long double, so it gets the default 5 digits.
* You don't output in scientific format. What that means in practice is that if you write a number with a small exponent and it gets printed in fixed format, then you get std::numeric_limits<float>::digits10 + 2 digits *after the decimal point*. That may be too many digits if the exponent is > 0, but if the exponent is < 0 then you get fewer *significant* digits printed than you might expect, as the number will be 0.001234.... etc.
And of course there's still no way to set the number of digits on UDTs declared as primitives. John.
PS would it be OK to apply the patch from https://svn.boost.org/trac/boost/ticket/8963 ?
data:image/s3,"s3://crabby-images/3e82c/3e82ccc202ec258b0b6ee3d319246dddb1f0ae3c" alt=""
John Maddock wrote:
Ah, I think there are still 2 bugs:
* There's no special handling for long double - so it gets the default 5 digits.
easy to fix. Post a ticket.
* You don't output in scientific format: what that means in practice is that if you write a number with a small exponent and it gets printed in fixed format, then you get std::numeric_limits<float>::digits10 + 2 digits *after the decimal point*. That may be too many digits if the exponent is > 0, but if the exponent is < 0 then you get fewer *significant* digits printed than you might expect, as the number will be 0.001234.... etc
hmmm section 18.3.2.4 of the standard says:
static constexpr int digits10; Number of base 10 digits that can be represented without change. Meaningful for all specializations in which is_bounded != false.
static constexpr int max_digits10; Number of base 10 digits required to ensure that values which differ are always differentiated. Meaningful for all floating point types.
So I don't read this as the number of digits to the right of the decimal point. I take it to be the number of significant digits, which is not the same to me. But maybe it is. In any case, it looks like I should be using max_digits10 instead of digits10+2, so this definitely should be looked into.
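The two constants can be compared directly. On an IEEE 754 platform, `float` has `digits10` = 6 and `max_digits10` = 9, so the `digits10 + 2` heuristic yields 8 digits, one short of what `max_digits10` requires. For `double`, both give 17. The check below assumes IEEE 754 single and double precision. The helper names are illustrative only:

```cpp
#include <cassert>
#include <limits>

// Digits written by the digits10 + 2 heuristic in the quoted save(float).
template <class T>
constexpr int heuristic_digits() {
    return std::numeric_limits<T>::digits10 + 2;
}

// Digits the standard says are needed so that distinct values always
// print differently (C++11 max_digits10).
template <class T>
constexpr int required_digits() {
    return std::numeric_limits<T>::max_digits10;
}
```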
And of course there's still no way to set the number of digits on UDT's declared as primitives.
hmmm - since you have to implement them yourself, you can do it any way you want.
PS would it be OK to apply the patch from https://svn.boost.org/trac/boost/ticket/8963 ?
feel free. Currently my development system is in ... err a state of flux so I can't really patch/test the library. Robert Ramey
data:image/s3,"s3://crabby-images/35eca/35eca09bc29abd18645ce131142ce2081288f054" alt=""
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Robert Ramey Sent: Thursday, August 08, 2013 5:26 PM To: boost@lists.boost.org Subject: Re: [boost] [serialization] How are floating point values handled?
But maybe it is. In any case, it looks like I should be using max_digits10 instead of digits10+2.
Yes - the trouble is that not all platforms provide max_digits10: it took 5 years before the first ones did, and then Microsoft got one of the constants' values wrong, so another couple of years passed before anyone could use it! BOOST_NO_CXX11_NUMERIC_LIMITS tells you if std::numeric_limits<T>::max_digits10 is NOT supported. But the formula William Kahan gives (http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF) always works:

    2 + std::numeric_limits<Target>::digits * 3010/10000;

so you are always safe to use that. HTH Paul --- Paul A. Bristow, Prizet Farmhouse, Kendal LA8 8AB UK +44 1539 561830 07714330204 pbristow@hetp.u-net.com
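Below is a minimal sketch of the fallback Paul describes. Here 3010/10000 approximates log10(2), and `digits` counts radix-2 mantissa digits. The function names are hypothetical. `BOOST_NO_CXX11_NUMERIC_LIMITS` is the Boost.Config macro named above. Without Boost it is simply undefined, so the `max_digits10` branch is taken:

```cpp
#include <cassert>
#include <limits>

// 2 + digits * 3010 / 10000, integer arithmetic only, so it is usable in
// constant expressions even where max_digits10 is unavailable.
template <class T>
constexpr int kahan_digits() {
    return 2 + std::numeric_limits<T>::digits * 3010 / 10000;
}

// Prefer max_digits10 when the library provides it, else the formula.
template <class T>
constexpr int round_trip_digits() {
#ifdef BOOST_NO_CXX11_NUMERIC_LIMITS
    return kahan_digits<T>();
#else
    return std::numeric_limits<T>::max_digits10;
#endif
}
```

For IEEE `float` (24 mantissa bits) the formula gives 2 + 7 = 9. For `double` (53 bits) it gives 2 + 15 = 17. Both values match `max_digits10`.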
data:image/s3,"s3://crabby-images/39fcf/39fcfc187412ebdb0bd6271af149c9a83d2cb117" alt=""
hmmm section 18.3.2.4 of the standard says:
static constexpr int digits10; Number of base 10 digits that can be represented without change. Meaningful for all specializations in which is_bounded != false.
static constexpr int max_digits10; Number of base 10 digits required to ensure that values which differ are always differentiated. Meaningful for all floating point types.
So I don't read this as the number of digits to the right of the decimal point. I take it to be the number of significant digits, which is not the same to me. But maybe it is. In any case, it looks like I should be using max_digits10 instead of digits10+2.
max_digits10 is a C++11 feature, but there's a config macro for it somewhere. However, you're confusing "how many digits do I need to fully represent this type" with "how do you want this type formatted". There are basically 3 formatting options:
1) std::ios_base::fixed is set. Then the precision is interpreted as the number of digits after the decimal point.
2) std::ios_base::scientific is set. The precision is interpreted as the number of significant digits to print.
3) Neither of the above is set (the default for a new stream). The formatter chooses either (1) or (2) based on conditions I don't recall, but basically large exponents go to (2).
If you care about being able to read the value back in, you have to choose (2); the other options are for "pretty printing" for humans to read. HTH, John.
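The three modes can be compared on one value. The rules below follow from the C and C++ standards:

- **Fixed** mode counts digits after the decimal point, so 0.001234 at precision 3 keeps only one significant digit.
- **Scientific** mode counts digits after the point in the mantissa, giving precision + 1 significant digits.
- The **default** mode behaves like printf's `%g`. Precision is the maximum number of significant digits, and the output switches to scientific notation when the decimal exponent is below -4 or at least the precision.

`format_with` is a hypothetical helper for illustration:

```cpp
#include <cassert>
#include <ios>
#include <sstream>
#include <string>

// Format `value` with the given precision under one floatfield setting;
// pass std::ios_base::fmtflags() (no bits) for the default mode.
std::string format_with(double value, int precision,
                        std::ios_base::fmtflags field) {
    std::ostringstream os;
    os.precision(precision);
    os.setf(field, std::ios_base::floatfield);  // clears the field if 0
    os << value;
    return os.str();
}
```

At precision 3, `0.001234` prints as `0.001` (fixed), `1.234e-03` (scientific) and `0.00123` (default). In the default mode, `1234567.0` prints as `1.23e+06`.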
data:image/s3,"s3://crabby-images/3e82c/3e82ccc202ec258b0b6ee3d319246dddb1f0ae3c" alt=""
John Maddock wrote:
Ah... my problem was with a UDT that was marked as a primitive: then it just calls the << operator and doesn't attempt to set the stream precision as far as I can tell?
The issue would presumably also surface if someone tried to non-intrusively support non-standard native floating point types such as GCC's __float128 or Intel's _Quad data types. You can write a "serialize" function instead, but as you already pointed out, that involves more typing once you split the method into load/save and binary/text variants.
then it would seem to me that basic_text_oprimitive.hpp should be enhanced to conditionally support these types. This shouldn't be very hard as long as these platforms already support stream i/o of these types (somehow I doubt they do). But then you've not lost portability of the text_?archive. Soooo how about if you leave your mp types as primitive types and implement portable streaming operators for them? You could then address the round tripping issue to your taste. And besides, don't you need to do this anyway so users can display/input your new types? So wouldn't the serialization come for free here? Robert Ramey
data:image/s3,"s3://crabby-images/39fcf/39fcfc187412ebdb0bd6271af149c9a83d2cb117" alt=""
The issue would presumably also surface if someone tried to non-intrusively support non-standard native floating point types such as GCC's __float128 or Intel's _Quad data types. You can write a "serialize" function instead, but as you already pointed out, that involves more typing once you split the method into load/save and binary/text variants.
then it would seem to me that basic_text_oprimitive.hpp should be enhanced to conditionally support these types. This shouldn't be very hard as long as these platforms already support stream i/o of these types (somehow I doubt they do).
You're right, I don't think they do yet; you have to call special C routines in libquadmath. I just used them as examples of why easy extensibility is a good thing.
But then you've not lost portability of the text_?archive. Soooo how about if you leave your mp types as primitive types and implement portable streaming operators for them? You could then address the round tripping issue to your taste. And besides, don't you need to do this anyway so users can display/input your new types? So wouldn't the serialization come for free here?
Sigh. My MP types already have streaming operators, and as they're supposed to, *they honor the current stream precision*, and in the current serialization code that means they see either a requested precision of 5 digits (the default), or else whatever precision the stream may have been left in after serializing a float/double etc. Same issue with long double which doesn't currently have a text-archive-save-overload to set the precision. I'm suggesting a generic solution in the generic primitive-save code would be better. I'll try and post a patch. John.
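Below is a minimal sketch of the generic approach John suggests. It assumes the decision is driven purely by `std::numeric_limits<T>`, which a multiprecision type can specialise. `save_primitive` is a hypothetical name, not Boost.Serialization's API or code. It writes `max_digits10` significant digits in the general format. It also restores the caller's precision and flags afterwards, so no state leaks into the next write:

```cpp
#include <cassert>
#include <ios>
#include <limits>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical generic primitive save: for any T whose numeric_limits
// specialisation says it is a non-integer number (float, double,
// long double, or a UDT that specialises numeric_limits), print enough
// significant digits to distinguish all values of T.
template <class T>
void save_primitive(std::ostream& os, const T& t) {
    using limits = std::numeric_limits<T>;
    const std::streamsize old_precision = os.precision();
    const std::ios_base::fmtflags old_flags = os.flags();

    if (limits::is_specialized && !limits::is_integer) {
        os.unsetf(std::ios_base::floatfield);  // general format: precision
        os.precision(limits::max_digits10);    // counts significant digits
    }
    os << t;

    os.flags(old_flags);                       // don't leak state into the
    os.precision(old_precision);               // next (possibly UDT) write
}
```

Integers and types without a `numeric_limits` specialisation are written unchanged. Whether the text reads back exactly still depends on the standard library's conversions.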
data:image/s3,"s3://crabby-images/3e82c/3e82ccc202ec258b0b6ee3d319246dddb1f0ae3c" alt=""
John Maddock wrote:
I would have thought it would be better for the serialization lib to set the stream precision before outputting a primitive type - to std::numeric_limits<T>::max_digits10 (or std::numeric_limits<T>::digits10+2 if max_digits10 is not available). However, I recognize that this is a difficult issue!
lol - thanks for recognising that this is a difficult issue. It is reported on a regular basis.
binary_?archive - no issue.
portable_binary_archives - doesn't support floating point numbers.
text archives - this depends on the std::stream to do the conversion to text and back again. It uses functions in this class to attempt to set the precision to a high enough number so that no more information is lost than is necessary. Conversion to text and back again has some inherent problems. Note that these have their root cause in the std::stream implementations and design rather than in the serialization library itself.
a) there is not necessarily a one to one mapping of every ieee 754 number with a binary mantissa to a decimal representation.
My view is that anyone who relies on perfect round tripping of a floating point number is making a design mistake. Leave aside the fact that it cannot be portable between machines with different floating point representations (and precisions). It conflicts with what a floating point number really is. It's an attempt to capture some continuous value to a finite level of precision. It generally represents some physical quantity which can only be measured to a precision less than that which our floating point representation can represent. So I'm very suspicious of any program which requires perfect round tripping - if our program depends on having more precision than that which can actually be measured, what can the program actually mean?
b) compilers/libraries don't handle NaN in a consistent way, so handling these is inherently non-portable.
I think I address this by trapping whenever one tries to serialize a NaN. My reasoning was that it was a pain to implement, and would be unreliable. I also feel (and felt) that anyone actually trying to do this is making a mistake and should think about what he's really doing. (I caught hell for saying this - "anyone who does this doesn't know what he's doing". Maybe it was the way I phrased it - oh well.)
I'm pretty sure that none of this is new to you. I am in awe of your accomplishments in the creation of boost libraries. But I included the (verbose) response because I like to stir any pot presented to me.
FWIW - many years ago I attended a numerical analysis class taught by professor William Kahan. It stuck with me all my life. Only relatively recently did I discover his pivotal role in the creation of the ieee754 standard and the intel 8087 processor. I had proposed him as a keynote speaker at BoostCon - but no one had heard of him. Floating point arithmetic is an incredibly rich topic - much more than meets the eye. Sorry if I got carried away. Robert Ramey
data:image/s3,"s3://crabby-images/39fcf/39fcfc187412ebdb0bd6271af149c9a83d2cb117" alt=""
I would have thought it would be better for the serialization lib to set the stream precision before outputting a primitive type - to std::numeric_limits<T>::max_digits10 (or std::numeric_limits<T>::digits10+2 if max_digits10 is not available). However, I recognize that this is a difficult issue!
lol - thanks for recognising that this is a difficult issue. It is reported on a regular basis.
binary_?archive - no issue.
portable_binary_archives - doesn't support floating point numbers.
text archives - this depends on the std::stream to do the conversion to text and back again. It uses functions in this class to attempt to set the precision to a high enough number so that no more information is lost than is necessary. Conversion to text and back again has some inherent problems. Note that these have their root cause in the std::stream implementations and design rather than in the serialization library itself.
a) there is not necessarily a one to one mapping of every ieee 754 number with a binary mantissa to a decimal representation.
My view is that anyone who relies on perfect round tripping of a floating point number is making a design mistake. Leave aside the fact that it cannot be portable between machines with different floating point representations (and precisions). It conflicts with what a floating point number really is. It's an attempt to capture some continuous value to a finite level of precision. It generally represents some physical quantity which can only be measured to a precision less than that which our floating point representation can represent. So I'm very suspicious of any program which requires perfect round tripping - if our program depends on having more precision than that which can actually be measured, what can the program actually mean?
Point taken; however, it is technically possible to achieve perfect round tripping, which is to say that both decimal-to-binary and binary-to-decimal conversions round to nearest. It is, however, hard to achieve, requires arbitrary precision arithmetic (yes, even for float/double) and isn't what all std libs do: msvc being the main culprit here.
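Round tripping depends on the standard library, so it can be tested empirically. The sketch below writes each value with `max_digits10` significant digits, reads it back, and counts mismatches over a run of consecutive representable doubles. Both `round_trips` and `count_failures` are hypothetical helpers. With a standard library whose conversions round correctly, the failure count should be zero. Per John's remark, other libraries may not manage this:

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <sstream>

// True if `value` survives text output with max_digits10 significant
// digits followed by stream input.
bool round_trips(double value) {
    std::ostringstream out;
    out.precision(std::numeric_limits<double>::max_digits10);
    out << value;
    std::istringstream in(out.str());
    double restored = 0.0;
    in >> restored;
    return !in.fail() && restored == value;
}

// Checks `count` consecutive representable doubles starting at `start`
// (finite, normal values only; NaN never compares equal to itself).
int count_failures(double start, int count) {
    int failures = 0;
    double x = start;
    for (int i = 0; i < count; ++i) {
        if (!round_trips(x)) {
            ++failures;
        }
        x = std::nextafter(x, std::numeric_limits<double>::infinity());
    }
    return failures;
}
```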
b) compilers/libraries don't handle NaN in a consistent way so handling these is inherently non-portable.
I think I address this by trapping whenever one tries to serialize a NaN. My reasoning was that it was a pain to implement, and would be unreliable. I also feel (and felt) that anyone actually trying to do this is making a mistake and should think about what he's really doing. (I caught hell for saying this - "anyone who does this doesn't know what he's doing". Maybe it was the way I phrased it - oh well.)
It sounds reasonable in most cases - however there are situations when a genuine NaN might be required - an example might be where you have a table of statistics, and a NaN is used to indicate "no data". John.
data:image/s3,"s3://crabby-images/35eca/35eca09bc29abd18645ce131142ce2081288f054" alt=""
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of John Maddock Sent: Thursday, August 08, 2013 9:27 AM To: boost@lists.boost.org Subject: Re: [boost] [serialization] How are floating point values handled?
It sounds reasonable in most cases - however there are situations when a genuine NaN might be required - an example might be where you have a table of statistics, and a NaN is used to indicate "no data".
+1 for NaN - and for infinity. In Boost.Math it has proved very useful to properly support these two special values - and they are 'Standard' in numeric_limits. Johan Rade's Facets for Floating-Point Infinities and NaNs show that I/O for these can be done - and is portable for all platforms that support them (all the popular ones). Whereas the IEEE754 layout of 64-bit doubles is pretty standard for X86 and ARM chips, the output of infinity and NaN isn't - but using the above facets it can be fixed. So I think this should be the objective for serialization too. Paul --- Paul A. Bristow, Prizet Farmhouse, Kendal LA8 8AB UK +44 1539 561830 07714330204 pbristow@hetp.u-net.com
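The facets Paul mentions handle this properly through the locale machinery. The hand-rolled sketch below does not use them. It only illustrates the underlying idea without depending on Boost: non-finite values are written as fixed tokens, so the text no longer depends on how a particular library prints them. The hypothetical `write_real`/`read_real` helpers use the tokens `nan`, `inf` and `-inf`, which are an assumption of this sketch. The sign and payload of a NaN are not preserved:

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <sstream>
#include <string>

// Write non-finite values as fixed tokens; finite values use
// max_digits10 significant digits.
std::string write_real(double x) {
    if (std::isnan(x)) return "nan";
    if (std::isinf(x)) return x > 0 ? "inf" : "-inf";
    std::ostringstream os;
    os.precision(std::numeric_limits<double>::max_digits10);
    os << x;
    return os.str();
}

// Read back what write_real produced; returns false on unparsable input.
bool read_real(const std::string& text, double& x) {
    if (text == "nan") { x = std::numeric_limits<double>::quiet_NaN(); return true; }
    if (text == "inf") { x = std::numeric_limits<double>::infinity(); return true; }
    if (text == "-inf") { x = -std::numeric_limits<double>::infinity(); return true; }
    std::istringstream is(text);
    return static_cast<bool>(is >> x);
}
```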
data:image/s3,"s3://crabby-images/3e82c/3e82ccc202ec258b0b6ee3d319246dddb1f0ae3c" alt=""
Paul A. Bristow wrote:
It sounds reasonable in most cases - however there are situations when a genuine NaN might be required - an example might be where you have a table of statistics, and a NaN is used to indicate "no data".
+1 for NaN - and for infinity
In Boost.Math it has proved very useful to properly support these two special values - and they are 'Standard' in numeric_limits.
Johan Rade's Facets for Floating-Point Infinities and NaNs show that I/O for these can be done - and is portable for all platforms that support them (all the popular ones).
Whereas the IEEE754 layout of 64-bit doubles is pretty standard for X86 and ARM chips, the output of infinity and NaN isn't - but using the above facets it can be fixed.
So I think this should be the objective for serialization too.
Paul
Here is what I would like to see happen:
a) Johan Rade's Facets for Floating-Point Infinities and NaNs should get incorporated into boost as a separate library.
b) text serialization would incorporate that library. Usage of this facility would be conditioned on an archive attribute flag assigned when the archive is created. This would leave the current behaviour in place and provide the NaN-friendly behavior as an option.
c) boost floating point support should be created which handles floats 8, 16, 24, 32, 64 and 128 bits long in a uniform manner. This should be handled by i/o streams. To make this job easier, only ieee754 format would be supported (of course both endians would be supported).
d) floating point support should be added to portable binary archive.
e) portable binary archive should become an official part of the boost serialization library.
All the "bits" are "floating" around; nevertheless, putting all this together is a huge project - probably too big even for a master's thesis. But to me it would close a bunch of holes in C++ handling of floating point numbers. Robert Ramey
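Item d) can be illustrated with the usual technique. Once the format is restricted to IEEE 754 binary64, the value's bit pattern is copied into a 64-bit integer and its bytes are emitted in a fixed order, so the archive does not depend on host byte order. This is a sketch under stated assumptions, not the portable binary archive's code. It assumes `double` is IEEE 754 binary64 (checked with `static_assert`) and that `double` and `std::uint64_t` share byte order, which holds on mainstream platforms. `encode_le`/`decode_le` are hypothetical names:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559,
              "sketch assumes IEEE 754 binary64 doubles");
static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit double");

// Encode a double as 8 little-endian bytes, whatever the host byte order.
std::array<unsigned char, 8> encode_le(double value) {
    std::uint64_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined type pun
    std::array<unsigned char, 8> bytes{};
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        bytes[i] = static_cast<unsigned char>((bits >> (8 * i)) & 0xFFu);
    }
    return bytes;
}

// Inverse of encode_le; NaN and infinity bit patterns pass through intact.
double decode_le(const std::array<unsigned char, 8>& bytes) {
    std::uint64_t bits = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        bits |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
    }
    double value = 0.0;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, `1.0` has the bit pattern `0x3FF0000000000000`, so its last two encoded bytes are `0xF0` and `0x3F`.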
data:image/s3,"s3://crabby-images/3e82c/3e82ccc202ec258b0b6ee3d319246dddb1f0ae3c" alt=""
John Maddock wrote:
It sounds reasonable in most cases - however there are situations when a genuine NaN might be required - an example might be where you have a table of statistics, and a NaN is used to indicate "no data".
Ahhh - another trap - semantic overloading. This is a design shortcut - i.e. a mistake - which eventually leads to other pain. It's the same problem as assigning meaning to NULL values in databases. The proper way to address this is through a variant which captures what the data really is. I realize that every rule/design principle has its exceptions and we can't (and shouldn't) be telling users what to do, but I'm loath to spend huge amounts of time addressing what for me are corner cases which are side effects of more fundamental mistakes. And however I try to address them, I can't really get it right for this very reason. Robert Ramey
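Robert's alternative can be sketched with the simplest such variant, C++17's `std::optional<double>`. This type makes "no data" explicit instead of encoding it as a NaN. The `cell` alias, the helpers, and the `-` token for an empty cell are hypothetical conventions for illustration only:

```cpp
#include <cassert>
#include <limits>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>

// "No data" is part of the type rather than a special floating point value.
using cell = std::optional<double>;

// Hypothetical text convention for this sketch: "-" marks an empty cell.
std::string write_cell(const cell& c) {
    if (!c) {
        return "-";
    }
    std::ostringstream os;
    os.precision(std::numeric_limits<double>::max_digits10);
    os << *c;
    return os.str();
}

cell read_cell(const std::string& text) {
    if (text == "-") {
        return std::nullopt;
    }
    std::istringstream is(text);
    double value = 0.0;
    if (!(is >> value)) {
        throw std::invalid_argument("not a number: " + text);
    }
    return value;
}
```

Malformed input throws instead of silently becoming "no data". This keeps the two situations distinct.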
data:image/s3,"s3://crabby-images/a28f9/a28f9e31261b3d04eda36f756c2adca3cbb67106" alt=""
"John Maddock"
I would have thought it would be better for the serialization lib to set the stream precision before outputting a primitive type - to std::numeric_limits<T>::max_digits10 (or std::numeric_limits<T>::digits10+2 if max_digits10 is not available). However, I recognize that this is a difficult issue!
Would dumping the floating point value in hexadecimal format be a solution? I mean what %a does in C99 printf. Then we wouldn't lose any precision. Thanks, PM
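Petr's suggestion works because `%a` prints the binary mantissa digit for digit in hexadecimal. No decimal rounding is involved, and `std::strtod` accepts hexadecimal floating point input as of C99/C++11. The exact spelling of the output varies between C libraries, but its value does not. For iostreams, C++11 also offers the `std::hexfloat` manipulator for output. This sketch parses with `strtod` because stream extraction of hex floats has historically not been supported uniformly. `to_hex_text`/`from_hex_text` are hypothetical names:

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Write a double in C99 hexadecimal floating point notation ("%a").
std::string to_hex_text(double value) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%a", value);
    return buf;
}

// strtod parses hexadecimal floating point text exactly.
double from_hex_text(const std::string& text) {
    return std::strtod(text.c_str(), nullptr);
}
```

The trade-off is readability: the archive is exact but no longer easy for a human to read.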
Participants (4): John Maddock, Paul A. Bristow, Petr Machata, Robert Ramey