[endian] Some suggestions


Tatsuyuki Ishi

10 Apr 2016

3:03 a.m.

Mr. Beman,

Here are some suggestions for the endian library.

1. Bring back float, or give a proper reason for abandoning it
You mentioned NaN, but that shouldn't apply since the value is stored in a char buffer. Please provide a proper reason: float is still needed, since this is the only library that wraps the details (platform endianness detection plus compiler builtin optimizations). Floating-point values are indispensable in data formats. I have to do a reinterpret_cast now as a workaround.

2. Remove n_bits (endian buffers): completely useless
This just looks like an alternative to sizeof, and it is always divided by 8 when used. What is the point of it?

3. Minor API suggestions
Move operator type to endian_buffer: it's not related to arithmetic; it is only there because the buffer can act like an integer.
Non-const data(): I'm tired of doing reinterpret_cast. Look at C++11 vector for reference.
Move stream operators to endian_arithmetic: I first thought they were for reading binary. They are basically unused.


Beman Dawes

10 Apr

1:18 p.m.

On Sat, Apr 9, 2016 at 11:03 PM, Tatsuyuki Ishi wrote:

...

Mr. Beman, Here are some suggestions for the endian library.

1. Bring back float: make a proper reason for abandon You told about NaN: but it shouldn't apply since you store it in char buffer.

The problem isn't when storing into the char buffer, but later on a different machine or compiler when the char buffer has to be converted to a float type. In testing, I even ran into a case where, on the same machine and compiler, the float data failed to round-trip correctly if the program that created the file was compiled with a different set of optimization options than the program that read the file.

...

Provide a proper reason: float is still needed since this is the only library that wraps details (platform endianness detection+compiler builtin optimizations). Float points are indispensable in data formats. I have to do a reinterpret_cast now as a workaround.

If you can refer me to a precise specification for an external (i.e. on the disk or on the wire) float format bit layout that will work (i.e. round-trip correctly) with actual compilers on a wide range of platforms, then please do so. It would ideally be based on some widely accepted existing standard. And it has to be practical to implement. I'm also wondering why there doesn't seem to be existing practice for floats, whereas there is much existing practice for integers stretching back to the 1970's or earlier.

...

2. Remove n_bits(endian buffers): completely useless This just look like an alternative to sizeof, and always /8'd when used. What's the meaning of that?

Please give a specific code example from the docs of what you think should be changed. Or submit a pull request.

...

3. Minor API suggestions Move operator type to endian_buffer: it's not related to arithmetic but just used because it can act like an integer.

Again, what code is it you think should be changed? Operator value_type()? Please submit a pull request.

...

Non-const data(): I'm tired to do reinterpret_cast. Look at C++11 vector for a reference.

What is the motivation? What are the safety implications? What are the alternatives? The C++11 (and also C++17) addition of non-const data() members to some standard library containers was supported by motivation and safety analysis.

...

Move stream operators to endian_arithmetic: I first thought that was for reading binary. That's basically unused.

I must be missing something. Why wouldn't stream operators apply to endian buffers? A code example might help.

Thanks,

--Beman

Peter Dimov

2:01 p.m.

Beman Dawes wrote:

...

The problem isn't when storing into the char buffer, but later on a different machine or compiler when the char buffer has to be converted to a float type.

In testing, I even ran into a case where, on the same machine and compiler, the float data failed to round-trip correctly if the program that created the file was compiled with a different set of optimization options than the program that read the file.

Perhaps you could go into more detail here, because I'm not sure I understand the nature of the problem.

Intuitively, if you have a little-endian IEEE platform, and you write the float out to disk, and then you read that back (even if using different optimization options), you ought to end up with the same float, because this is no different than memcpy'ing one float to another.

float x = ..., y;

memcpy( &y, &x, sizeof(float)); // y should contain x

unsigned char buffer[ sizeof(float) ];
memcpy( buffer, &x, sizeof(float));
memcpy( &y, buffer, sizeof(float)); // should be the same as above

unsigned char buffer[ sizeof(float) ];
memcpy( buffer, &x, sizeof(float));
// store buffer to disk
// ...
// load buffer from disk
memcpy( &y, buffer, sizeof(float)); // should be the same as above

At which point do things break?

Furthermore, if the "store buffer" and "load buffer" lines byteswap, this makes no difference.

Now, if you don't store or load "buffer" but a "float" directly, and then byteswap that float directly in place, I wouldn't bet on that working. But there's no need to do so.
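The round trip described above can be sketched as a small self-checking function. This is only an illustration under stated assumptions: the name float_round_trips_via_bytes is hypothetical, and the two std::reverse calls merely stand in for a byteswapping "store buffer" step and a byteswapping "load buffer" step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>

// Hypothetical helper, not part of Boost.Endian.
// Copies a float into a byte buffer, reverses the bytes twice (standing in for
// "store byteswapped" and "load byteswapped"), and copies the bytes back into
// a float. Returns true if the object representation is unchanged.
inline bool float_round_trips_via_bytes(float x)
{
    unsigned char buffer[sizeof(float)];
    std::memcpy(buffer, &x, sizeof(float));

    std::reverse(buffer, buffer + sizeof(float)); // writer byteswaps
    // ... the buffer would be stored to disk and loaded back here ...
    std::reverse(buffer, buffer + sizeof(float)); // reader byteswaps back

    float y;
    std::memcpy(&y, buffer, sizeof(float));

    // Compare bytes rather than values so that NaNs are handled as well.
    return std::memcmp(&x, &y, sizeof(float)) == 0;
}

inline void demo_round_trip()
{
    assert(float_round_trips_via_bytes(3.14f));
    assert(float_round_trips_via_bytes(-0.0f));
}
```

The wrong-endian data only ever exists as unsigned char here; at no point is a byteswapped pattern held in a float object.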

Beman Dawes

13 Apr

10:57 a.m.

On Sun, Apr 10, 2016 at 10:01 AM, Peter Dimov wrote:

...

Beman Dawes wrote:

...
The problem isn't when storing into the char buffer, but later on a different machine or compiler when the char buffer has to be converted to a float type.

In testing, I even ran into a case where, on the same machine and compiler, the float data failed to round-trip correctly if the program that created the file was compiled with a different set of optimization options than the program that read the file.

Perhaps you could go into more detail here, because I'm not sure I understand the nature of the problem.

Sorry, I misspoke. I went back and found my original notes. My test program mistakenly depended on std::numeric_limits<...>::signaling_NaN(), and standard library implementations are free to use different values for that. For example, clang/c2 on Windows uses a different value than clang on Ubuntu.
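To see why a test that depends on signaling_NaN() is fragile, one can inspect the bit pattern a given standard library returns. The following is a minimal sketch, assuming a 32-bit float; the helper names float_bits and signaling_nan_bits are hypothetical, and no particular output is claimed because the value is implementation-specific.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: the object representation of a float as a 32-bit integer.
inline std::uint32_t float_bits(float f)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// The bit pattern this standard library chose for a signaling NaN.
// Implementations are free to differ, so a test that hard-codes the
// result of this function is not portable.
inline std::uint32_t signaling_nan_bits()
{
    return float_bits(std::numeric_limits<float>::signaling_NaN());
}

inline void demo_float_bits()
{
    // Only meaningful where float is IEEE 754 binary32.
    if (std::numeric_limits<float>::is_iec559)
        assert(float_bits(1.0f) == 0x3F800000u);
}
```

Printing signaling_nan_bits() from programs built with different toolchains is one way to check whether they agree.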

...

Intuitively, if you have a little-endian IEEE platform, and you write the float out to disk, and then you read that back (even if using different optimization options), you ought to end up with the same float, because this is no different than memcpy'ing one float to another.

float x = ..., y;

memcpy( &y, &x, sizeof(float)); // y should contain x

unsigned char buffer[ sizeof(float) ];
memcpy( buffer, &x, sizeof(float));
memcpy( &y, buffer, sizeof(float)); // should be the same as above

unsigned char buffer[ sizeof(float) ];
memcpy( buffer, &x, sizeof(float));
// store buffer to disk
// ...
// load buffer from disk
memcpy( &y, buffer, sizeof(float)); // should be the same as above

At which point do things break?

They don't, because in that example you are careful to avoid direct use of "float".

...

Furthermore, if the "store buffer" and "load buffer" lines byteswap, this makes no difference.

Agreed.

...

Now, if you don't store or load "buffer" but a "float" directly, and then byteswap that float directly in place, I wouldn't bet on that working. But there's no need to do so.

Agreed, and directly byteswapping a float is avoided in the endian buffer and endian arithmetic classes. But to follow the same pattern as the conversion functions, the interface would look like this:

float endian_reverse(float x) noexcept;

but like you and whoever raised the issue originally, I don't want to bet on that working.

--Beman

Peter Dimov

11:19 a.m.

Beman Dawes wrote:

...

But to follow the same pattern as the conversion functions, the interface would look like this:

float endian_reverse(float x) noexcept;

but like you and whoever raised the issue originally, I don't want to bet on that working.

Passing an arbitrary sequence of bytes by value should in principle be avoided even for int because of trap representations, so adding float support may provide the motivation to fix that as well. The correct type of a wrong-endian-int or a wrong-endian-float is unsigned char[sizeof(int)] or unsigned char[sizeof(float)], not int or float.

Beman Dawes

12:30 p.m.

On Wed, Apr 13, 2016 at 7:19 AM, Peter Dimov wrote:

...

Beman Dawes wrote:

But to follow the same pattern as the conversion functions, the interface would look like this:

float endian_reverse(float x) noexcept;

but like you and whoever raised the issue originally, I don't want to bet on that working.

Passing an arbitrary sequence of bytes by value should in principle be avoided even for int because of trap representations, so adding float support may provide the motivation to fix that as well.

The correct type of a wrong-endian-int or a wrong-endian-float is unsigned char[sizeof(int)] or unsigned char[sizeof(float)], not int or float.

Agreed. The traditional conversion approach of passing and returning an int or whatever is essentially an undiscriminated union:

union here_be_dragons
{
  int maybe_this;
  unsigned char or_maybe_this[4];
};

Maybe the endian conversion functions should stop perpetuating such insanity.

--Beman
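One possible shape for an interface that avoids the undiscriminated union is to type the external representation as a byte array. The sketch below is not Boost.Endian's API: store_big_f32 and load_big_f32 are hypothetical names, and it assumes a 32-bit float whose byte order in memory matches that of std::uint32_t on the same platform.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

// Hypothetical: store a float as 4 big-endian bytes.
// The non-native representation only ever exists as unsigned char[4].
inline void store_big_f32(unsigned char (&out)[4], float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits); // native float -> native integer
    out[0] = static_cast<unsigned char>(bits >> 24);
    out[1] = static_cast<unsigned char>(bits >> 16);
    out[2] = static_cast<unsigned char>(bits >> 8);
    out[3] = static_cast<unsigned char>(bits);
}

// Hypothetical: load a float from 4 big-endian bytes.
inline float load_big_f32(const unsigned char (&in)[4])
{
    const std::uint32_t bits = (std::uint32_t(in[0]) << 24)
                             | (std::uint32_t(in[1]) << 16)
                             | (std::uint32_t(in[2]) << 8)
                             |  std::uint32_t(in[3]);
    float value;
    std::memcpy(&value, &bits, sizeof value); // native integer -> native float
    return value;
}

inline void demo_store_load()
{
    unsigned char wire[4];
    store_big_f32(wire, 2.5f);
    assert(load_big_f32(wire) == 2.5f);
}
```

Because the shifts operate on the integer value rather than on memory, the same code works unchanged on little-endian and big-endian hosts, and a float parameter or return value always holds a native value.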

Bjorn Reese

10 Apr

4:10 p.m.

On 04/10/2016 03:18 PM, Beman Dawes wrote:

...

The problem isn't when storing into the char buffer, but later on a different machine or compiler when the char buffer has to be converted to a float type.

Another problem related to the above is that the NaN payload is implementation-defined, so even if the exact bit-pattern is preserved, programs compiled with different compilers may interpret the payload differently.

Gavin Lambert

11:47 p.m.

On 11/04/2016 04:10, Bjorn Reese wrote:

...

Another problem related to the above is that the NaN payload is implementation-defined, so even if the exact bit-pattern is preserved, programs compiled with different compilers may interpret the payload differently.

While that's true, surely any compiler/library should treat any NaN bit pattern as returning true from isnan() and friends, even if it is not bit-identical to the NaN that the compiler/library would generate itself. Using it in a further calculation might result in a different NaN bit pattern, but that's expected behaviour anyway. So at least in theory it should work unless there are bugs in the compiler/library or in the application (eg. it is not an error if <external NaN> != <internal NaN>; it's an app bug if it makes an equality assumption -- for that matter equality can't even be assumed between internal NaNs).
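The point about isnan() and NaN payloads can be illustrated with a short sketch. It assumes float is IEEE 754 binary32 and default floating-point compiler options (options such as -ffast-math may change NaN handling); the helper f32_from_bits is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "sketch assumes float is IEEE 754 binary32");

// Hypothetical helper: build a float from a raw 32-bit pattern.
inline float f32_from_bits(std::uint32_t bits)
{
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

inline void demo_nan_payloads()
{
    // Two NaN encodings: exponent bits all ones, nonzero mantissa,
    // differing only in the low mantissa bits (the payload).
    const float a = f32_from_bits(0x7FC00001u);
    const float b = f32_from_bits(0x7FC12345u);

    assert(std::isnan(a));
    assert(std::isnan(b));
    assert(a != a); // a NaN never compares equal, not even to itself
    assert(a != b); // so equality between "external" and "internal" NaNs cannot be assumed
}
```

Whether the payload bits themselves survive further arithmetic is a separate question that this sketch does not test.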

Bjorn Reese

12 Apr

8:51 a.m.

On 04/11/2016 01:47 AM, Gavin Lambert wrote:

...

While that's true, surely any compiler/library should treat any NaN bit pattern as returning true from isnan() and friends, even if it is not bit-identical to the NaN that the compiler/library would generate itself.

Correct, arithmetic and relational operations disregard the NaN payload; that is by design. If, however, you use the NaN payload to convey information, such as errors or diagnostics, then the compiler generated payloads can cause interoperability problems.

Peter Dimov

12:28 p.m.

Bjorn Reese wrote:

...

On 04/11/2016 01:47 AM, Gavin Lambert wrote:

...
While that's true, surely any compiler/library should treat any NaN bit pattern as returning true from isnan() and friends, even if it is not bit-identical to the NaN that the compiler/library would generate itself.

Correct, arithmetic and relational operations disregard the NaN payload; that is by design. If, however, you use the NaN payload to convey information, such as errors or diagnostics, then the compiler generated payloads can cause interoperability problems.

That's true but I still don't see why the Endian library has anything to do with it. Its job is to put back the bits in your float the way they were written; if the compiler then replaces the bits with something else, there's nothing the library can do. The library should give you a float that is the same as the hex float literal with the same bits that were written; it could do nothing more.

Thijs van den Berg

12:48 p.m.

...

Bjorn Reese wrote:

...
If, however, you use the NaN payload to convey information, such as errors or diagnostics, then the compiler generated payloads can cause interoperability problems.

...

There are two types of interoperability IMO: 1) a single program reads and writes bytes to memory or disk 2) two different compilations of a program that run on (and have been compiled for) different machines and which want to exchange floats via some form of serialisation.

From what I've understood the scope of the endian library is to support 1), and the wider scope of 2) would be more for a serialisation type of library. I'd expect that 1) would probably not have interoperability issues, whereas 2) potentially can?

Rene Rivera

1:54 p.m.

Sorry to butt in..

On Tue, Apr 12, 2016 at 7:48 AM, Thijs van den Berg wrote:

...

...
Bjorn Reese wrote:

...
If, however, you use the NaN payload to convey information, such as errors or diagnostics, then the compiler generated payloads can cause interoperability problems.

There are two types of interoperability IMO: 1) a single program reads and writes bytes to memory or disk 2) two different compilations of a program that run on (and have been compiled for) different machines and which want to exchange floats via some form of serialisation.

From what I've understood the scope of the endian library is to support 1), and the wider scope of 2) would be more for a serialisation type of library. I'd expect that 1) would probably not have interoperability issues, whereas 2) potentially can?

That logic doesn't follow for me. If I replace "float" with "int" in the above, the conclusion is different, and I conclude that the Endian library is indeed for both #1 and #2. Hence it seems inconsistent to me that the library would adhere to different conclusions for some types (i.e. float) than for other types (i.e. base 2 types).

-- 
-- Rene Rivera
-- Grafik - Don't Assume Anything
-- Robot Dreams - http://robot-dreams.net
-- rrivera/acm.org (msn) - grafikrobot/aim,yahoo,skype,efnet,gmail

Thijs van den Berg

2:12 p.m.

On 12 April 2016 at 15:54, Rene Rivera wrote:

...

Sorry to butt in..

On Tue, Apr 12, 2016 at 7:48 AM, Thijs van den Berg wrote:

...
...
Bjorn Reese wrote:

...
If, however, you use the NaN payload to convey information, such as errors or diagnostics, then the compiler generated payloads can cause interoperability problems.

There are two types of interoperability IMO: 1) a single program reads and writes bytes to memory or disk 2) two different compilations of a program that run on (and have been compiled for) different machines and which want to exchange floats via some form of serialisation.

From what I've understood the scope of the endian library is to support 1), and the wider scope of 2) would be more for a serialisation type of library. I'd expect that 1) would probably not have interoperability issues, whereas 2) potentially can?

That logic doesn't follow for me. If I replace "float" with "int" in the above, the conclusion is different, and I conclude that the Endian library is indeed for both #1 and #2. Hence it seems inconsistent to me that the library would adhere to different conclusions for some types (i.e. float) than for other types (i.e. base 2 types).

The way I interpret the terminology, "endianness" is about the byte order in multibyte words; you might want to manipulate the low byte of an int or float that's stored in a word. A float is a floating point view of the bits inside the word, and that view is not fully specified. For ints you'll also have various possible mappings between the number it represents and the bit sequence, e.g. ones' and two's complement for signed integers. The way I see it is that the endian library is about the byte order inside a word and not about the layout/representation of integers and floats in that word?

Asbjørn

2:55 p.m.

On 12.04.2016 16:12, Thijs van den Berg wrote:

...

The way I see it is that the endian library is about the byte order inside a word and not about the layout/representation of integers and floats in that word?

FWIW, as a library user that's what I'd expect when reading the name "Boost.Endian". Anything more and it's serialization in my view. For floats I'd expect it to treat float as it treats uint32, that is effectively convenience functions for memcpy-ing to/from an uint32 and doing the endian operations on the uint32. Just my 2 øre. Cheers - Asbjørn

Peter Dimov

2:04 p.m.

Thijs van den Berg wrote:

...

There are two types of interoperability IMO: 1) a single program reads and writes bytes to memory or disk 2) two different compilations of a program that run on (and have been compiled for) different machines and which want to exchange floats via some form of serialisation.

From what I've understood the scope of the endian library is to support 1),

That would be a bit odd, because I'm not sure that you need a library for 1) at all. The endianness can't differ within the same program.

Thijs van den Berg

2:27 p.m.

On 12 April 2016 at 16:04, Peter Dimov wrote:

...

Thijs van den Berg wrote:

...
There are two types of interoperability IMO: 1) a single program reads and writes bytes to memory or disk 2) two different compilations of a program that run on (and have been compiled for) different machines and which want to exchange floats via some form of serialisation.

From what I've understood the scope of the endian library is to support 1),

That would be a bit odd, because I'm not sure that you need a library for 1) at all. The endianness can't differ within the same program.

Good point, like SUN, right? IMO the library should be able to handle that *or* make it clear which platforms it supports and which it does not. I wouldn't, however, expect the library to do any representation conversions other than the primitive operations regarding the byte order inside words. If "numerical representation conversion" was included in the scope then one would perhaps name the library differently? (I like libraries to have understandable names.) I can imagine that not all number mappings are bijective, so that would be a complex library that requires policies, inspection etc. To me it feels that if you go that way, then it'll be more about serialisation. Would you later also add byte-counted strings to that library? Or would the scope be limited to numerical hardware representations (floats/ints)?

Bjorn Reese

2:42 p.m.

On 04/12/2016 02:28 PM, Peter Dimov wrote:

...

That's true but I still don't see why the Endian library has anything to do with it. Its job is to put back the bits in your float the way they were written; if the compiler then replaces the bits with something else, there's nothing the library can do. The library should give you a float that is the same as the hex float literal with the same bits that were written; it could do nothing more.

Beman said that he wanted to submit a proposal for C++ standardization to unveil the various intricacies of floating point numbers. That is why I mentioned NaN payloads in the first place. I do not see them as showstoppers for the inclusion of floats in Boost.Endian. That said, binary serialization protocols are going to be some of the major "customers" of Boost.Endian, so I find the line between endian and serialization a bit blurred. While the interoperability problems surrounding NaN payloads are caused by serialization, Boost.Endian with float support is going to make it easier to make such mistakes.

Peter Dimov

2:51 p.m.

Bjorn Reese wrote:

...

While the interoperability problems surrounding NaN payloads are caused by serialization, ...

I'm not sure that this is true. If the sender and the receiver are both IEEE, the serialization and the deserialization cannot be the source of interoperability problems because they just transfer the bits from point A to point B. The same problems with NaN payloads should exist if the programmer uses a hex float literal with the problematic NaN payload. No serialization in this case.

Bjorn Reese

3:49 p.m.

On 04/12/2016 04:51 PM, Peter Dimov wrote:

...

Bjorn Reese wrote:

...
While the interoperability problems surrounding NaN payloads are caused by serialization, ...

I'm not sure that this is true. If the sender and the receiver are both IEEE, the serialization and the deserialization cannot be the source of interoperability problems because they just transfer the bits from point A to point B.

The purpose of serialization is to transform values between C++ types and types defined by a serialization protocol. If the protocol designer opts for IEEE 754 floats, then the protocol should specify all aspects that IEEE 754 does not; this includes endianness and the semantics of NaN payloads.

Thijs (M.A.) van den Berg

4:07 p.m.

...

On 12 Apr 2016, at 17:49, Bjorn Reese wrote:

...
On 04/12/2016 04:51 PM, Peter Dimov wrote:

Bjorn Reese wrote:

...
While the interoperability problems surrounding NaN payloads are caused by serialization, ...

I'm not sure that this is true. If the sender and the receiver are both IEEE, the serialization and the deserialization cannot be the source of interoperability problems because they just transfer the bits from point A to point B.

The purpose of serialization is to transform values between C++ types and types defined by a serialization protocol. If the protocol designer opts for IEEE 754 floats, then the protocol should specify all aspects that IEEE 754 does not; this includes endianness and the semantics of NaN payloads.

Yes. When the source is fully specified you will then also need a fully specified target representation (which might not be native), and rules on how to convert source to target in case the mapping is not unique. Conversion rules could use the same concepts as one would use between C++ types. I think this is very useful functionality, but I think it would be better named "number representations" than "endian".


19 comments

9 participants

participants (9)

Asbjørn
Beman Dawes
Bjorn Reese
Gavin Lambert
Peter Dimov
Rene Rivera
Tatsuyuki Ishi
Thijs (M.A.) van den Berg
Thijs van den Berg