[Review] Boost.Endian mini-review
The mini-review of Boost.Endian starts today and runs through Sunday, February 1.

The library repository is available on GitHub at https://github.com/boostorg/endian

The docs for the library can be viewed online at https://boostorg.github.io/endian/

The objective of the mini-review is to verify that the issues raised during the formal review have been addressed. Those issues are documented at https://boostorg.github.io/endian/mini_review_topics.html which also describes the resolution of each issue.

Comments and suggestions unrelated to the mini-review issues are also welcome, but the key question the review manager needs your opinion on is this:

Is the library ready to be added to Boost releases?

-- Joel Falcou, Endian Review Manager
Joel FALCOU wrote:
Is the library ready to be added to Boost releases?
Let me preface everything else I'll say by that: I think that the answer to the above question is "yes", as every question raised during the review seems to have been addressed.

That said, I want to point out that the library in its current state does not support one approach to dealing with endianness, which I will outline below.

Let's take the following hypothetical file format as an example, loosely lifted from the docs:

[00] code (32 bit big endian)
[04] length (32 bit big endian)
[08] version (16 bit little endian)
[10] shape type (32 bit little endian)
[14] ...

All three approaches that the library supports involve declaring a struct with the above members and expecting that this struct can be read/written directly to file, which means that its layout and size must correspond to the above description.

What I tend to do, however, is rather different. I do declare a corresponding struct:

struct header
{
    int code;
    unsigned length;
    int version;
    int shape_type;
};

but never read or write it directly, which means that I do not need to make sure that its layout and size are fixed. Instead, in the function read(h), I do this (pseudocode):

read( header& h )
{
    unsigned char data[ 14 ];
    fread data from file;

    read_32_msb( h.code, data + 0 );
    read_32_msb( h.length, data + 4 );
    read_16_lsb( h.version, data + 8 );
    read_32_lsb( h.shape_type, data + 10 );
}

Note that this does not require the machine to have a 32 bit int or a 16 bit int at all. int can be 48 bits wide and even have trap bits. Which is, I admit, only of academic interest today, but still.

The generic implementation of read_32_lsb is:

void read_32_lsb( int & v, unsigned char data[ 4 ] )
{
    unsigned w = data[ 0 ];
    w += (unsigned)data[ 1 ] << 8;
    w += (unsigned)data[ 2 ] << 16;
    w += (unsigned)data[ 3 ] << 24;
    v = w;
}

which works on any endianness.

This approach - as shown - does have a drawback. If you have an array of 802511 32 bit lsb integers in the file, and the native int is 32 bit lsb, one read of 802511*4 bytes is vastly superior in performance to a loop that would read 4 bytes and call read_32_lsb on the byte[4]. Which is why the above is generally combined with

void read_32_lsb_n( int * first, int * last, FILE * fp );

but this ties us to using FILE* and, given the rest of the library, is not strictly needed, because the library offers us enough options to handle this case efficiently.
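[Editorial note: for completeness, a sketch of how read_32_lsb_n can be implemented, assuming the read_32_lsb helper above; the detection macro is a placeholder and error handling is omitted.]

#include <cstdio>

void read_32_lsb_n( int * first, int * last, FILE * fp )
{
#if defined( NATIVE_INT_IS_32BIT_LSB )   // placeholder for a real detection check
    // fast path: the native layout matches the file layout, so one bulk read suffices
    fread( first, 4, last - first, fp );
#else
    // portable path: decode each value byte by byte
    for( ; first != last; ++first )
    {
        unsigned char data[ 4 ];
        fread( data, 1, 4, fp );
        read_32_lsb( *first, data );
    }
#endif
}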
Is the library ready to be added to Boost releases?
YES!
On 23 January 2015 at 11:47, Peter Dimov
read( header& h ) { unsigned char data[ 14 ]; fread data from file;
read_32_msb( h.code, data + 0 ); read_32_msb( h.length, data + 4 ); read_16_lsb( h.version, data + 8 ); read_32_lsb( h.shape_type, data + 10 ); }
Basically you serialize it by hand. Another more declarative (de)serialization technique is the one that Tom Rodgers talked about at CppCon "Implementing Wire Protocols with Boost Fusion" http://youtu.be/wbZdZKpUVeg. Other than specifically calling out Boost.Serialization, this already seems to be covered in the documentation: *Why not just use Boost.Serialization?* Serialization involves a conversion for every object involved in I/O. Endian integers require no conversion or copying. They are already in the desired format for binary I/O. Thus they can be read or written in bulk. -- Nevin ":-)" Liber mailto:nevin@eviloverlord.com (847) 691-1404
On Fri, Jan 23, 2015 at 6:47 PM, Peter Dimov
read( header& h ) { unsigned char data[ 14 ]; fread data from file;
read_32_msb( h.code, data + 0 ); read_32_msb( h.length, data + 4 ); read_16_lsb( h.version, data + 8 ); read_32_lsb( h.shape_type, data + 10 ); }
I've been looking for (standard) functions like that too, I think they'd be very useful. You'd probably want aligned variants too as they might offer better performance. Also got a question about the name endian_reverse: wouldn't reverse_endian make more sense? -- Olaf
On Fri, Jan 23, 2015 at 6:25 PM, Olaf van der Spek
Also got a question about the name endian_reverse: wouldn't reverse_endian make more sense?
I've gone around and around in circles between "reverse_endianness" (most explicit, but too long), "reverse_endian" (which grates on me a bit because every time I see it I waste time thinking it should be spelled "reverse_endianness"), and "endian_reverse". I like endian_reverse because it starts right out by alerting the reader as to the context of the reversal. Since this function is used as a UDT customization point, and so gets defined in a user namespace for UDT types, that seems important to me. Thanks, --Beman
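[Editorial note: a minimal sketch of the customization-point pattern being described, following the library's UDT conversion approach; the namespace and struct below are made up.]

#include <boost/cstdint.hpp>
#include <boost/endian/conversion.hpp>

namespace user
{
    struct udt
    {
        boost::int32_t id;
        boost::int16_t kind;
    };

    // The overload is found by argument-dependent lookup, which is why it is
    // defined in the UDT's own namespace rather than in boost::endian.
    inline udt endian_reverse( udt x )
    {
        x.id   = boost::endian::endian_reverse( x.id );
        x.kind = boost::endian::endian_reverse( x.kind );
        return x;
    }
}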
On Fri, Jan 23, 2015 at 12:47 PM, Peter Dimov
Joel FALCOU wrote:
Is the library ready to be added to Boost releases?
Let me preface everything else I'll say by that: I think that the answer to the above question is "yes", as every question raised during the review seems to have been addressed.
That said, I want to point out that the library in its current state does not support one approach to dealing with endianness, which I will outline below.
Let's take the following hypothetical file format as an example, loosely lifted from the docs:
[00] code (32 bit big endian) [04] length (32 bit big endian) [08] version (16 bit little endian) [10] shape type (32 bit little endian) [14] ...
All three approaches that the library supports involve declaring a struct with the above members and expecting that this struct can be read/written directly to file, which means that its layout and size must correspond to the above description.
What I tend to do, however, is rather different. I do declare a corresponding struct:
struct header { int code; unsigned length; int version; int shape_type; };
but never read or write it directly, which means that I do not need to make sure that its layout and size are fixed.
Yes, decoupling the external I/O format from the internal format is often good design.
Instead, in the function read(h), I do this (pseudocode):
read( header& h ) { unsigned char data[ 14 ]; fread data from file;
read_32_msb( h.code, data + 0 ); read_32_msb( h.length, data + 4 ); read_16_lsb( h.version, data + 8 ); read_32_lsb( h.shape_type, data + 10 ); }
Using the endian library to achieve the same effect:

struct header // same as above
{
    int code;
    unsigned length;
    int version;
    int shape_type;
};

read( header& h )
{
    struct
    {
        big_int32_buf_t    code;
        big_uint32_buf_t   length;
        little_int16_buf_t version;
        big_int32_buf_ut   shape_type;
    } data;

    fread data from file;

    h.code = data.code.value();
    h.length = data.length.value();
    h.version = data.version.value();
    h.shape_type = data.shape_type.value();
}

This seems to me to be better code because the layout of either of the two structs can change without having to change anything else. In your code, if the size of "[00] code (32 bit big endian)" changed to "[00] code (64 bit big endian)", then all four of the read_* function calls would have to be changed.

--Beman
On Sat, Jan 24, 2015 at 3:54 PM, Beman Dawes
struct { big_int32_buf_t code; big_uint32_buf_t length; little_int16_buf_t version; big_int32_buf_ut shape_type; } data;
You're mixing aligned and unaligned types. Shouldn't all be unaligned? See the docs: "Warning: Code that uses aligned types is possibly non-portable because alignment requirements vary between hardware architectures and because alignment may be affected by compiler switches or pragmas. For example, alignment of an 64-bit integer may be to a 32-bit boundary on a 32-bit machine. Furthermore, aligned types are only available on architectures with 16, 32, and 64-bit integer types." -- Olaf
Olaf van der Spek wrote:
On Sat, Jan 24, 2015 at 3:54 PM, Beman Dawes
wrote: struct { big_int32_buf_t code; big_uint32_buf_t length; little_int16_buf_t version; big_int32_buf_ut shape_type; } data;
You're mixing aligned and unaligned types. Shouldn't all be unaligned? See the docs: "Warning: Code that uses aligned types is possibly non-portable because alignment requirements vary between hardware architectures and because alignment may be affected by compiler switches or pragmas. For example, alignment of an 64-bit integer may be to a 32-bit boundary on a 32-bit machine. Furthermore, aligned types are only available on architectures with 16, 32, and 64-bit integer types."
Yes, a good argument against having aligned buffers at all, even though the docs do provide rationale for them. If you want aligned, use the non-buffer types.
On Sat, Jan 24, 2015 at 5:04 PM, Peter Dimov
Olaf van der Spek wrote:
On Sat, Jan 24, 2015 at 3:54 PM, Beman Dawes
wrote: struct { big_int32_buf_t code; big_uint32_buf_t length; little_int16_buf_t version; big_int32_buf_ut shape_type; } data;
You're mixing aligned and unaligned types. Shouldn't all be unaligned? See the docs: "Warning: Code that uses aligned types is possibly non-portable because alignment requirements vary between hardware architectures and because alignment may be affected by compiler switches or pragmas. For example, alignment of an 64-bit integer may be to a 32-bit boundary on a 32-bit machine. Furthermore, aligned types are only available on architectures with 16, 32, and 64-bit integer types."
Yes, a good argument against having aligned buffers at all, even though the docs do provide rationale for them. If you want aligned, use the non-buffer types.
I think the default / shortest name types should be unaligned, with aligned access being explicit. -- Olaf
On Sat, Jan 24, 2015 at 11:13 AM, Olaf van der Spek
On Sat, Jan 24, 2015 at 5:04 PM, Peter Dimov
wrote: Olaf van der Spek wrote:
On Sat, Jan 24, 2015 at 3:54 PM, Beman Dawes
wrote: struct { big_int32_buf_t code; big_uint32_buf_t length; little_int16_buf_t version; big_int32_buf_ut shape_type; } data;
You're mixing aligned and unaligned types. Shouldn't all be unaligned? See the docs: "Warning: Code that uses aligned types is possibly non-portable because alignment requirements vary between hardware architectures and because alignment may be affected by compiler switches or pragmas. For example, alignment of an 64-bit integer may be to a 32-bit boundary on a 32-bit machine. Furthermore, aligned types are only available on architectures with 16, 32, and 64-bit integer types."
Yes, a good argument against having aligned buffers at all, even though the docs do provide rationale for them. If you want aligned, use the non-buffer types.
I think the default / shortest name types should be unaligned, with aligned access being explicit.
I thought that at one time, too, and the names were as you suggest. There were several problems with defaulting to unaligned and thus having longer names for aligned types:

1) Real-world applications use aligned data layouts far more often than unaligned. Users prefer that defaults meet the most common needs.

2) If the default is unaligned, unaligned types will get used inadvertently where aligned types could be used, and that can cause performance problems. Remember that the familiar <cstdint> types (e.g. int16_t, int32_t) are aligned types. It is natural to assume big_int32_t is an aligned type.

Thanks, --Beman
Beman Dawes wrote:
2) If the default is unaligned, unaligned types will get used inadvertently where aligned types could be used, and that can cause performance problems. Remember that the familiar <cstdint> types (e.g. int16_t, int32_t) are aligned types. It is natural to assume big_int32_t is an aligned type.
big_int32_t is not a buffer type though. I don't know whether Olaf includes non-buffer types when he says:
I think the default / shortest name types should be unaligned, with aligned access being explicit.
but I most certainly do not:
Yes, a good argument against having aligned buffers at all, even though the docs do provide rationale for them. If you want aligned, use the non-buffer types.
On Sun, Jan 25, 2015 at 5:53 PM, Beman Dawes
I thought that at one time, too, and the names were as you suggest.
There were several problems with defaulting to unaligned and thus having longer names for aligned types.
1) Real-world applications use aligned data layouts far more often than unaligned. Users prefer that defaults meet the most common needs.
Unaligned types support all cases, aligned types do not, so it's at most a performance issue.
2) If the default is unaligned, unaligned types will get used inadvertently where aligned types could be used, and that can cause performance problems. Remember that the familiar <cstdint> types (e.g. int16_t, int32_t) are aligned types. It is natural to assume big_int32_t is an aligned type.
Maybe.

BTW, what about the requested read_be32 and write_be32 etc. variants? Having a way to read an int somewhere out of a (unsigned) char buffer is really handy. -- Olaf
On Tue, Jan 27, 2015 at 4:26 AM, Olaf van der Spek
BTW, what about the requested read_be32 and write_be32 etc variants? Having a way to read an int somewhere out of a (unsigned) char buffer is really handy.
I've added a "Future directions" docs section, and that is one of the items. https://boostorg.github.io/endian/index.html#Future-directions --Beman
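[Editorial note: for reference, a sketch of what such free functions might look like; these are hypothetical, and at this point they are only an item under "Future directions".]

#include <boost/cstdint.hpp>

inline boost::uint32_t read_be32( const unsigned char * p )
{
    return ( boost::uint32_t( p[0] ) << 24 ) | ( boost::uint32_t( p[1] ) << 16 )
         | ( boost::uint32_t( p[2] ) << 8 )  |   boost::uint32_t( p[3] );
}

inline void write_be32( unsigned char * p, boost::uint32_t v )
{
    p[0] = (unsigned char)( v >> 24 );
    p[1] = (unsigned char)( v >> 16 );
    p[2] = (unsigned char)( v >> 8 );
    p[3] = (unsigned char)( v );
}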
On Sat, Jan 24, 2015 at 10:52 AM, Olaf van der Spek
On Sat, Jan 24, 2015 at 3:54 PM, Beman Dawes
wrote: struct { big_int32_buf_t code; big_uint32_buf_t length; little_int16_buf_t version; big_int32_buf_ut shape_type; } data;
You're mixing aligned and unaligned types. Shouldn't all be unaligned?
In theory, yes. If data_t ever got embedded into a larger structure at an unaligned location, that would matter. Unaligned types would also protect against architectures that required super-alignment or don't support 16/32/64 bit integer types, but I'm not sure such architectures actually exist anymore. In practice, the enhanced efficiency of aligned types is usually more important than worry about someday encountering an odd-ball architecture or use in an unaligned location. But it is an engineering tradeoff, so you can make the choice whichever way seems appropriate to your application. --Beman
Beman Dawes wrote:
In practice, the enhanced efficiency of aligned types is usually more important than worry about someday encountering an odd-ball architecture or use in an unaligned location. But it is an engineering tradeoff, so you can make the choice whichever way seems appropriate to your application.
The more practical problem is not oddball architectures, but the fact that when a field changes in size, everything after that field needs to be rechecked for alignedness. (You correctly identified this as a maintenance problem with the fixed offset approach I outlined, and it also applies here.)
On 24 January 2015 at 12:00, Beman Dawes
In practice, the enhanced efficiency of aligned types is usually more important than worry about someday encountering an odd-ball architecture or use in an unaligned location.
I disagree with that for endian. Most of those uses are in wire and file protocols, and they are either densely packed or based on the alignment of the original architecture that the protocol was developed on. Unless one is transferring data between machines with different architectures, why would one use endian at all? -- Nevin ":-)" Liber mailto:nevin@eviloverlord.com (847) 691-1404
On Sat, Jan 24, 2015 at 9:03 PM, Nevin Liber
On 24 January 2015 at 12:00, Beman Dawes
wrote: In practice, the enhanced efficiency of aligned types is usually more important than worry about someday encountering an odd-ball architecture or use in an unaligned location.
I disagree with that for endian. Most of those uses are in wire and file protocols, and they are either densely packed or based on the alignment of the original architecture that the protocol was developed on. Unless one is transferring data between machines with different architectures, why would one use endian at all?
Because the protocol is specified as using big endian ints (for example) and the code has to run on little endian machines too. -- Olaf
On Sat, Jan 24, 2015 at 3:03 PM, Nevin Liber
On 24 January 2015 at 12:00, Beman Dawes
wrote: In practice, the enhanced efficiency of aligned types is usually more important than worry about someday encountering an odd-ball architecture or use in an unaligned location.
I disagree with that for endian. Most of those uses are in wire and file protocols, and they are either densely packed or based on the alignment of the original architecture that the protocol was developed on.
I agree with you that a lot of uses are for legacy formats, but my experience has been that most of those observe 16 and 32-bit alignment rules that are the same as we need today. 64-bit values may be a different story - I have virtually no experience with legacy formats that use 64-bit data. But regardless, the endian library must support both aligned and unaligned. I'm warming to Jeremy Maitin-Shepard's suggestion in a later post that we use explicit names for both aligned and unaligned types.
Unless one is transferring data between machines with different architectures, why would one use endian at all?
Most of us write programs for little-endian architectures that deal with big endian wire and file formats. Thanks, --Beman
On 27/01/2015 01:24, Beman Dawes wrote:
On Sat, Jan 24, 2015 at 3:03 PM, Nevin Liber
wrote: On 24 January 2015 at 12:00, Beman Dawes
wrote: In practice, the enhanced efficiency of aligned types is usually more important than worry about someday encountering an odd-ball architecture or use in an unaligned location.
I disagree with that for endian. Most of those uses are in wire and file protocols, and they are either densely packed or based on the alignment of the original architecture that the protocol was developed on.
I agree with you that a lot of uses are for legacy formats, but my experience has been that most of those observe 16 and 32-bit alignment rules that are the same as we need today. 64-bit values may be a different story - I have virtually no experience with legacy formats that use 64-bit data.
Storage formats are a mixed bag (aligned is probably most common), but virtually all wire formats that I've seen require unaligned (no-padding) data.
arithmetic.html: "operators are +, -, ~, !, prefix and postfix -- and ++" has too many "and"s. Try, "operators are +, -, ~, !, plus both prefix and postfix -- and ++". The binary arithmetic and relational operator lists really should have "and" before the final operator in each series.

Comments I made on buffer.html, like those on the warning and note after the typedef table, apply to this page.

In "An endian is an integer byte-holder", I presume that "endian" should be "endian_arithmetic". (If not, then s/endian_arithmetic/endian/ throughout the page.) The same applies to the synopsis.

The effects for the default constructor say nothing about the value. If it is uninitialized, a note to that effect would be helpful. ___ Rob (Sent from my portable computation engine)
One thing that is not clear is why both the buffer and arithmetic types exist. The one, of course, derives from the other, so the only benefit would be the omission of what the derived type adds. What does one gain by using the buffer class rather than just always using the arithmetic type? IOW, why not just collapse the hierarchy? ___ Rob (Sent from my portable computation engine)
On Tue, Feb 3, 2015 at 6:25 PM, Rob Stewart
One thing that is not clear is why both the buffer and arithmetic types exist. The one, of course, derives from the other, so the only benefit would be the omission of what the derivate adds. What does one gain by using the buffer class rather than just always using the arithmetic type? IOW, why not just collapse the hierarchy?
I've added these two entries to the overall FAQ:

Why do both the buffer and arithmetic types exist?

Conversions in the buffer types are explicit. Conversions in the arithmetic types are implicit. This fundamental difference is a deliberate design feature that would be lost if the inheritance hierarchy were collapsed. The original design provided only arithmetic types. Buffer types were requested during formal review by those wishing total control over when conversion occurs. They also felt that buffer types would be less likely to be misused by maintenance programmers not familiar with the implications of performing a lot of arithmetic operations on the endian arithmetic types.

What is gained by using the buffer types rather than always just using the arithmetic types?

Assurance that hidden conversions are not performed. This is of overriding importance to users concerned about achieving the ultimate in terms of speed. "Always just using the arithmetic types" is fine for other users. When the ultimate in speed needs to be ensured, the arithmetic types can be used in the same design patterns or idioms that would be used for buffer types, resulting in the same code being generated for either type.

Thanks, --Beman
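[Editorial note: to make the FAQ difference concrete, a small sketch; header and typedef names are as used in the docs, the values are arbitrary.]

#include <boost/endian/buffers.hpp>
#include <boost/endian/arithmetic.hpp>

int main()
{
    boost::endian::big_int32_buf_t b( 123 );  // buffer type
    boost::endian::big_int32_t     a( 123 );  // arithmetic type

    int x = b.value();  // buffer: every conversion is spelled out
    // int y = b;       // does not compile - no implicit conversion
    int z = a;          // arithmetic: converts implicitly
    a += 5;             // arithmetic operators are available
    // b += 5;          // does not compile for the buffer type

    return x + z;
}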
On February 9, 2015 8:40:29 AM EST, Beman Dawes
What is gained by using the buffer types rather than always just using the arithmetic types?
Assurance than hidden conversions are not performed. This is of overriding importance to users concerned about achieving the ultimate in terms of speed.
^^^^^^^^^^^^^^^^^^^^^^ "highest speed" would suffice.
"Always just using the arithmetic types" is fine for other users. When the ultimate in speed needs to be
You inverted the logic there.
ensured, the arithmetic types can be used in the same design patterns or idioms that would be used for buffer types, resulting in the same code being generated for either types.
The latter confuses me a bit. The arithmetic types can, obviously, be used in the same cases as for the buffer types, but the obvious reason to prefer the arithmetic types by default is for the flexibility the implicit conversions and extra operators afford. ___ Rob (Sent from my portable computation engine)
This approach - as shown - does have a drawback. If you have an array of 802511 32 bit lsb integers in the file, and the native int is 32 bit lsb, one read of 802511*4 bytes is vastly superior in performance to a loop that would read 4 bytes and call read_32_lsb on the byte[4]. Which is why the above is generally combined with
void read_32_lsb_n( int * first, int * last, FILE * fp );
It occurs to me that the library could benefit from similar additions to little_to_native et al, which are currently scalar-only and would, therefore, depend on a Sufficiently Smart(tm) compiler to optimize out the empty calls. Batch endianness fixups are very common and very performance sensitive.
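[Editorial note: a sketch of the kind of batch addition being suggested; the range helper below is hypothetical and simply wraps the existing scalar call.]

#include <boost/endian/conversion.hpp>

// A library-provided version could dispatch to a no-op when no reversal is
// needed, or to a vectorized byte swap, instead of hoping the optimizer
// removes the per-element calls.
template< class It >
void little_to_native_inplace_range( It first, It last )
{
    for( ; first != last; ++first )
        boost::endian::little_to_native_inplace( *first );
}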
Hi all,

This is my first time reviewing a Boost library, so I figured a mini-review might be a good place to start. I did not look at the library during the formal review, so I am approaching it as a new user.

I have read through the documentation and parts of the reference, and I think it looks good. The documentation does a good job of explaining the pros and cons of the three usage patterns, and I am happy with the performance measurements. I also looked through the source, and was pleasantly surprised to see the modest dependency list; a heavy dependency list is a deal-breaker for me when deciding which Boost libraries to use.

One thing I was not quite clear on: what features are disabled in C++03 mode? Can this be clarified a little?

I was happy to find the support for unaligned types with byte sizes other than 1, 2, 4 and 8. This, in combination with dynamic_bitset, would have been very useful when I implemented a MIDI file parser a little while ago.

In my view, an endianness library should have been in Boost 10 years ago. I vote for acceptance.

Kind regards,

Philip Bennefall

On 1/23/2015 5:17 PM, Joel FALCOU wrote:
The mini-review of Boost.Endian starts today and runs through Sunday, February 1.
The library repository is available on GitHub at https://github.com/boostorg/endian
The docs for the library can be viewed online at https://boostorg.github.io/endian/
The objective of the mini-review is to verify that the issues raised during the formal review have been addressed. Those issues are documented at https://boostorg.github.io/endian/mini_review_topics.html which also describes the resolution of each issue.
Comments and suggestions unrelated to the mini-review issues are also welcome, but the key question the review manager needs your opinion on is this:
Is the library ready to be added to Boost releases?
-- Joel Falcou, Endian Review Manager
On Fri, Jan 23, 2015 at 4:26 PM, Philip Bennefall
Hi all,
This is my first time reviewing a Boost library, so I figured a mini-review might be a good place to start.
I did not look at the library during the formal review, so I am approaching it as a new user. I have read through the documentation and parts of the reference, and I think it looks good. The documentation does a good job of explaining the pros and cons of the three usage patterns, and I am happy with the performance measurements. I also looked through the source, and was pleasantly surprised to see the modest dependency list. That's a deal-breaker for me when deciding which Boost libraries to use.
One thing I was not quite clear on is, what features are disabled in C++03 mode? Can this be clarified a little?
New section, added. See https://boostorg.github.io/endian/index.html#C++03-support
I was happy to find the support for unaligned types with byte sizes other than 1, 2, 4 and 8. This, in combination with dynamic_bitset, would have been very useful when I implemented a MIDI file parser a little while ago.
In my view, an endianness library should have been in Boost 10 years ago. I vote for acceptance.
Thanks, --Beman
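[Editorial note: on the odd byte sizes Philip mentions, a sketch of a 3-byte field; the 24-bit typedef name is assumed to follow the library's usual naming pattern, and MIDI's Set Tempo meta event really does store a 3-byte big-endian value.]

#include <boost/endian/buffers.hpp>

// Microseconds per quarter note, stored as exactly 3 bytes, big endian.
struct set_tempo_payload
{
    boost::endian::big_uint24_buf_t tempo;
};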
As a long time user of the Boost.Endian library (applying patches to the distribution I maintain), I am happy to see it finally up for uh... mini-review. Hoping to see it get to formal review and included... further hoping it makes it into the next C++ standard!

It is ready in my opinion. I've seen it go through some topical changes over the years, but the only thing I've used from it are the conversion functions, which have been great for implementing proper io (network socket io / serial with various protocols, and stored data from various devices/sensors) when a specification enforces byte order. It fulfills my needs and always did fairly well on performance with gcc/clang.

I think the only thing that I have no interest in using - a kind way of saying it's probably unneeded bloat IMO - is the arithmetic types. I prefer native types and using conversion functions. But maybe they could improve my code - it's something I toss back and forth a little and haven't attempted embracing.

This library should not be looked at as fulfilling an uncommon need - and should even be part of future C++ standards - because the usual macro/functions (e.g. htons) in arpa/inet.h are widely used outside of inet related code (see above examples) and have the types encoded in the name, making them not easily usable/composable in contrast to something template based, which the library offers. Further, there is IEEE 754 support - especially for double floating point types. The library definitely fills that gap and has for years IMO.

Best Regards,

-Jason
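[Editorial note: a small sketch of the composability point; the helper name is invented. One template works for any integer width, where htons/htonl bake the width into the name.]

#include <cstring>
#include <boost/cstdint.hpp>
#include <boost/endian/conversion.hpp>

template< class T >
void put_big_endian( T v, unsigned char * out )
{
    T tmp = boost::endian::native_to_big( v );  // same call for 16/32/64-bit T
    std::memcpy( out, &tmp, sizeof tmp );
}

// usage: put_big_endian( boost::uint16_t( port ), buf );      // htons equivalent
//        put_big_endian( boost::uint64_t( timestamp ), buf ); // no htonll needed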
On Sat, Jan 24, 2015 at 2:25 AM, Jason Newton
As a long time user of the Boost.Endian library (applying patches to the distribution I maintain), I am happy to see it finally up for uh... mini-review. Hoping to see it get to formal review and included... further hoping it makes it into the next C++ standard!
The library went through formal review in 2011, so will go into Boost as soon as the mini-review is over. And, yes, I do plan to propose it to the standards committee.
It is ready in my opinion. I've seen it go through some topical changes over the years but the only thing I've used from it are the conversion functions which have been great for implementing proper io in the range of network socket io / serial with various protocols and stored data from varies devices/sensors) if a specification enforces byte order. It fullfills my needs and always did fairly well on performance with gcc/clang.
Depending on what version you have been using, the performance may now be even better.
I think the only thing that I have no interest in using - a kind way of saying its probably unneeded bloat IMO is the arithmetic types. I prefer native types and using conversion functions. But maybe they could improve my code - its something I toss back and forth a little and haven't attempted embracing.
Because even the endian buffer and arithmetic types that appear most expensive generate only short sequences of code, the optimizing compilers often generate exactly the same instructions regardless of endian approach. On Intel machines, that often distills down to a single bswap instruction since the value is already in a register. That isn't always true, so you do need to measure performance of various approaches in the context of your actual application. But if use of the buffer or arithmetic types would otherwise improve your code, please don't reject them without actually testing performance first.
This library should not be looked at as fulfilling an uncommon need - and should even be part of future C++ standards because the usual macro/functions (e.g. htons) in the in arpa/inet.h are widely used outside of inet related code (see above examples) and have the types encoded in the name making them not easily usable/composable in contrast to something template based which the library offers. Further IEEE 754 support - especially for double floating point types. The library definitely fills that gap and has for years IMO.
Thanks for the kind words! --Beman
On Sun, Jan 25, 2015 at 8:21 AM, Beman Dawes
On Sat, Jan 24, 2015 at 2:25 AM, Jason Newton
wrote: I think the only thing that I have no interest in using - a kind way of saying its probably unneeded bloat IMO is the arithmetic types. I prefer native types and using conversion functions. But maybe they could improve my code - its something I toss back and forth a little and haven't attempted embracing.
Because even the endian buffer and arithmetic types that appear most expensive generate only short sequences of code, the optimizing compilers often generate exactly the same instructions regardless of endian approach. On Intel machines, that often distills down to a single bswap instruction since the value is already in a register.
That isn't always true, so you do need to measure performance of various approaches in the context of your actual application. But if use of the buffer or arithmetic types would otherwise improve your code, please don't reject them without actually testing performance first.
Allow me to clarify: I meant code bloat, not speed. -Jason
On Sun, Jan 25, 2015 at 6:12 PM, Jason Newton
On Sun, Jan 25, 2015 at 8:21 AM, Beman Dawes
wrote: On Sat, Jan 24, 2015 at 2:25 AM, Jason Newton
wrote: I think the only thing that I have no interest in using - a kind way of saying its probably unneeded bloat IMO is the arithmetic types. I prefer native types and using conversion functions. But maybe they could improve my code - its something I toss back and forth a little and haven't attempted embracing.
Because even the endian buffer and arithmetic types that appear most expensive generate only short sequences of code, the optimizing compilers often generate exactly the same instructions regardless of endian approach. On Intel machines, that often distills down to a single bswap instruction since the value is already in a register.
That isn't always true, so you do need to measure performance of various approaches in the context of your actual application. But if use of the buffer or arithmetic types would otherwise improve your code, please don't reject them without actually testing performance first.
Allow me to clarify: I meant code bloat, not speed.
Sorry - your original post was clear, but I misinterpreted. I have not systematically measured code bloat. Maybe I can do that at least for some common use cases. Thanks, --Beman
I have made use of this library over the past 6 months. It has served my needs very well, and I think it is definitely ready to be added to Boost. I have a few comments, though:

conditional_reverse and conditional_reverse_inplace should default the second order value to native (for both the compile-time order and runtime order versions). Conversions are almost always to/from native endianness, and it is annoying to have to specify that explicitly.

The supplied endian_arithmetic and endian_buffer templates are also a bit inconvenient to use due to the need to specify both the integer type and the number of bits. It would be more convenient to be able to just specify the number of bits and have the integer type be determined automatically. I suppose there is the issue of needing to choose between an exact type and a possibly faster type.

For both the buffer and arithmetic types, if the aligned types are to be kept, the user needs to make an explicit choice about alignment. Making either aligned or unaligned the effective default, by giving it the big_uint32_t name rather than the more explicit big_uint32_ut/big_uint32_at types, makes it easier for the user to overlook alignment issues. Therefore I suggest that the _t names be removed in favor of the more explicit _at and _ut names. Potentially, the alignment amount could also be specified as a template argument rather than be platform-dependent.

The correct usage of aligned generally seems rather tricky. Since alignment is platform-specific, it seems there is a high risk of inadvertently introducing padding where it doesn't belong. It also seems there is a high risk of the alignment constraints being violated when some byte buffer is cast to the struct containing endian buffer/arithmetic types. It seems about the only guaranteed safe use of the aligned types is if a single endian type is declared as a local variable.

There is also the fact that on x86/x86-64, there are no separate instructions for aligned vs. unaligned data; the code just runs a bit faster if the data is aligned. I think the Endian library partially takes advantage of this (e.g. via the patch I submitted), though there may be some places in which it does not. The benefit of using aligned types is therefore particularly small on x86. It would be nice if there were a way of declaring a "preferred" alignment, that will be used if possible for automatic variables but will never result in padding being added. Unfortunately I don't know of any way to get that behavior.
On Sun, Jan 25, 2015 at 9:19 PM, Jeremy Maitin-Shepard
I have made use of this library over the past 6 months. It has served my needs very well, and I think it is definitely ready to be added to Boost.
I have a few comments, though:
conditional_reverse and conditional_reverse_inplace should default the second order value to native (for both the compile-time order and runtime order versions). Conversions are almost always to/from native endianness, and it is annoying to have to specify that explicitly.
Makes sense. Will give it a try.
The supplied endian_arithmetic and endian_buffer templates are also a bit inconvenient to use due to the need to specify both the integer type and the number of bits. It would be more convenient to be able to just specify the number of bits and have the integer type be determined automatically. I suppose there is the issue of needing to choose between an exact type and a possible faster type.
My expectation is that the class templates will only be used directly in highly generic code where specifying the integer type is not a problem. I also worry about introducing changes to these critical templates at this stage in the process.
For both the buffer and arithmetic types, if the aligned types are to be kept, the user needs to make an explicit choice about alignment. Making either aligned or unaligned the effective default, by giving it the big_uint32_t name, rather than the more explicit big_uint32_ut/big_uint32_at types, makes it easier for the user to overlook alignment issues. Therefore I suggest that the _t names be removed in favor of the more explicit _at and _ut names.
That's an interesting thought. Alignment is a critical choice, so requiring it be explicit might be doing users a favor. What do others think?
Potentially, the alignment amount could also be specified as a template argument rather than be platform-dependent.
I'd rather not introduce additional complexity for something that is a potential rather than demonstrated need.
The correct usage of aligned generally seems rather tricky. Since alignment is platform-specific, it seems there is high risk of inadvertantly introducing padding where it doesn't belong.
Agreed. Docs changed to recommend static_asserting that sizeof the enclosing structure is correct.
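[Editorial note: a sketch of that check, using the unaligned buffer typedefs spelled with the _ut suffix earlier in this thread; the exact typedef names are assumed from that pattern, and 14 is the on-disk size of the example header.]

#include <boost/static_assert.hpp>
#include <boost/endian/buffers.hpp>

struct header_data
{
    boost::endian::big_int32_buf_ut    code;
    boost::endian::big_uint32_buf_ut   length;
    boost::endian::little_int16_buf_ut version;
    boost::endian::little_int32_buf_ut shape_type;
};

// Fails to compile if padding ever creeps into the layout.
BOOST_STATIC_ASSERT( sizeof( header_data ) == 14 );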
It also seems there is a high risk of the alignment constraints being violated when some byte buffer is cast to the struct containing endian buffer/arithmetic types. It seems about the only guaranteed safe use of the aligned types is if a single endian type is declared as a local variable.
In isolation, perhaps. But when coupled with software engineering practices like use of asserts and static asserts, aligned types are safe.
There is also the fact that on x86/x86-64, there are no separate instructions for aligned vs. unaligned data, the code just runs a bit faster if the data is aligned. I think the Endian library partially takes advantage of this (e.g. via the patch I submitted),
Yes, your patches have been applied:-)
though there may be some places in which it does not take advantage of this.
Possibly. Any additional patches would be welcomed.
The benefit of using aligned types is particularly small, therefore, on x86. It would be nice if there were a way of declaring a "preferred" alignment, that will be used if possible for automatic variables, but will never result in padding being added. Unfortunately I don't know of any way to get that behavior.
I'm afraid we are getting into an area of diminishing returns, although it is hard to know for sure without a specific proposal. My feeling is that the library is "good enough", and that future effort is best spent on a proposal for standardizing it:-) Thanks, --Beman
Beman Dawes wrote:
My expectation is that the class templates will only be used directly is in highly generic code where specifying the integer type is not a problem.
Your expectation is incorrect. Some file formats support both little and big endian variants (to make it easier for writers, at readers' expense). It's typical to do
read( signature );
if( signature == 0x12345678 )
{
read_header
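[Editorial note: the message is cut off above; a minimal sketch of the dispatch pattern being described, with hypothetical read/read_header helpers and the endian_buffer class template parameters assumed from the library synopsis.]

#include <boost/cstdint.hpp>
#include <boost/endian/buffers.hpp>

// One header layout, templated on byte order, so a single reader handles
// both variants of the file format. (Field names are illustrative only.)
template< boost::endian::order Order >
struct header_layout
{
    boost::endian::endian_buffer< Order, boost::uint32_t, 32 > length;
    boost::endian::endian_buffer< Order, boost::uint16_t, 16 > version;
};

// read( signature );
// if( signature == 0x12345678 )
//     read_header< boost::endian::order::big >( f, h );
// else  // the bytes arrived reversed: the signature reads back as 0x78563412
//     read_header< boost::endian::order::little >( f, h );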
On Mon, Jan 26, 2015 at 10:11 AM, Peter Dimov
However, this allows me to sneak in another comment. Currently, if I try to put 0x123456 into a 16 bit buffer, it silently cuts it to 0x3456 without missing a beat. This is sometimes, but very rarely, what we want. Typically, what one wants in this case is an exception, because this usually happens when writing a file, and if one tries to write 0x123456 and only 0x3456 comes out, it's likely that the result will be corrupted and unreadable.
Good idea! Perhaps an error checking policy could be added as a template parameter, with two supplied - one a nop, the other throwing an exception on overflow. I'll document that somewhere as a high priority for the future. --Beman
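[Editorial note: until such a policy exists, the check can be done by hand; a sketch of the throwing variant being discussed, with an invented helper name.]

#include <stdexcept>
#include <boost/cstdint.hpp>
#include <boost/endian/buffers.hpp>

// Store a value into a 16-bit big-endian buffer, throwing instead of
// silently truncating the way a plain assignment does.
inline void checked_store_int16( boost::endian::big_int16_buf_t & b, long v )
{
    if( v < -32768 || v > 32767 )
        throw std::overflow_error( "value does not fit in a 16-bit endian buffer" );
    b = static_cast< boost::int16_t >( v );
}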
Beman Dawes wrote:
Perhaps an error checking policy could be added as a template parameter, with two supplied - one a nop, the other throwing an exception on overflow.
Given that in many, if not most, cases, the check is a no-op - when the value type and the bits match - it might instead make sense to make the methods conditionally noexcept when the bits are enough to store everything in range, and always throw on overflow otherwise.
On Mon, Jan 26, 2015 at 10:42 PM, Peter Dimov
Beman Dawes wrote:
Perhaps an error checking policy could be added as a template parameter, with two supplied - one a nop, the other throwing an exception on overflow.
Given that in many, if not most, cases, the check is a no-op - when the value type and the bits match - it might instead make sense to make the methods conditionally noexcept when the bits are enough to store everything in range, and always throw on overflow otherwise.
Isn't overflow more like a logic error than a runtime error? An assert might make more sense. -- Olaf
Olaf van der Spek wrote:
Isn't overflow more like a logic error than a runtime error? An assert might make more sense.
It typically happens in a portion of the code that can already throw because it deals with output to file. And if it were an assert, what would you do to avoid it? The same the library would do, check the value and throw if it doesn't fit. So the library just does for you what you'd have done anyway, as a convenience. As a more practical matter, users don't tend to like asserts when saving, even if this would have been the theoretically sound thing to do. :-) (For the reverse operation, where the bits are more than the value type can represent, assert doesn't seem to make sense either because we don't control the bits, which typically come from an external source.) In all this I'm assuming the buffer types, BTW. I'm not sure if the same logic would apply to the arithmetic types, because they have very different use cases. Buffers are relatively straightforward - the bits are read from file and the value is extracted, or the value is stored and the bits are written - so it's easier to reason about the context in which they are used. Arithmetics should probably just do on overflow whatever the underlying type does.
On Mon, Jan 26, 2015 at 11:14 PM, Peter Dimov
Olaf van der Spek wrote:
Isn't overflow more like a logic error than a runtime error? An assert might make more sense.
It typically happens in a portion of the code that can already throw because it deals with output to file. And if it were an assert, what would you do to avoid it?
On input ensure the destination (type) is large enough to hold all values and on output ensure the value to be written isn't too large.
The same the library would do, check the value and throw if it doesn't fit. So the library just does for you what you'd have done anyway, as a convenience.
As a more practical matter, users don't tend to like asserts when saving, even if this would have been the theoretically sound thing to do. :-)
Users don't like bugs in general.
(For the reverse operation, where the bits are more than the value type can represent, assert doesn't seem to make sense either because we don't control the bits, which typically come from an external source.)
But we do control the destination type.
In all this I'm assuming the buffer types, BTW. I'm not sure if the same logic would apply to the arithmetic types, because they have very different use cases. Buffers are relatively straightforward - the bits are read from file and the value is extracted, or the value is stored and the bits are written - so it's easier to reason about the context in which they are used. Arithmetics should probably just do on overflow whatever the underlying type does.
-- Olaf
Is the library ready to be added to Boost releases?
Yes.

I have used Beman's Endian library from 2011 extensively and have been completely satisfied by it. This newer version provides the original functionality plus adds extensive new useful functionality (and will allow me to replace some of my "homegrown" endian utility code).

However, I do have concern about one aspect of the new functionality (or maybe I didn't read deep enough in the docs) - floating point endian reversal functions which return by value.

Unless something has significantly changed in the last few years, swapping and returning floating point values is likely to silently change bits for certain bit patterns / values. In particular, returning by value will normalize some values (I assume as part of loading or accessing the value into floating point CPU registers). There may be other floating point characteristics that silently change the bits dealing with NaN and infinity values (but I'm far from an expert on floating point computations). I've never seen the normalization happen for in-place swapping of floating point values, only when returning by value.

This means that, for certain values of x, where x is floating point: endian_reverse(endian_reverse(x)) != x. (Note that for integers, endian_reverse(endian_reverse(x)) == x, for all possible values of x.)

Maybe I missed something in the docs where the normalization is turned off, or an exception thrown, or somehow this situation is addressed (and again, it's only for the conversion functions returning by value). Some of the Boost floating point / computation experts might want to chime in.

I've worked in multiple projects that blithely swapped floating point values in various distributed processing environments, rarely paying attention to these kinds of issues (as well as more obvious issues, like whether all systems were IEEE 754 or not), and most of the time they were "lucky", happening to write home-grown code that swapped in place and was careful not to move the swapped floating point values out of buffers before reading or writing. I always mentioned this code was brittle and to pay attention or document the brittleness.

Cliff
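[Editorial note: one way to sidestep the hazard described above, as a sketch rather than part of the library's interface: keep the byte-reversed pattern in an integer object so that a swapped bit pattern never passes through a floating point register.]

#include <cstring>
#include <boost/cstdint.hpp>
#include <boost/endian/conversion.hpp>

// Assumes a 32-bit IEEE 754 float. The reversed pattern only ever lives in
// a uint32_t, so no normalization or NaN quieting can occur.
inline boost::uint32_t float_to_big( float x )
{
    boost::uint32_t bits;
    std::memcpy( &bits, &x, 4 );
    return boost::endian::native_to_big( bits );
}

inline float float_from_big( boost::uint32_t big )
{
    boost::uint32_t bits = boost::endian::big_to_native( big );
    float x;
    std::memcpy( &x, &bits, 4 );
    return x;
}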
On Mon, Jan 26, 2015 at 11:25 PM, Cliff Green
Is the library ready to be added to Boost releases?
Yes.
I have used Beman's Endian library from 2011 extensively and have been completely satisfied by it. This newer version provides the original functionality plus adds extensive new useful functionality (and will allow me to replace some of my "homegrown" endian utility code).
However, I do have concern about one aspect of the new functionality (or maybe I didn't read deep enough in the docs) - floating point endian reversal functions which return by value.
Unless something has significantly changed in the last few years, swapping and returning floating point values is likely to silently change bits for certain bit patterns / values. In particular, returning by value will normalize some values (I assume as part of loading or accessing the value into floating point CPU registers). There may be other floating point characteristics that silently change the bits dealing with NaN and infinity values (but I'm far from an expert on floating point computations).
I've never seen the normalization happen for in-place swapping of floating point values, only when returning by value.
This means that, for certain values of x, where x is floating point: endian_reverse(endian_reverse(x)) != x. (Note that for integers, endian_reverse(endian_reverse(x)) == x, for all possible values of x.)
Maybe I missed something in the docs where the normalization is turned off, or an exception thrown, or somehow this situation is addressed (and again, it's only for the conversion functions returning by value). Some of the Boost floating point / computation experts might want to chime in.
I've worked in multiple projects that blithely swapped floating point values in various distributed processing environments, rarely paying attention to these kinds of issues (as well as more obvious issues, like whether all systems were IEEE 754 or not), and most of the time they were "lucky", happening to write home-grown code that swapped in place and was careful not to move the swapped floating point values out of buffers before reading or writing. I always mentioned this code was brittle and to pay attention or document the brittleness.
Cliff,

Thank you very much for this post! You have identified a serious weakness. The underlying problem is that I have virtually no experience with floating-point issues. So if you or others could help improve endian floating point support, it will be greatly appreciated.

Here are some areas where help is needed:

* Recommendations for changes to the library interface for floating point. For example, eliminating the return by value functions.
* Test cases that probe for errors you have seen in the past or that you worry about.
* Suggestions about warnings or other changes you would like to see made to the docs.
* Should the implementation refuse to compile (i.e. #error) on non-IEEE 754 systems? If so, how to implement?
* Anything else that you can think of to improve FP support.

Thanks, --Beman
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Beman Dawes Sent: 27 January 2015 13:50 To: Boost Developers List Subject: Re: [boost] [Review] Boost.Endian mini-review
On Mon, Jan 26, 2015 at 11:25 PM, Cliff Green
wrote: Is the library ready to be added to Boost releases?
Yes.
I have used Beman's Endian library from 2011 extensively and have been completely satisfied by it. This newer version provides the original functionality plus adds extensive new useful functionality (and will allow me to replace some of my "homegrown" endian utility code).
However, I do have concern about one aspect of the new functionality (or maybe I didn't read deep enough in the docs) - floating point endian reversal functions which return by value.
Unless something has significantly changed in the last few years, swapping and returning floating point values is likely to silently change bits for certain bit patterns / values. In particular, returning by value will normalize some values (I assume as part of loading or accessing the value into floating point CPU registers). There may be other floating point characteristics that silently change the bits dealing with NaN and infinity values (but I'm far from an expert on floating point computations).
I've never seen the normalization happen for in-place swapping of floating point values, only when returning by value.
This means that, for certain values of x, where x is floating point: endian_reverse(endian_reverse(x)) != x. (Note that for integers, endian_reverse(endian_reverse(x)) == x, for all possible values of x.)
Maybe I missed something in the docs where the normalization is turned off, or an exception thrown, or somehow this situation is addressed (and again, it's only for the conversion functions returning by value). Some of the Boost floating point / computation experts might want to chime in.
I've worked in multiple projects that blithely swapped floating point values in various distributed processing environments, rarely paying attention to these kinds of issues (as well as more obvious issues, like whether all systems were IEEE 754 or not), and most of the time they were "lucky", happening to write home-grown code that swapped in place and was careful not to move the swapped floating point values out of buffers before reading or writing. I always mentioned this code was brittle and to pay attention or document the brittleness.
Cliff,
Thank you very much for this post! You have identified a serious weakness.
The underlying problem is that I have virtually no experience with floating-point issues. So if you or others could help improve endian floating point support, it will be greatly appreciated.
Here are some areas where help is needed:
* Recommendations for changes to the library interface for floating point. For example, eliminating the return by value functions. * Test cases that probe for errors you have seen in the past or that you worry about. * Suggestions about warnings or other changes you would like to see made to the docs. * Should the implementation refuse to compile (i.e. #error) on no-IEEE 754 systems? If so, how to implement? * Anything else that you can think of to improve FP support.
The proof of the pudding is in the eating. I'd suggest that *testing interoperability* is the key precaution that you should advise. It should obviously be between all possible combinations of processor hardware types.

The test should include a range of floating-point values from min to max, and include NaN and infinity unless it is certain that they will never occur. (Unless very special precautions are taken, infinity and NaN are likely to rear their ugly heads at some time - probably when you least want to see them!) It must be possible to 'round-trip' all floating-point values.

You can't test all possible values (except for 32-bit float - 64-bit double takes about 50 years at current processor speeds ;-) but you can choose values (floating-point bit patterns) at random for a test lasting some minutes or even hours. (This was the only way that I found to check for failures in Microsoft cout - cin round-tripping (a feature, not a bug) that are relevant to serialization.)

Paul

--- Paul A. Bristow Prizet Farmhouse Kendal UK LA8 8AB +44 (0) 1539 561830
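[Editorial note: a sketch of the random round-trip test Paul describes, applied to the by-value float reversal Cliff is worried about; the iteration count is arbitrary, and the assert is expected to fire on platforms that normalize or quiet the swapped patterns.]

#include <cassert>
#include <cstdlib>
#include <cstring>
#include <boost/cstdint.hpp>
#include <boost/endian/conversion.hpp>

int main()
{
    for( int i = 0; i < 1000000; ++i )
    {
        // random 32-bit pattern, reinterpreted as a float
        boost::uint32_t bits = ( boost::uint32_t( std::rand() ) << 16 ) ^ boost::uint32_t( std::rand() );
        float x;
        std::memcpy( &x, &bits, 4 );

        // swap there and back, passing the value through float objects
        float y = boost::endian::endian_reverse( boost::endian::endian_reverse( x ) );

        boost::uint32_t bits2;
        std::memcpy( &bits2, &y, 4 );
        assert( bits2 == bits );   // the round-trip must preserve every bit pattern
    }
    return 0;
}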
... I'd suggest that *testing interoperability* is the key precaution that you should advise.
It should obviously be between all possible combinations of processor hardware types. The test should include a range of floating-point values from min to max, and include NaN and infinity unless it is certain that they will never occur. (Unless very special precautions are taken, infinity and NaN are likely to rear their ugly heads at some time - probably when you least want to see them!) It must be possible to 'round-trip' all floating-point values. You can't test all possible values (except for 32-bit float - 64-bit double takes about 50 years at current processor speeds ;-) but you can choose values (floating-point bit patterns) at random for a test lasting some minutes or even hours. (This was the only way that I found to check for failures in Microsoft cout - cin round-tripping (a feature, not a bug) that are relevant to serialization.) <<

Thanks, Paul. I was going to start another discussion thread asking for suggestions from some of the experts (or at least people more expert than me) dealing with floating point endian swapping, but I'll wait to see if we get responses on this thread.

I'm still not sure what to suggest to Beman for the Endian library wrt float and double APIs, especially as this library could or should be the basis for a standardization proposal. Integers are completely safe and well-defined for swapping - all bit patterns form valid integers, and swapping bytes never results in "special" hardware processing. This is not the case for floating point. I have definitely seen "in-place" floating point swapping used extensively without problems, and have definitely seen other functionality have problems (my previously mentioned "returning swapped floating point by value causes silent changing of bits" normalization issue). And, of course, there's the additional problem of binary floating point interoperability between IEEE-754 processors and those that are not.

Boost.Endian does not (and should not) define wire protocols. It provides functionality and APIs for translating to and from wire protocols. But with floating point, any kind of standardization will have to define not only the API but some form of wire protocol. Then the Endian functions become something like:

opaque_binary_type native_to_interoperate_fp(float x); // opaque_binary_type stores fp elements - sign, mantissa, exponent, etc.
opaque_binary_type native_to_interoperate_fp(double x);
float fp_interoperate_to_native(opaque_binary_type x);
double fp_interoperate_to_native(opaque_binary_type x);

Unless there is something I'm missing. Here are my initial (and incomplete) answers to Beman's queries to my e-mail:

* Recommendations for changes to the library interface for floating point. For example, eliminating the return by value functions.

Eliminating the floating point "return by value" functions should definitely be done. Whether the "in place" swap should be removed I'd like to hear from others more expert. Specifically, with common hardware architectures, what are the operations that will invoke floating point normalization (and other "bit changing" functionality)? What is safe to perform?

* Test cases that probe for errors you have seen in the past or that you worry about.

Paul gave a pretty good summary in his reply about the general testing strategy. I don't have specific test cases, so maybe others more expert can chime in with specific test logic.

* Suggestions about warnings or other changes you would like to see made to the docs.

Definitely some kind of summary of these warnings and "brittleness" should be included in the docs. Probably no more than a paragraph or two of general "watch out". I'll be happy to help.
If we get responses from people more expert at the hardware instruction level (i.e. what is really going on "under the covers"), summaries of those discussions can be included.

* Should the implementation refuse to compile (i.e. #error) on non-IEEE 754 systems? If so, how to implement?

Not sure - if we're not defining a floating point wire protocol, then it doesn't matter what the underlying bit pattern of a swapped floating value is. It's up to the application to know what's "on the wire" and whether it can be safely unswapped back to native float or double (just like it's up to the application to know the underlying integer wire protocol).

* Anything else that you can think of to improve FP support.

Hopefully this is a good start and others will chime in and you'll have a useful and safe (even if not 100% complete) initial API and implementation. Cliff
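For concreteness, here is a minimal sketch of the kind of conversion such an API implies, assuming the agreed wire format were IEEE-754 binary32 stored least-significant byte first. The function name mirrors the hypothetical native_to_interoperate_fp above and is not part of Boost.Endian; it encodes via frexp/ldexp so it does not read the host's native bit pattern, and it omits rounding, overflow, and subnormal handling:

    #include <cmath>
    #include <cstdint>

    // Hypothetical sketch only - not Boost.Endian API. Encodes a float as
    // IEEE-754 binary32, least-significant byte first, without assuming the
    // host itself uses IEEE-754. Normal-range values only; subnormals and
    // exponent overflow are not handled.
    void native_to_interoperate_fp(float x, unsigned char out[4])
    {
        std::uint32_t bits;
        if (x != x) {                                   // NaN
            bits = 0x7FC00000u;
        } else if (std::isinf(x)) {                     // +/- infinity
            bits = (x < 0 ? 0x80000000u : 0u) | 0x7F800000u;
        } else if (x == 0.0f) {
            bits = std::signbit(x) ? 0x80000000u : 0u;
        } else {
            std::uint32_t sign = x < 0 ? 0x80000000u : 0u;
            int e;
            float m = std::frexp(std::fabs(x), &e);     // |x| == m * 2^e, m in [0.5, 1)
            // binary32: exponent bias 127, implicit leading 1, 23 fraction bits (truncated here).
            std::uint32_t frac =
                static_cast<std::uint32_t>(std::ldexp(m, 24)) & 0x7FFFFFu;
            bits = sign | (static_cast<std::uint32_t>(e + 126) << 23) | frac;
        }
        out[0] = static_cast<unsigned char>(bits & 0xFF);
        out[1] = static_cast<unsigned char>((bits >> 8) & 0xFF);
        out[2] = static_cast<unsigned char>((bits >> 16) & 0xFF);
        out[3] = static_cast<unsigned char>((bits >> 24) & 0xFF);
    }

The fp_interoperate_to_native direction would mirror this, reassembling the value with ldexp instead of decomposing it.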
On January 30, 2015 12:17:21 PM EST, Cliff Green
... I'd suggest that *testing interoperability* is the key precaution that you should advise.
[snip]
I was going to start another discussion thread asking for suggestions from some of the experts (or at least people more expert than me) dealing with floating point endian swapping, but I'll wait to see if we get responses on this thread.
Another thread would be wise. The change in subject will attract attention, and the new thread will surface even if someone has ignored this one. ___ Rob (Sent from my portable computation engine)
On Fri, Jan 30, 2015 at 12:17 PM, Cliff Green
I'm still not sure what to suggest to Beman for the Endian library wrt float and double APIs, especially as this library could or should be the basis for a standardization proposal. Integers are completely safe and well-defined for swapping - all bit patterns form valid integers and swapping bytes never results in "special" hardware processing.
I don't think that is actually guaranteed (it is only guaranteed for unsigned char). It is fine for two's complement, but for endian in the standard, other integer representations need to be taken into account. Tony
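To make that caveat concrete, here is a sketch (hypothetical helper, not part of Boost.Endian) that maps a two's-complement wire value onto the native signed type by arithmetic rather than by reinterpreting bits, so it stays well-defined even on a ones' complement or sign-magnitude host:

    // Hypothetical helper, not Boost.Endian API: decode a 32-bit
    // little-endian two's-complement wire value without assuming the host's
    // signed representation.
    long load_int32_le(const unsigned char* p)
    {
        unsigned long u = static_cast<unsigned long>(p[0])
                        | static_cast<unsigned long>(p[1]) << 8
                        | static_cast<unsigned long>(p[2]) << 16
                        | static_cast<unsigned long>(p[3]) << 24;
        if (u < 0x80000000UL)
            return static_cast<long>(u);                // non-negative: use directly
        // Negative: compute u - 2^32 without any bit-level reinterpretation.
        // Caveat: the single value -2^31 may not be representable on a
        // non-two's-complement host, which is exactly the kind of wrinkle
        // standard wording would have to address.
        return -static_cast<long>(0xFFFFFFFFUL - u) - 1;
    }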
Hi there, sorry to hijack this discussion, but while trying to view the docs I had some generic questions: On 23/01/15 11:17 AM, Joel FALCOU wrote:
The docs for the library can be viewed online at https://boostorg.github.io/endian/
Are there plans to make all of the Boost docs / website available through https://boostorg.github.io ? At present https://boostorg.github.io/endian/ appears to work mostly fine, though the logo is missing, and the top-level link to https://boostorg.github.io/index.html yields a 404 error. Is that supposed to work? Are library authors required to do anything to generate those docs, and is that documented anywhere? (I'm looking at https://svn.boost.org/trac/boost/wiki/ModularBoost but find that to be lagging a bit behind, so it's hard to figure out what actually works and what is only planned.) Thanks, Stefan -- ...ich hab' noch einen Koffer in Berlin...
On Tue, Jan 27, 2015 at 11:46 AM, Stefan Seefeld
Hi there,
sorry to hijack this discussion, but while trying to view the docs I had some generic questions:
On 23/01/15 11:17 AM, Joel FALCOU wrote:
The docs for the library can be viewed online at https://boostorg.github.io/endian/
Are there plans to make all of the boost docs / website available through https://boostorg.github.io ?
Not that I know of. It is convenient to have a library's docs online for mini-reviews, and I used GitHub to do that because it is so convenient.
At present https://boostorg.github.io/endian/ appears to work mostly fine, though the logo is missing, and the top-level link to https://boostorg.github.io/index.html yields a 404 error. Is that supposed to work?
No, since those are both links to the super project, and for purposes of a mini-review they were not necessary.
Are library authors required to do anything to generate those docs, and is that documented anywhere?
No, and no. I sometimes publish master docs and sometimes develop docs, according to current needs.
(I'm looking at https://svn.boost.org/trac/boost/wiki/ModularBoost but find that to be lagging a bit behind, so it's hard to figure out what actually works and what is only planned.)
If we do ever require it, those pages will get updated. Cheers, --Beman
On January 23, 2015 11:17:25 AM EST, Joel FALCOU
The mini-review of Boost.Endian starts today and runs through Sunday, February 1.
The docs for the library can be viewed online at https://boostorg.github.io/endian/
I found a few issues in the docs:

* "an class"
* "may improved efficiency"
* "Remarks: Meet the EndianReversible requirements", for endian_reverse(), should, I assume, refer to "x"
* should there be a note suggesting that the following be constexpr, when possible?

    template <class EndianReversible>
    EndianReversible conditional_reverse(EndianReversible x, order order1, order order2) noexcept;

* The effects documented for endian_reverse_inplace() don't allow for specialization.

The code type is much smaller than the surrounding text. This is particularly odd, besides annoying, in the specifications section, in which the effects, remarks, etc. are huge compared to the function declaration to which they relate.

___ Rob (Sent from my portable computation engine)
On Thu, Jan 29, 2015 at 1:53 PM, Rob Stewart
On January 23, 2015 11:17:25 AM EST, Joel FALCOU
wrote: The mini-review of Boost.Endian starts today and runs through Sunday, February 1.
The docs for the library can be viewed online at https://boostorg.github.io/endian/
I found a few issues on the docs:
* "an class"
Fixed.
* "may improved efficiency"
Fixed.
* " Remarks: Meet the EndianReversible requirements", for endian_reverse() should, I assume, refer to "x"
Fixed.
* should there be a note suggesting that the following be constexpr, when possible?
template <class EndianReversible> EndianReversible conditional_reverse(EndianReversible x, order order1, order order2) noexcept;
I'm leery of that. The user would not be able to depend on a given function being constexpr because it would change according to the endianness of the platform. I could be wrong about that, and I'm not sure if users would get in trouble even if it were correct. The C++ committee is also considering similar scenarios for constexpr, and the point is far from settled, even between people who understand constexpr better than I do.
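As an illustration of what a constexpr-capable fallback could look like (a sketch only, assuming the intrinsic path is bypassed; whether conditional_reverse itself should promise constexpr is exactly the open question above):

    #include <cstdint>

    // Sketch: a shift-based 32-bit byte reverse that can be constexpr because
    // it uses no compiler intrinsics. A conditional_reverse built on this
    // could be evaluated at compile time regardless of platform endianness.
    constexpr std::uint32_t reverse_bytes(std::uint32_t x)
    {
        return (x << 24)
             | ((x & 0x0000FF00u) << 8)
             | ((x & 0x00FF0000u) >> 8)
             | (x >> 24);
    }

    static_assert(reverse_bytes(0x01020304u) == 0x04030201u,
                  "byte reverse is usable in constant expressions");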
* The effects documented for endian_reverse_inplace() don't allow for specialization.
I'll deal with that separately in a day or two.
The code type is much smaller than the surrounding text. This is particularly odd, besides annoying, in the specifications section, in which the effects, remarks, etc. are huge compared to the function declaration to which they relate.
I'll try to remove the font-size from the CSS, and see if that helps. I currently have it set at 85% because that looks good on my development machine under Firefox. I checked Firefox, Chrome, Safari, and IE, using Windows, Linux, OS X, and Android. While a few of those combinations rendered <code>...</code> text a bit on the small side, it was never "much smaller". It will be tomorrow before I push these changes up to GitHub. Thanks, --Beman
On Thu, Jan 29, 2015 at 1:53 PM, Rob Stewart
The code type is much smaller than the surrounding text. This is particularly odd, besides annoying, in the specifications section, in which the effects, remarks, etc. are huge compared to the function declaration to which they relate.
An update to the CSS has just been pushed to https://boostorg.github.io/endian. It makes the presentation of text more uniform across browsers, but the only smartphone I have is Android based, so I couldn't test iOS. On Android the Chrome presentation is excellent, while the Firefox presentation is perfectly usable although not quite as pretty. I also tested various browsers on Windows, Linux, and OS X, and all the results are adequate to excellent. Safari does a particularly pleasing rendering on my MacBook Pro retina display, although Chrome is pretty good too. Firefox is adequate. Thanks, --Beman
On February 11, 2015 10:24:44 AM EST, Beman Dawes
On Thu, Jan 29, 2015 at 1:53 PM, Rob Stewart
wrote: The code type is much smaller than the surrounding text. This is particularly odd, besides annoying, in the specifications section, in which the effects, remarks, etc. are huge compared to the function declaration to which they relate.
An update to the CSS has just been pushed to https://boostorg.github.io/endian. It makes the presentation of text more uniform across browsers, but the only smartphone I have is Android based, so I couldn't test iOS. On Android the Chrome presentation is excellent, while the Firefox presentation is perfectly usable although not quite as pretty.
With Chrome on Android, it looks very nice except that the code often runs beyond the right edge of the blue background. With Firefox on Android, the display has improved, but the code is half the height of the prose, which is still hard to read. Changing font size in the browser has no effect. That's unhelpful for those with poor eyesight or large screens, etc. as they are denied control.
I also tested various browsers on Windows, Linux, and OS X, and all the results are adequate to excellent. Safari does a particularly pleasing rendering on my MacBook Pro retina display, although Chrome is pretty good too. Firefox is adequate.
With Firefox on Windows, the text runs down the middle and extends about half the window width, and the code overruns the blue background, but it is quite readable. The text size does change as I change the browser font size, unlike on Android. If only they all worked alike. ___ Rob (Sent from my portable computation engine)
[I found this languishing in my drafts folder. Sorry it's late.]

arithmetic.html

"operators are +, -, ~, !, prefix and postfix -- and ++" has too many "and"s. Try, "operators are +, -, ~, !, plus both prefix and postfix -- and ++". The binary arithmetic and relational operator lists really should have "and" before the final operator in each series.

Comments I made on buffer.html, like those on the warning and note after the typedef table, apply to this page.

In "An endian is an integer byte-holder", I presume that "endian" should be "endian_arithmetic". (If not, then s/endian_arithmetic/endian/ throughout the page.) The same applies to the synopsis. The two are used variously.

The effects for the default constructor say nothing about the value. If it is uninitialized, a note to that effect would be helpful.

___ Rob (Sent from my portable computation engine)
On 01/23/2015 05:17 PM, Joel FALCOU wrote:
Is the library ready to be added to Boost releases?
Yes. The documentation is well-written, the code looks clean (except for the multitude of #ifdefs), and the API is superior to N3783.

Having said that, I do have a few comments:

The endian_reverse() functions assume that there are only two types of endianness (big and little) and that they are the reverse of each other. In practice, this is a feasible assumption. However, once in a while, an odd endianness does show up, even in this millennium. For example, ARM11 processors use a mixed-endian format [1] for double. Byte reversal will not handle such cases correctly. This could become a problem during C++ standardization.

Why are there no *_int8_t types?

[1] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0338g/I1844.h...
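As a concrete illustration of why byte reversal alone cannot handle that case, here is a sketch assuming the legacy ARM FPA "word-swapped" double layout (each 32-bit half little-endian, but the high half stored first); the helper name is hypothetical and not part of Boost.Endian:

    #include <algorithm>

    // Hypothetical helper, not Boost.Endian API: convert an 8-byte
    // word-swapped ("mixed-endian") double image to plain little-endian
    // binary64 by exchanging the two 32-bit halves. Reversing all eight
    // bytes would scramble the value instead.
    void fpa_to_le64(const unsigned char in[8], unsigned char out[8])
    {
        std::copy(in + 4, in + 8, out);      // low word was stored second
        std::copy(in,     in + 4, out + 4);  // high word was stored first
    }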
On 1 Feb 2015 at 14:33, Bjorn Reese wrote:
The endian_reverse() functions assume that there are only two types of endianness (big and little) and that they are the reverse of each other. In practice, this is a feasible assumption. However, once in a while, an odd endianness does show up, even in this millennium. For example, ARM11 processors use a mixed-endian format [1] for double. Byte reversal will not handle such cases correctly. This could become a problem during C++ standardization.
Another big problem for standardisation is that Endian only supports CHAR_BIT == 8. I recently had a pull request come through for my nedtries library adding CHAR_BIT == 10 support. I guess 10-bit-per-byte CPUs are still common enough and, a bit more surprisingly, they can compile modern libraries. Niall -- ned Productions Limited Consulting http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/
On Sun, Feb 1, 2015 at 10:20 AM, Niall Douglas
On 1 Feb 2015 at 14:33, Bjorn Reese wrote:
The endian_reverse() functions assume that there are only two types of endianness (big and little) and that they are the reverse of each other. In practice, this is a feasible assumption. However, once in a while, an odd endianness does show up, even in this millennium. For example, ARM11 processors use a mixed-endian format [1] for double. Byte reversal will not handle such cases correctly. This could become a problem during C++ standardization.
Another big problem for standardisation is that Endian only supports CHAR_BIT == 8. I recently had a pull request come through for my nedtries library adding CHAR_BIT == 10 support. I guess 10-bit-per-byte CPUs are still common enough and, a bit more surprisingly, they can compile modern libraries.
Actually, the library specification for the endian types does support values of CHAR_BIT other than 8, and likewise built-in integers with sizes other than 8, 16, 32, and 64 bits. Ones' complement integers should also work, and probably sign-magnitude too. The current implementation has only been tested on two's complement, 8-bit-byte machines with 8, 16, 32, and 64-bit built-in integers. That's why it asserts that CHAR_BIT == 8. Various optimizations may have to be disabled before the code would work correctly. The C++ committee's library working group is OK with conversion functions with various limitations, and had included some in Library Fundamentals TS 1. The only reason they got pulled was that on many platforms the names were already in use as macros, and there wasn't time to come up with a new set of names. Thanks, --Beman
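A sketch of why the specification can tolerate CHAR_BIT != 8 even though the current implementation asserts it (the helper name is hypothetical, not part of Boost.Endian; each wire octet is produced by masking, so the logic still holds when a byte is wider than 8 bits):

    // Hypothetical helper, not Boost.Endian API: store a 32-bit value as four
    // 8-bit wire octets, least-significant first. Because each octet comes
    // from masking with 0xFF, the loop remains correct if CHAR_BIT is, say,
    // 10; the extra high bits of each byte simply stay zero.
    void store_u32_le(unsigned long v, unsigned char* out)
    {
        for (int i = 0; i < 4; ++i) {
            out[i] = static_cast<unsigned char>(v & 0xFFu);
            v >>= 8;
        }
    }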
enum class align is overkill. I'd prefer an ordinary enum:

    enum alignment { aligned, unaligned };

Usage, in endian_buffer for example, would be a little more straightforward:

    template <order Order, typename T, std::size_t Nbits, alignment Align = unaligned>
    class endian_buffer;
More doc comments....

index.html

"and the least-significant-byte-first is traditionally called": omit "the" or add it to the beginning of the sentence.
s/release builds/optimized builds/ (the former is colloquial)
s/Useful when intrinsic headers such as byteswap.h are not being found by your compiler/This is useful when a compiler has no intrinsic support or fails to locate the appropriate header/
s/Useful for eliminating missing intrinsics/This is useful for eliminating missing intrinsics/
The answer to the FAQ, "Does endianness have any uses outside of portable binary file or network I/O formats?", is unsatisfying. Several times already you've mentioned the same idea, but nowhere do you explain what you mean. How does the library provide that minor benefit?
In the FAQ, "Why bother with binary I/O?", s/direct access/random access/. In the same FAQ, the sentence starting with "Disadvantages" should be a separate paragraph.
In the FAQ, "Which is better, big-endian or little-endian?", s/x64/x86-64/.
s/Systems where integer endianness differs/Systems on which integer endianness differs/ (systems aren't places)

buffer.html

s/See the Wikipedia/See Wikipedia/ reads better to me
s/containing four byte big-endian/containing four-byte, big-endian/
In the GIS example, I suggest sizeof(h) vs. sizeof(header) as a general means to protect against type changes. Boost should set a good example.
s/do layout memory/do lay out memory/
s/base classes will no longer disqualify/base classes no longer disqualify/
s/from being a POD/from being POD/ (s/POD/POD type/ would also work)
"alignment of an 64-bit integer may be to a 32-bit boundary on a 32-bit machine" would be more helpful if you complete the thought: "and to a 64-bit boundary on a 64-bit machine."
The note, "One-byte big and little buffer types have identical functionality", might be better as, "One-byte big and little buffer types are identical on all platforms."
The endian_buffer default constructor should have a note saying that the value is unspecified. (I'm assuming the value is not zero-initialized.)
s/endianness conversion if required is performed/endianness conversion, if required, is performed/
s/set to one../set to one./
I prefer s/POD's/PODs/
"This is ensures that , and so" needs some work.

___ Rob (Sent from my portable computation engine)
On Sun, Feb 1, 2015 at 7:11 PM, Rob Stewart
enum class align is overkill. I'd prefer an ordinary enum:
enum alignment { aligned, unaligned };
Usage, in endian_buffer for example, would be a little more straightforward:
template <order Order, typename T, std::size_t Nbits, alignment Align = unaligned>
class endian_buffer;
I have a mild preference for the current scoped enum formulation, so am not making the change. I do agree that for libraries that support C++03 compilers, a scoped enum is awkward.
More doc comments....
index.html
9 editorial fixes.
Applied, more or less as suggested.
buffer.html
s/See the Wikipedia/See Wikipedia/ reads better to me
I tried it, but "the Wikipedia" reads better to me.
12 editorial fixes.
Applied, more or less as suggested. I've pushed the changes up to GitHub, and also updated https://boostorg.github.io/endian/ Thanks, --Beman
On 19/02/2015 01:36, Beman Dawes wrote:
On Sun, Feb 1, 2015 at 7:11 PM, Rob Stewart
wrote: buffer.html
s/See the Wikipedia/See Wikipedia/ reads better to me
I tried it, but "the Wikipedia" reads better to me.
For what this particular bikeshed is worth, as "Wikipedia" is a proper noun, usually the forms that sound the most correct are "see the Wikipedia article for more info" or "see Wikipedia for more info". You're not supposed to use articles with proper nouns such as place names, and that includes well-known site names. (Although Vortigaunts get a free pass. They remember the Freeman.) Given that in this case you're hyperlinking to a specific article, perhaps you should just add "article" to the hyperlink, as in "the _Wikipedia article_".
participants (16)

- Beman Dawes
- Bjorn Reese
- Cliff Green
- Gavin Lambert
- Gottlob Frege
- Jason Newton
- Jeremy Maitin-Shepard
- Joel FALCOU
- Nevin Liber
- Niall Douglas
- Olaf van der Spek
- Paul A. Bristow
- Peter Dimov
- Philip Bennefall
- Rob Stewart
- Stefan Seefeld