[config] RFC PR 82
Hi everyone,

I am finally getting back to (trying to) contributing some of my personal projects to Boost. For two of those, Err (https://github.com/psiha/err) and especially Functonoid (a C++11 generalization and rewrite of my previous Boost.Function related work), I need some lower-level codegen and/or optimiser control functionality (i.e. portable macros wrapping toolset-specific attributes and pragmas) that I've added to my personal fork of Boost.Config and which I've now submitted in the subject PR (https://github.com/boostorg/config/pull/82).

I don't expect this PR to be accepted as is/'just like that', so I'm opening this thread where we can discuss which of those changes/macros are welcome, which need more work, and which, for some reason, should not be part of Boost.Config (and which, in turn, I'll have to move to some 'internal implementation headers' in the libraries that need them).

To avoid 'spamming' (and save time;) I'll skip the explanation of the individual macros, as I expect them to be mostly self-explanatory (if not from their names then from the minimal Boost.Config documentation additions that are part of the PR).

ps. I'll be on the (off) road for the next three weeks so I don't know when I'll be able to respond until I get back...

--
"What Huxley teaches is that in the age of advanced technology, spiritual devastation is more likely to come from an enemy with a smiling face than from one whose countenance exudes suspicion and hate." Neil Postman
On 2015-11-17 02:17, Domagoj Saric wrote:
Hi everyone, I am finally getting back to (trying to) contributing some of my personal projects to Boost. For two of those, Err (https://github.com/psiha/err) and especially Functonoid (a C++11 generalization and rewrite of my previous Boost.Function related work) I need some lower level codegen and/or optimiser control functionality (i.e. portable macros wrapping toolset specific attributes and pragmas) that I've added to my personal fork of Boost.Config and which I've now submitted in the subject PR (https://github.com/boostorg/config/pull/82).
I don't expect this PR to be accepted as is/'just like that' so I'm opening this thread where we can discuss which of those changes/macros are welcome, which need more work and which, for some reason, should not be part of Boost.Config (and which, in turn then, I have to move to some 'internal implementation headers' in libraries that will need them).
Personally, I'm in favor of adding these: BOOST_OVERRIDE, BOOST_FINAL. Although their implementation should be similar to other C++11 macros - they should be based on BOOST_NO_CXX11_FINAL and BOOST_NO_CXX11_OVERRIDE.

I would like to have BOOST_ASSUME (implemented without an assert, i.e. equivalent to your BOOST_ASSUME_UNCHECKED) and BOOST_UNREACHABLE (again, without an assert, i.e. equivalent to your BOOST_UNREACHABLE_UNCHECKED). The reason for no asserts is that (a) Boost.Config should not depend on Boost.Assert and (b) I fear that having additional expressions before the intrinsic could inhibit the optimization. You can always add *_CHECKED versions of the macros locally, or just use asserts beside the macros. BOOST_ASSUME_ALIGNED might also be useful (it hints to the compiler that a pointer has at least the specified alignment).

I would have liked BOOST_HAS_CXX_RESTRICT to indicate that the compiler has support for the C99 keyword 'restrict' (or an equivalent) in C++ (the CXX in the macro name emphasizes that the feature is available in C++, not C). The BOOST_RESTRICT macro would be defined to that keyword, or empty if there is no support. I don't see much point in the additional _PTR, _REF and _THIS macros.

I'm somewhat in favor of adding BOOST_NOVTABLE, although I doubt it will have much use in Boost libraries.

BOOST_MAY_ALIAS is probably a good addition. I have seen it reimplemented in multiple Boost libraries now.

The reason I'm in favor of all the above macros is that I had to implement and use most of them in Boost or other projects (some of them you will find in Boost.Log). I would have found a portable alternative useful.

BOOST_THREAD_LOCAL_POD is kind of controversial. I do use compiler-based TLS in my projects, including Boost.Log, so it would be a useful macro. But it's not an optimization - when you use it, the compiler support is required. There has to be a way to test whether the support exists. I'm not sure Boost.Config is the right place for this. (BTW, you could use thread_local from C++11 when none of the lighter weight keywords are available.)

I don't see much use in BOOST_ATTRIBUTES and related macros - you can achieve the same results with feature-specific macros (e.g. by using BOOST_NORETURN instead of BOOST_ATTRIBUTES(BOOST_DOES_NOT_RETURN)).

I don't see the benefit of BOOST_NOTHROW_LITE. Ditto BOOST_HAS_UNION_TYPE_PUNNING_TRICK (doesn't any compiler support this?).

I don't think BOOST_OVERRIDABLE_SYMBOL is a good idea, given that the same effect can be achieved in pure C++. Also, some compilers offer this functionality only as a pragma. Also, the naming is confusing.

Calling convention macros are probably too specialized to functional libraries; I don't think there's much use for these. I would rather not have them in Boost.Config, to avoid spreading their use to other Boost libraries.

Function optimization macros are probably too compiler- and case-specific. Your choice of what is considered fast or small code, or acceptable math optimizations, may not fit others. Also, things like these should have very limited use, as the user has to have the ultimate control over the build options.

If I missed anything then I probably didn't find it useful or didn't understand what it does.
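[For readers following along, here is a minimal sketch of how assume/unreachable macros of the kind discussed above are commonly mapped onto compiler intrinsics. The MY_ names are illustrative, not the PR's actual spellings.]

```cpp
#include <cassert>

// Illustrative only -- not the PR's actual implementation.
#if defined(_MSC_VER)
#  define MY_ASSUME(cond)   __assume(cond)
#  define MY_UNREACHABLE()  __assume(false)
#elif defined(__GNUC__)
#  define MY_ASSUME(cond)   do { if (!(cond)) __builtin_unreachable(); } while (0)
#  define MY_UNREACHABLE()  __builtin_unreachable()
#else
#  define MY_ASSUME(cond)   ((void)0)
#  define MY_UNREACHABLE()  ((void)0)
#endif

// The optimiser is now allowed to drop any b == 0 handling entirely:
int div_nonzero(int a, int b)
{
    MY_ASSUME(b != 0);
    return a / b;
}
```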
ps. I'll be on the (off) road for the next three weeks so I don't know when I'll be able to respond until I get back...
Then you probably chose an inconvenient moment to start this discussion.
For information: https://github.com/boostorg/config/pull/81 is the RESTRICT macro I extracted from Boost.SIMD.
On Tue, 17 Nov 2015 06:24:37 +0530, Andrey Semashev wrote:
Personally, I'm in favor of adding these: BOOST_OVERRIDE, BOOST_FINAL. Although their implementation should be similar to other C++11 macros - they should be based on BOOST_NO_CXX11_FINAL and BOOST_NO_CXX11_OVERRIDE.
I agree, but what if you don't have final but do have sealed (with a less recent MSVC)?
I would like to have BOOST_ASSUME (implemented without an assert, i.e. equivalent to your BOOST_ASSUME_UNCHECKED), BOOST_UNREACHABLE (again, without an assert, i.e. equivalent to your BOOST_UNREACHABLE_UNCHECKED). The reason for no asserts is that (a) Boost.Config should not depend on Boost.Assert and (b) I fear that having additional expressions before the intrinsic could inhibit the optimization. You can always add *_CHECKED versions of the macros locally, or just use asserts beside the macros.
The additional expressions are assert macros which resolve to nothing in release builds (and thus have no effect on optimisations...checked;) The dependency on Boost.Assert is technically only there if you use the 'checked' macros. I agree that it is still 'ugly' (and the user would have to separately/explicitly include boost/assert.hpp to avoid a circular dependency), but so is, to me, the idea of having to manually duplicate/prefix all assumes with asserts (since I like all my assumes verified, and this would add so much extra verbosity)...
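[A sketch of the checked-vs-unchecked split being discussed, using plain assert instead of BOOST_ASSERT to sidestep the Boost.Assert dependency; the MY_ names are hypothetical.]

```cpp
#include <cassert>

#if defined(_MSC_VER)
#  define MY_ASSUME_UNCHECKED(c)  __assume(c)
#elif defined(__GNUC__)
#  define MY_ASSUME_UNCHECKED(c)  do { if (!(c)) __builtin_unreachable(); } while (0)
#else
#  define MY_ASSUME_UNCHECKED(c)  ((void)0)
#endif

// Verified in debug builds, pure optimisation hint in release builds:
#ifdef NDEBUG
#  define MY_ASSUME_CHECKED(c)  MY_ASSUME_UNCHECKED(c)
#else
#  define MY_ASSUME_CHECKED(c)  assert(c)
#endif

int clamp_index(int i, int size)
{
    MY_ASSUME_CHECKED(size > 0);
    return i < size ? i : size - 1;
}
```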
I would have liked BOOST_HAS_CXX_RESTRICT to indicate that the compiler has support for the C99 keyword 'restrict' (or an equivalent) in C++ (the CXX in the macro name emphasizes that the feature is available in C++, not C). The BOOST_RESTRICT macro would be defined to that keyword or empty if there is no support.
Sure, I can add the detection macro, but for which 'feature set' (already for the minimal one - pointers only - or only for the full one - pointers, refs and this)?
I don't see much point in the additional _PTR, _REF and _THIS macros.
These are unfortunately required because of sloppiness on the part of the MSVC devs: initially they added __restrict only for pointers; then, after nagging, in 2015 they finally added it for references; but it seems more nagging is required to get a restricted this :/
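[To make the pointer case concrete: both MSVC and GCC/Clang spell the extension __restrict for pointers, so a portable macro of the kind under discussion could look like this. MY_RESTRICT is an illustrative name.]

```cpp
#include <cassert>

#if defined(_MSC_VER) || defined(__GNUC__)
#  define MY_RESTRICT __restrict
#else
#  define MY_RESTRICT
#endif

// Promising that dst and src never alias lets the compiler vectorise
// the loop without emitting runtime overlap checks:
void scale2(float * MY_RESTRICT dst, float const * MY_RESTRICT src, int n)
{
    for (int i = 0; i < n; ++i)
        dst[i] = src[i] * 2.0f;
}
```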
I'm somewhat in favor of adding BOOST_NOVTABLE, although I doubt it will have much use in Boost libraries.
It's about paving a way for a standard(ised) class attribute ;)
BOOST_THREAD_LOCAL_POD is kind of controversial. I do use compiler-based TLS in my projects, including Boost.Log, so it would be a useful macro. But it's not an optimization - when you use it, the compiler support is required. There has to be a way to test if the support exists. I'm not sure Boost.Config is the right place for this. (BTW, you could use thread_local from C++11 when none of the lighter weight keywords are available.)
That one is not about optimisation but rather about the lack of C++11 thread_local, for the foreseeable future, on OS X - as a good-enough solution for PODs. As for the test macro, you are right, it should be added...
I don't see much use in BOOST_ATTRIBUTES and related macros - you can achieve the same results with feature-specific macros (e.g. by using BOOST_NORETURN instead of BOOST_ATTRIBUTES(BOOST_DOES_NOT_RETURN)).
Yes, I may change those... I was however 'forward thinking' WRT attributes standardization (so that the BOOST_ATTRIBUTES(BOOST_DOES_NOT_RETURN) macros look like the 'one day' [[noreturn]]) and 'backward thinking', i.e. compatibility - since some compilers want attributes at the front and some at the end - in which case BOOST_ATTRIBUTES would need to be further changed/expanded into a function declaration macro (BOOST_F_DECL( return_type, calling_convention, parameters, attributes ))...
I don't see the benefit of BOOST_NOTHROW_LITE.
It's a nothrow attribute that does not insert runtime checks to call std::terminate...and it is unfortunately not offered by Boost.Config...
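[To make the distinction concrete: a sketch of what such a 'lite' nothrow decoration could look like on GCC, where the attribute is a bare promise, versus noexcept, which obliges the compiler to route escaping exceptions to std::terminate(). The macro name is illustrative.]

```cpp
#include <cassert>

#if defined(__GNUC__)
#  define MY_NOTHROW_LITE __attribute__((nothrow))
#else
#  define MY_NOTHROW_LITE
#endif

// Promise-only: no runtime std::terminate() machinery is generated.
MY_NOTHROW_LITE void increment(int * p) { ++*p; }

// Enforced: an escaping exception must reach std::terminate().
void increment_checked(int * p) noexcept { ++*p; }
```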
Ditto BOOST_HAS_UNION_TYPE_PUNNING_TRICK (doesn't any compiler support this?).
'I'm all with you on this one' but since 'it is not in the standard' language purists will probably complain if it is used unconditionally... (I need this and the *ALIAS* macros for a rewrite/expansion of Boost.Cast, that includes 'bitwise_cast', a sort of generic, safe&optimal reinterpret_cast)...
I don't think BOOST_OVERRIDABLE_SYMBOL is a good idea, given that the same effect can be achieved in pure C++.
You mean creating a class template with a single dummy template argument and a static data member just so that you can define a global variable in a header w/o linker errors?
Also, some compilers offer this functionality only as a pragma.
You mean in a way that would require a _BEGIN and _END macro pair?
Also, the naming is confusing.
I expected it to 'create some friction' - basically I want(ed) a portable __declspec( selectany )... (although selectany also has 'discardable symbol' semantics - since the MSVC linker will not by default discard unused global data)...
Calling conventions macros are probably too specialized to functional libraries, I don't think there's much use for these. I would rather not have them in Boost.Config to avoid spreading their use to other Boost libraries.
That's kind of self-contradicting: if there is a 'danger' of them being used in other libraries, that would imply there is a 'danger' of them being useful... In any case, I agree that most of those would mostly be used only in functional libraries, but for HPC and math libraries especially, the *REG*/'fastcall' conventions are useful when they cannot (e.g. on ABI boundaries), or do not want to, rely on the compiler (IPO, LTCG etc.) to automatically choose the optimal/custom calling convention... Admittedly this is mostly useful on targets with 'bad default' conventions, like 32-bit x86 and MS x64, but these are still widely used ABIs :/
Function optimization macros are probably too compiler and case-specific. Your choice of what is considered fast, small code, acceptable math optimizations may not fit others.
If the indisputable goal (the definition of 'good codegen') is to have fast and small code/binaries, then 'I have to disagree'. For example, a library dev can certainly know that certain code will never be part of a hot block (assuming correct usage of the library) - for example initialisation, cleanup or error/failure related code - and should thus be optimised for size (because that is actually optimising for real-world speed - reducing unnecessary bloat, IO and CPU cache thrashing). Furthermore, you can even know that some code will/should always be in a cold path (and should be decorated with the cold rather than just the minsize attribute), e.g. noreturn functions... etc...

Ditto for 'fastmath&co': e.g. for FFT you know that float arithmetic operations can safely be reordered (and you also need every last bit of speed), while for a Kahan sum one must not allow associative floating point arithmetic - if you explicitly set those for the affected code, you also leave the user the freedom to choose the 'fastmath' setting for the rest of the code...

An HPC library dev is actually supposed to know which functions/loops are hot/crucial and should thus mark those as hot - this actually gives _more_ options to the user (if the lib is header-only), as one can freely choose a global optimise-for-size switch (e.g. if one is mostly writing a GUI wrapper around the HPC lib) w/o hurting the performance/optimisation of the hot paths...

@too compiler specific - even if you find such a case, 'ain't that what Boost is for - standardising things?'
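[The hot/cold decoration argued for above has direct GCC/Clang spellings; a sketch, guarded so other compilers still build. The MY_ names are illustrative.]

```cpp
#include <cassert>

#if defined(__GNUC__)
#  define MY_COLD __attribute__((cold))   // out-of-line, size-optimised path
#  define MY_HOT  __attribute__((hot))    // speed-optimised regardless of -Os
#else
#  define MY_COLD
#  define MY_HOT
#endif

// Error path: never hot, keep it small and out of the way.
MY_COLD int on_failure() { return -1; }

// Hot loop: optimise for speed even under a global optimise-for-size build.
MY_HOT int accumulate(int const * p, int n)
{
    int sum = 0;
    for (int i = 0; i < n; ++i)
        sum += p[i];
    return sum;
}
```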
Also, things like these should have a very limited use, as the user has to have the ultimate control over the build options.
I'm 'more than all for ultimate control' - as explained above this can actually give more control to the user (+ Boost Build was already a major pain when it came to user control over changing compiler optimisation options in preexisting variants)...
ps. I'll be on the (off) road for the next three weeks so I don't know when I'll be able to respond until I get back...
Then you probably chose an inconvenient moment to start this discussion.
Yes of course, and no - as after so many 'delays' I finally 'had' to do it this way...
On 2015-11-24 19:29, Domagoj Šarić wrote:
On Tue, 17 Nov 2015 06:24:37 +0530, Andrey Semashev wrote:
Personally, I'm in favor of adding these: BOOST_OVERRIDE, BOOST_FINAL. Although their implementation should be similar to other C++11 macros - they should be based on BOOST_NO_CXX11_FINAL and BOOST_NO_CXX11_OVERRIDE.
I agree, but what if you don't have final but do have sealed (with a less recent MSVC)?
As far as I understand, sealed can be used only with C++/CLI, is that right? If so then I'd rather not add a macro for it. If, on the other hand, sealed can be used equivalently to final in all contexts, then you could use it to implement BOOST_FINAL.
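[A sketch of the fallback being discussed, with MY_FINAL as an illustrative name; whether sealed is acceptable outside C++/CLI is exactly the open question above.]

```cpp
#include <cassert>

#if __cplusplus >= 201103L
#  define MY_FINAL final
#elif defined(_MSC_VER)
#  define MY_FINAL sealed   // MSVC extension predating C++11 final
#else
#  define MY_FINAL          // no support: macro expands to nothing
#endif

struct base
{
    virtual int id() const { return 0; }
    virtual ~base() {}
};

// Marked final where supported, so the compiler may devirtualise calls:
struct derived MY_FINAL : base
{
    int id() const { return 1; }
};
```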
I would like to have BOOST_ASSUME (implemented without an assert, i.e. equivalent to your BOOST_ASSUME_UNCHECKED), BOOST_UNREACHABLE (again, without an assert, i.e. equivalent to your BOOST_UNREACHABLE_UNCHECKED). The reason for no asserts is that (a) Boost.Config should not depend on Boost.Assert and (b) I fear that having additional expressions before the intrinsic could inhibit the optimization. You can always add *_CHECKED versions of the macros locally, or just use asserts beside the macros.
The additional expressions are assert macros which resolve to nothing in release builds (and thus have no effect on optimisations...checked;)
In release builds asserts are expanded to something like (void)0. Technically, that's nothing, but who knows if it affects optimization.
Dependency on Boost.Assert is technically only there if you use the 'checked' macros...I agree that it is still 'ugly' (and the user would have to separately/explicitly include boost/assert.hpp to avoid a circular dependency) but so is, to me, the idea of having to manually duplicate/prefix all assumes with asserts (since I like all my assumes verified and this would add so much extra verbosity)...
You can add the checked versions to Boost.Assert with a separate PR.
I would have liked BOOST_HAS_CXX_RESTRICT to indicate that the compiler has support for the C99 keyword 'restrict' (or an equivalent) in C++ (the CXX in the macro name emphasizes that the feature is available in C++, not C). The BOOST_RESTRICT macro would be defined to that keyword or empty if there is no support.
Sure I can add the detection macro but for which 'feature set' (already for minimal - only pointers, or only for full - pointers, refs and this)?
That's a good question. I'm leaning towards full support, although that will probably not make MSVC users happy. There is a precedent of BOOST_DEFAULTED_FUNCTION - it expands to C++03 code on gcc 4.5 even though it supports defaulted functions in C++11 mode, but only in public sections.
I don't see much use in BOOST_ATTRIBUTES and related macros - you can achieve the same results with feature specific-macros (e.g. by using BOOST_NORETURN instead of BOOST_ATTRIBUTES(BOOST_DOES_NOT_RETURN)).
Yes, I may change those...I was however 'forward thinking' WRT attributes standardization (so that the BOOST_ATTRIBUTES(BOOST_DOES_NOT_RETURN) macros look like 'one day' [[noreturn]])
That still doesn't improve over BOOST_NORETURN. If there's a reason to, we could even define BOOST_NORETURN to [[noreturn]].
I don't see the benefit of BOOST_NOTHROW_LITE.
It's a nothrow attribute that does not insert runtime checks to call std::terminate...and it is unfortunately not offered by Boost.Config...
Do you have measurements of the possible benefits compared to noexcept? I mean, noexcept was already advertised as the more efficient version of throw().
Ditto BOOST_HAS_UNION_TYPE_PUNNING_TRICK (doesn't any compiler support this?).
'I'm all with you on this one' but since 'it is not in the standard' language purists will probably complain if it is used unconditionally...
To some extent this is guaranteed by [class.union]/1 in C++11.
(I need this and the *ALIAS* macros for a rewrite/expansion of Boost.Cast, that includes 'bitwise_cast', a sort of generic, safe&optimal reinterpret_cast)...
Again, it looks like this macro would have a rather specialized use.
I don't think BOOST_OVERRIDABLE_SYMBOL is a good idea, given that the same effect can be achieved in pure C++.
You mean creating a class template with a single dummy template argument and a static data member just so that you can define a global variable in a header w/o linker errors?
Slightly better:

    template< typename T, typename Tag = void >
    struct singleton
    {
        static T instance;
    };

    template< typename T, typename Tag >
    T singleton< T, Tag >::instance;
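[A self-contained version of the pattern with a usage sketch: static data members of class templates get an ODR exception, so a header can define a 'global' without linker errors. The tag type and accessor name are hypothetical.]

```cpp
#include <cassert>

template< typename T, typename Tag = void >
struct singleton { static T instance; };

template< typename T, typename Tag >
T singleton< T, Tag >::instance;

// A header-only 'global': one instance shared across all translation
// units that include this header, with no __declspec(selectany) needed.
struct error_count_tag;
inline int & error_count() { return singleton< int, error_count_tag >::instance; }
```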
Also, some compilers offer this functionality only as a pragma.
You mean in a way that would require a _BEGIN and _END macro pair?
Maybe for some compilers. I meant this: https://docs.oracle.com/cd/E19205-01/819-5267/bkbkr/index.html There's just no point in these compiler-specific workarounds when there's a portable solution.
Calling conventions macros are probably too specialized to functional libraries, I don't think there's much use for these. I would rather not have them in Boost.Config to avoid spreading their use to other Boost libraries.
That's kind of self-contradicting, if there is a 'danger' of them being used in other libraries that would imply there is a 'danger' from them being useful...
What I mean is that having these macros in Boost.Config might encourage people to use them where they would normally not.
In any case, I agree that most of those would mostly be used only in functional libraries but for HPC and math libraries especially, the *REG*/'fastcall' conventions are useful when they cannot (e.g. on ABI boundaries) or do not want to rely on the compiler (IPO, LTCG etc.) to automatically choose the optimal/custom calling convention...Admittedly this is mostly useful on targets with 'bad default' conventions, like 32bit x86 and MS x64, but these are still widely used ABIs :/
Non-standard calling conventions give enough headache for users to avoid them as much as possible. You might use them in library internals but there I think it's better to avoid the call at all - by forcing the hot code inline.
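[The force-inline alternative suggested above can be sketched like this: if the hot helper is forced inline, no call - and hence no calling convention - exists at all. MY_FORCEINLINE is an illustrative name.]

```cpp
#include <cassert>

#if defined(_MSC_VER)
#  define MY_FORCEINLINE __forceinline
#elif defined(__GNUC__)
#  define MY_FORCEINLINE inline __attribute__((always_inline))
#else
#  define MY_FORCEINLINE inline
#endif

// Inlined at the call site even at low optimisation levels, so argument
// passing never touches the ABI's default convention:
MY_FORCEINLINE int square(int x) { return x * x; }
```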
Function optimization macros are probably too compiler and case-specific. Your choice of what is considered fast, small code, acceptable math optimizations may not fit others.
If the indisputable goal (definition of 'good codegen') is to have fast and small code/binaries then 'I have to disagree'. For example a library dev can certainly know that certain code will never be part of a hot block (assuming correct usage of the library), for example initialisation, cleanup or error/failure related code and should thus be optimised for size (because that is actually optimising for real world speed - reducing unnecessary bloat - IO and CPU cache thrashing).
If that code is unimportant then why do you care? Simply organizing code into functions properly and using BOOST_LIKELY/UNLIKELY where needed will do the job.
Also, things like these should have a very limited use, as the user has to have the ultimate control over the build options.
I'm 'more than all for ultimate control' - as explained above this can actually give more control to the user (+ Boost Build was already a major pain when it came to user control over changing compiler optimisation options in preexisting variants)...
What I was saying is that it's the user who has to decide whether to build your code for size, for speed or for debug. That includes the parts of the code that you, the library author, consider performance critical or otherwise. You may want to restrict his range of choices, e.g. when a certain optimization breaks your code. I guess you could try to spell these restrictions with these macros, but frankly I doubt it's worth the effort. I mean, there are so many possibilities on different compilers.

One legitimate reason to use these macros that comes to mind is changing the target instruction set for a set of functions that require it (e.g. when a function is optimized for AVX in an application that is supposed to run in the absence of this extension). But then this only seems necessary with gcc, which again makes it a rather specific workaround.
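[The per-function target-ISA case conceded above can be sketched as follows: one function body is compiled for AVX2 but only called after a runtime CPU check, so the binary still runs without the extension. Function names are illustrative.]

```cpp
#include <cassert>

#if defined(__GNUC__) && defined(__x86_64__)
// Compiled with AVX2 enabled regardless of the global -m flags:
__attribute__((target("avx2")))
static int sum_avx2(int const * p, int n)
{
    int s = 0;
    for (int i = 0; i < n; ++i) s += p[i];
    return s;
}
#endif

static int sum_portable(int const * p, int n)
{
    int s = 0;
    for (int i = 0; i < n; ++i) s += p[i];
    return s;
}

// Runtime dispatch keeps the AVX2 body off the path on older CPUs:
static int sum(int const * p, int n)
{
#if defined(__GNUC__) && defined(__x86_64__)
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2(p, n);
#endif
    return sum_portable(p, n);
}
```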
On 11/24/2015 5:20 PM, Andrey Semashev wrote:
Ditto BOOST_HAS_UNION_TYPE_PUNNING_TRICK (doesn't any compiler support this?).
'I'm all with you on this one' but since 'it is not in the standard' language purists will probably complain if it is used unconditionally...
To some extent this is guaranteed by [class.union]/1 in C++11.
No, it isn't. Regards, -- Agustín K-ballo Bergé.- http://talesofcpp.fusionfenix.com
On 2015-11-24 23:54, Agustín K-ballo Bergé wrote:
On 11/24/2015 5:20 PM, Andrey Semashev wrote:
Ditto BOOST_HAS_UNION_TYPE_PUNNING_TRICK (doesn't any compiler support this?).
'I'm all with you on this one' but since 'it is not in the standard' language purists will probably complain if it is used unconditionally...
To some extent this is guaranteed by [class.union]/1 in C++11.
No, it isn't.
Why? Reading different members of the standard layout union within the common initial sequence is enough to implement a bitwise_cast.
On 11/24/2015 6:03 PM, Andrey Semashev wrote:
Why? Reading different members of the standard layout union within the common initial sequence is enough to implement a bitwise_cast.
That would be 9.2 [class.mem]/19; notes are not normative.

I can't tell from the context whether this would be enough for a `bitwise_cast` (what is it supposed to do?). The OP talks about the "union type punning trick", which is a fine practice in C11 but undefined behavior in C++. Some compilers choose to offer it as a conforming extension.
On 2015-11-25 00:21, Agustín K-ballo Bergé wrote:
That would be 9.2 [class.mem]/19, notes are not normative.
I can't tell from the context whether this would be enough for a `bitwise_cast` (what is it supposed to do?). The OP talks about "union type punning trick", which is a fine practice in C11 but undefined behavior in C++. Some compilers choose to offer this as a conforming extension.
I'm not sure I understand. This definitely was UB in C++03, but that addition you pointed to makes it defined behavior in C++11. By bitwise_cast I meant something along these lines:

    template< typename To, typename From >
    To bitwise_cast(From from)
    {
        union
        {
            From as_from;
            To   as_to;
        } caster = { from };
        return caster.as_to;
    }

With certain restrictions on the From and To types, I think this code has well-defined behavior in C++11.
On 11/24/2015 6:44 PM, Andrey Semashev wrote:
I'm not sure I understand. This definitely was UB in C++03, but that addition you pointed to makes it defined behavior in C++11.
That bit I pointed to has always been there as far as I know, and I can confirm it is there in C++03 (9.2 [class.mem]/16).
By bitwise_cast I meant something along these lines:
template< typename To, typename From >
To bitwise_cast(From from)
{
    union
    {
        From as_from;
        To   as_to;
    } caster = { from };
    return caster.as_to;
}
With certain restrictions on From and To types I think this code has a well defined behavior in C++11.
Those restrictions would be that `To` and `From` are layout-compatible, and that both `To` and `From` are standard-layout struct types. Regards, -- Agustín K-ballo Bergé.- http://talesofcpp.fusionfenix.com
On 25/11/2015 10:03, Andrey Semashev wrote:
On 2015-11-24 23:54, Agustín K-ballo Bergé wrote:
On 11/24/2015 5:20 PM, Andrey Semashev wrote:
Ditto BOOST_HAS_UNION_TYPE_PUNNING_TRICK (doesn't any compiler support this?).
'I'm all with you on this one' but since 'it is not in the standard' language purists will probably complain if it is used unconditionally...
To some extent this is guaranteed by [class.union]/1 in C++11.
No, it isn't.
Why? Reading different members of the standard layout union within the common initial sequence is enough to implement a bitwise_cast.
I don't have a standard reference handy, but I'm pretty sure that reading from a different member of a union than was written to is still explicitly UB -- although for practical reasons *most* compilers will generate the expected result provided that the two types have the same initial alignment. The only officially supported way to type-pun AFAIK is to cast via byte sequences (aka uint8_t * and friends), and even that can get you in trouble (on some platforms) if the alignments don't match.
On 25 November 2015 01:02:25 CET, Gavin Lambert wrote:
On 2015-11-24 23:54, Agustín K-ballo Bergé wrote:
On 11/24/2015 5:20 PM, Andrey Semashev wrote:
Ditto BOOST_HAS_UNION_TYPE_PUNNING_TRICK (doesn't any compiler support this?).
'I'm all with you on this one' but since 'it is not in the standard' language purists will probably complain if it is used unconditionally...
To some extent this is guaranteed by [class.union]/1 in C++11.
No, it isn't.
On 25/11/2015 10:03, Andrey Semashev wrote:
Why? Reading different members of the standard layout union within the common initial sequence is enough to implement a bitwise_cast.
I don't have a standard reference handy, but I'm pretty sure that reading from a different member of a union than was written to is still
explicitly UB -- although for practical reasons *most* compilers will generate the expected result provided that the two types have the same initial alignment.

http://en.cppreference.com/w/cpp/language/union has an example that explicitly states what Gavin claimed. Not sure about the reliability of that site, though.
Best, Alexander
On 2015-11-25 18:32, Alexander Lauser wrote:
On 25 November 2015 01:02:25 CET, Gavin Lambert wrote:
On 2015-11-24 23:54, Agustín K-ballo Bergé wrote:
On 11/24/2015 5:20 PM, Andrey Semashev wrote:
> Ditto BOOST_HAS_UNION_TYPE_PUNNING_TRICK (doesn't any compiler support this?).
'I'm all with you on this one' but since 'it is not in the standard' language purists will probably complain if it is used unconditionally...
To some extent this is guaranteed by [class.union]/1 in C++11.
No, it isn't.
On 25/11/2015 10:03, Andrey Semashev wrote:
Why? Reading different members of the standard layout union within the common initial sequence is enough to implement a bitwise_cast.
I don't have a standard reference handy, but I'm pretty sure that reading from a different member of a union than was written to is still
explicitly UB -- although for practical reasons *most* compilers will generate the expected result provided that the two types have the same initial alignment.
http://en.cppreference.com/w/cpp/language/union has an example that explicitly states what Gavin claimed. Not sure about the reliability of that site, though.
That sounds like self contradiction to me. The page says it's well defined to examine the common subsequence of standard-layout union members but at the same time it's UB to read from them. What's the difference?
On 11/25/2015 12:46 PM, Andrey Semashev wrote:
On 2015-11-25 18:32, Alexander Lauser wrote:
On 25 November 2015 01:02:25 CET, Gavin Lambert wrote:
On 2015-11-24 23:54, Agustín K-ballo Bergé wrote:
On 11/24/2015 5:20 PM, Andrey Semashev wrote:
>> Ditto BOOST_HAS_UNION_TYPE_PUNNING_TRICK (doesn't any compiler support this?).
>
> 'I'm all with you on this one' but since 'it is not in the standard' language purists will probably complain if it is used unconditionally...
To some extent this is guaranteed by [class.union]/1 in C++11.
No, it isn't.
On 25/11/2015 10:03, Andrey Semashev wrote:
Why? Reading different members of the standard layout union within the common initial sequence is enough to implement a bitwise_cast.
I don't have a standard reference handy, but I'm pretty sure that reading from a different member of a union than was written to is still
explicitly UB -- although for practical reasons *most* compilers will generate the expected result provided that the two types have the same initial alignment.
http://en.cppreference.com/w/cpp/language/union has an example that explicitly states what Gavin claimed. Not sure about the reliability of that site, though.
That sounds like self contradiction to me. The page says it's well defined to examine the common subsequence of standard-layout union members but at the same time it's UB to read from them. What's the difference?
The wording is convoluted, maybe it becomes clearer with some examples. Consider:

int bitcast(float x)
{
    union { float from; int to; };
    from = x;
    return to;
}

The write to `from` above is a dead store, nowhere else in the function is there a read from it or from anything "compatible" with it. A compiler is allowed to rewrite this function as follows:

int bitcast(float x)
{
    int to;
    return to;
}

Furthermore, the value of `to` is unspecified, so the whole function might simply turn into returning a trap-representation. Some compilers provide the C11 semantics instead, as a conforming extension (C11 semantics are a valid form of undefined behavior). On those compilers, the function would essentially have the following effects:

int bitcast(float x)
{
    int to;
    std::memcpy(&to, &x, sizeof(int));
    return to;
}

The common initial sequence standard escape hatch does not apply there, it applies only in very limited scenarios (even less than one would think it does, due to wording), like the following:

union
{
    struct { int which; int   value; } first;
    struct { int which; float value; } second;
} u;

switch (u.first.which) // OK if either first or second is active
{
case 0: print(u.first.value);  break; // UB if first not active
case 1: print(u.second.value); break; // UB if second not active
}

Regards,
-- Agustín K-ballo Bergé.- http://talesofcpp.fusionfenix.com
On Wed, Nov 25, 2015 at 7:25 PM, Agustín K-ballo Bergé
On 11/25/2015 12:46 PM, Andrey Semashev wrote:
On 2015-11-25 18:32, Alexander Lauser wrote:
http://en.cppreference.com/w/cpp/language/union has an example that explicitly states what Gavin claimed. Not sure about the reliability of that site, though.
That sounds like self contradiction to me. The page says it's well defined to examine the common subsequence of standard-layout union members but at the same time it's UB to read from them. What's the difference?
The wording is convoluted, maybe it becomes clearer with some examples.
I see, thank you for the clarification. IMHO, the standard should just follow C11 semantics and say it more clearly. Regarding the original topic and the proposed Boost.Config macro for detection of this compiler feature, I'm still not sure we need it, although now I'm not as strongly convinced.
On 11/25/2015 10:52 PM, Andrey Semashev wrote:
On Wed, Nov 25, 2015 at 7:25 PM, Agustín K-ballo Bergé
wrote: On 11/25/2015 12:46 PM, Andrey Semashev wrote:
On 2015-11-25 18:32, Alexander Lauser wrote:
http://en.cppreference.com/w/cpp/language/union has an example that explicitly states what Gavin claimed. Not sure about the reliability of that site, though.
That sounds like self contradiction to me. The page says it's well defined to examine the common subsequence of standard-layout union members but at the same time it's UB to read from them. What's the difference?
The wording is convoluted, maybe it becomes clearer with some examples.
I see, thank you for the clarification. IMHO, the standard should just follow C11 semantics and say it more clearly.
That's unlikely to happen, C has *vastly* weaker aliasing rules than C++, to the point that several C implementations choose to follow the C++ rules instead. You'd be asking for the opposite to happen.
Regarding the original topic and the proposed Boost.Config macro for detection of this compiler feature, I'm still not sure we need it, although now I'm not as strongly convinced.
As for the original topic, I am certain that having the proposed macro would be a mistake. Don't try to fool a language/compiler smarter than you, or you will end up getting what you asked for (UB) sooner or later. If you have types whose values are comprised by just a set of bits, and you'd wish to operate on said value representation, then do so by using the tools for operating on the value representation of such trivial types. Type punning via unions is not that tool. Regards, -- Agustín K-ballo Bergé.- http://talesofcpp.fusionfenix.com
On 2015-11-26 05:13, Agustín K-ballo Bergé wrote:
On 11/25/2015 10:52 PM, Andrey Semashev wrote:
IMHO, the standard should just follow C11 semantics and say it more clearly.
That's unlikely to happen, C has *vastly* weaker aliasing rules than C++, to the point that several C implementations choose to follow the C++ rules instead. You'd be asking for the opposite to happen.
I can see the benefits of strict aliasing rules in other contexts, but unions specifically exist to provide the common storage for objects of different types in a relatively less messy way compared to, e.g. a raw byte buffer. I'm not seeing people using unions and somehow expecting type aliasing to not happen, quite the contrary. I think the language here is being too limiting for no practical reason.
On 11/26/2015 8:51 AM, Andrey Semashev wrote:
On 2015-11-26 05:13, Agustín K-ballo Bergé wrote:
On 11/25/2015 10:52 PM, Andrey Semashev wrote:
IMHO, the standard should just follow C11 semantics and say it more clearly.
That's unlikely to happen, C has *vastly* weaker aliasing rules than C++, to the point that several C implementations choose to follow the C++ rules instead. You'd be asking for the opposite to happen.
I can see the benefits of strict aliasing rules in other contexts, but unions specifically exist to provide the common storage for objects of different types in a relatively less messy way compared to, e.g. a raw byte buffer. I'm not seeing people using unions and somehow expecting type aliasing to not happen, quite the contrary. I think the language here is being too limiting for no practical reason.
How would that work? If one could read a `float` as if it were an `int`, then ints and floats may alias. If they may alias, then whenever you get a pointer to an `int` you'll have to ask could it be this other `float` instead? Until the whole program is compiled and all `union`s are seen, everything may alias with anything else. We'd be back in a land of bits and bytes... Regards, -- Agustín K-ballo Bergé.- http://talesofcpp.fusionfenix.com
On 2015-11-26 15:44, Agustín K-ballo Bergé wrote:
On 11/26/2015 8:51 AM, Andrey Semashev wrote:
On 2015-11-26 05:13, Agustín K-ballo Bergé wrote:
On 11/25/2015 10:52 PM, Andrey Semashev wrote:
IMHO, the standard should just follow C11 semantics and say it more clearly.
That's unlikely to happen, C has *vastly* weaker aliasing rules than C++, to the point that several C implementations choose to follow the C++ rules instead. You'd be asking for the opposite to happen.
I can see the benefits of strict aliasing rules in other contexts, but unions specifically exist to provide the common storage for objects of different types in a relatively less messy way compared to, e.g. a raw byte buffer. I'm not seeing people using unions and somehow expecting type aliasing to not happen, quite the contrary. I think the language here is being too limiting for no practical reason.
How would that work? If one could read a `float` as if it were an `int`, then ints and floats may alias. If they may alias, then whenever you get a pointer to an `int` you'll have to ask could it be this other `float` instead?
It would work the way it works now with most, if not all compilers - writing to any member of a union object is allowed to affect reads from any member of the object. As such, writes are not dead if any reads exist, and writes and reads must not be reordered or eliminated by the compiler (as observed by the program; on the machine instruction level there may not be any stores or loads at all). I'm not proposing to allow type aliasing through pointers. For instance, taking pointers to different members of the union and working with the members through the pointers would still be UB. The change will only affect the contexts where the compiler is able to immediately see that we're working with a union - i.e. where the name of the member resolves to a member of a union.

union variant
{
    std::uint32_t as_uint32;
    float         as_float;
};

variant v1, v2, v3;

// This is defined behavior (DB)
v1.as_float = 1.0f;
std::cout << v1.as_uint32 << std::endl;

// This is also DB
using pmu_t = std::uint32_t (variant::*);
using pmf_t = float (variant::*);
pmu_t pmu = &variant::as_uint32;
pmf_t pmf = &variant::as_float;
v2.*pmf = 1.0f;
std::cout << v2.*pmu << std::endl;

// This is still formally UB
std::uint32_t * pu = &v3.as_uint32;
float * pf = &v3.as_float;
*pf = 1.0f;
std::cout << *pu << std::endl;
On Thu, 26 Nov 2015 07:43:24 +0530, Agustín K-ballo Bergé
As for the original topic, I am certain that having the proposed macro would be a mistake. Don't try to fool a language/compiler smarter than you, or you will end up getting what you asked for (UB) sooner or later. If you have types whose values are comprised by just a set of bits, and you'd wish to operate on said value representation, then do so by using the tools for operating on the value representation of such trivial types. Type punning via unions is not that tool.
IMNHO that's just generic paranoia and refusing to even look outside the box...we'd still be using C++98 with this kind of thinking (e.g. Boost.Move is 'fooling' the pre-C++11 compiler just like templates weren't originally designed as a tool for TMP or CRTP)... Not to mention that various compilers explicitly prescribe ways to be 'fooled' (union type punning for GCC&compatibles, MSVC does not use the aliasing rules at all - Windows wouldn't boot if it did, etc...) -- "What Huxley teaches is that in the age of advanced technology, spiritual devastation is more likely to come from an enemy with a smiling face than from one whose countenance exudes suspicion and hate." Neil Postman
On 12/1/2015 3:16 AM, Domagoj Šarić wrote:
On Thu, 26 Nov 2015 07:43:24 +0530, Agustín K-ballo Bergé
wrote: As for the original topic, I am certain that having the proposed macro would be a mistake. Don't try to fool a language/compiler smarter than you, or you will end up getting what you asked for (UB) sooner or later. If you have types whose values are comprised by just a set of bits, and you'd wish to operate on said value representation, then do so by using the tools for operating on the value representation of such trivial types. Type punning via unions is not that tool.
IMNHO that's just generic paranoia and refusing to even look outside the box...we'd still be using C++98 with this kind of thinking (e.g. Boost.Move is 'fooling' the pre-C++11 compiler just like templates weren't originally designed as a tool for TMP or CRTP)... Not to mention that various compilers explicitly prescribe ways to be 'fooled' (union type punning for GCC&compatibles, MSVC does not use the aliasing rules at all - Windows wouldn't boot if it did, etc...)
IMHO this is short-sighted. Boost.Move is a temporary solution until we transition into well-defined semantics, and as such it only has to be supported in those now old compilers that do not support real move semantics. Your code, on the other hand, would have to be verified and supported in every new compiler version that pops up, until the end of time.

Now you might think that your code won't be around long enough for it to see compilers improving in this area, given the amount of legacy code around relying on it, but it doesn't have to. Analyzers and sanitizers are growing stronger by the day, and they are designed to catch exactly the kind of bad code that you'd wish to leverage. So even if your code "just works" now (and in the future), it will be a disservice to users when their sanitized runs terminate because of by-design undefined behavior within Boost. The solution is, then, to sever the link to broken dependencies.

But what tops it all, what makes this decision a plain and simple mistake, is that this "just works" because implementations map this kind of undefined behavior into specific well-defined behavior. So it's just undefined behavior for the sake of "pretty syntax" (a subjective thing at best). Why then wouldn't you just simply write that well-defined code that these implementations are translating to?

All these "opinions" are based of course on the abstract notion of a `bitcast` function that intends to exploit undefined behavior to do type-punning via unions. If you'd wish to go forward with it, I'd ask that you present us with a concrete implementation so we can start a technical discussion instead.

Regards,
-- Agustín K-ballo Bergé.- http://talesofcpp.fusionfenix.com
On 2015-12-01 17:18, Agustín K-ballo Bergé wrote:
But what tops it all, what makes this decision a plain and simple mistake, is that this "just works" because implementations map this kind of undefined behavior into specific well-defined behavior. So it's just undefined behavior for the sake of "pretty syntax" (a subjective thing at best). Why then wouldn't you just simply write that well-defined code that these implementations are translating to?
It's not just syntax. If I'm not mistaken, the union-based type punning has advantage over memcpy - it allows the code to be constexpr. memcpy also has potential to be a function call instead of a few instructions (or no instructions at all). I know many compilers are aware of memcpy and optimize it, but that's not something one can rely on.
On 12/1/2015 11:59 AM, Andrey Semashev wrote:
On 2015-12-01 17:18, Agustín K-ballo Bergé wrote:
But what tops it all, what makes this decision a plain and simple mistake, is that this "just works" because implementations map this kind of undefined behavior into specific well-defined behavior. So it's just undefined behavior for the sake of "pretty syntax" (a subjective thing at best). Why then wouldn't you just simply write that well-defined code that these implementations are translating to?
It's not just syntax. If I'm not mistaken, the union-based type punning has advantage over memcpy - it allows the code to be constexpr.
No, it doesn't. There's no undefined behavior allowed in a constant expression. Furthermore, the rules for `union` member access are stronger for constant expressions, there's no common initial sequence rule nor anything like it since constant expressions do not work under the memory object model. Even compilers with implementation-specific behavior would reject it.
memcpy also has potential to be a function call instead of a few instructions (or no instructions at all). I know many compilers are aware of memcpy and optimize it, but that's not something one can rely on.
That could merit a technical discussion, please start one so we can get some concrete data. After all, it would have to be an implementation specific choice, if we don't know whether `memcpy` will be optimized then we can't assume that the undefined behavior will do the `memcpy` you expect. Regards, -- Agustín K-ballo Bergé.- http://talesofcpp.fusionfenix.com
On 2015-12-01 18:32, Agustín K-ballo Bergé wrote:
On 12/1/2015 11:59 AM, Andrey Semashev wrote:
It's not just syntax. If I'm not mistaken, the union-based type punning has advantage over memcpy - it allows the code to be constexpr.
No, it doesn't. There's no undefined behavior allowed in a constant expression. Furthermore, the rules for `union` member access are stronger for constant expressions, there's no common initial sequence rule nor anything like it since constant expressions do not work under the memory object model. Even compilers with implementation-specific behavior would reject it.
I see. I tried this locally with gcc and it succeeded, but apparently I used bitwise_cast in a context that allowed runtime execution, although the compiled code was still using a constant. In a purely constant expression it indeed fails to compile. Good to know.
After all, it would have to be an implementation specific choice, if we don't know whether `memcpy` will be optimized then we can't assume that the undefined behavior will do the `memcpy` you expect.
That's true, although I can't remember any particular compiler implementing a different behavior wrt unions.
On 02/12/2015 13:11, Andrey Semashev wrote:
I see. I tried this locally with gcc and it succeeded, but apparently I used bitwise_cast in a context that allowed runtime execution, although the compiled code was still using a constant. In a purely constant expression it indeed fails to compile. Good to know.
boost::simd::bitwise_cast does use memcpy. Be aware that some compilers like ICC are extremely OCD about aliasing and generate bogus code in those UB cases. We tried to remove all those tricks in Boost.SIMD in favor of well-defined behavior. We didn't encounter many issues in terms of performance. BUT we do have special code for MSVC that uses devious code instead. My 2 cts.
On 1.12.2015. 15:18, Agustín K-ballo Bergé wrote:
IMHO this is short-sighted. Boost.Move is a temporary solution until we transition into well-defined semantics, and as such it only has to be supported in those now old compilers that do not support real move semantics. Your code, on the other hand, would have to be verified and supported in every new compiler version that pops up, until the end of time.
Now you might think that your code won't be around long enough for it to see compilers improving in this area, given the amount of legacy code around relying on it, but it doesn't have to. Analyzers and sanitizers are growing stronger by the day, and they are designed to catch exactly the kind of bad code that you'd wish to leverage. So even if your code "just works" now (and in the future), it will be a disservice to users when their sanitized runs terminate because of by-design undefined behavior within Boost. The solution is, then, to sever the link to broken dependencies.
That's all just circular reasoning (silently assuming that some new form of 'bitwise casting', like the 'union trick', already part of the C standard, will never be part of the C++ standard). The "Boost.Move transition into well-defined semantics" at one point had to be started by a typical ugly-hack-workaround probing into not so well defined semantics (i.e. std::auto_ptr)...

That compilers like GCC, which otherwise adhere very closely to the C++ language standard in general and aliasing rules in particular, offer explicit, documented, defined ways to work around some difficulties concerning the aliasing rules (e.g. union type punning, the may_alias attribute) speaks volumes... For example, 'standard' GCC and Clang headers for SIMD intrinsics use the may_alias attribute to implement the vector types, and pretty much all SIMD extension vendors define intrinsics exactly for bitwise casts between vector types. Obviously, if they think that memcpy isn't the hammer for every nail, there has to be something to it (if nothing else then debug build performance - I really want debug builds of my math code to be at least fast enough so that I can test them in real time). And if compiler vendors have been explicitly offering these 'unholy' tools for years, you don't expect them to disappear silently overnight?
All these "opinions" are based of course on the abstract notion of a `bitcast` function that intends to exploit undefined behavior to do type-punning via unions. If you'd wish to go forward with it, I'd ask that you present us with a concrete implementation so we can start a technical discussion instead.
Actually yes, since I agreed to move this to Boost.Cast, we can cut this short until then... -- C++ >In The Kernel< Now!
On 12/15/2015 8:23 PM, Domagoj Saric wrote:
On 1.12.2015. 15:18, Agustín K-ballo Bergé wrote:
IMHO this is short-sighted. Boost.Move is a temporary solution until we transition into well-defined semantics, and as such it only has to be supported in those now old compilers that do not support real move semantics. Your code, on the other hand, would have to be verified and supported in every new compiler version that pops up, until the end of time.
Now you might think that your code won't be around long enough for it to see compilers improving in this area, given the amount of legacy code around relying on it, but it doesn't have to. Analyzers and sanitizers are growing stronger by the day, and they are designed to catch exactly the kind of bad code that you'd wish to leverage. So even if your code "just works" now (and in the future), it will be a disservice to users when their sanitized runs terminate because of by-design undefined behavior within Boost. The solution is, then, to sever the link to broken dependencies.
That's all just circular reasoning (silently assuming that some new form of 'bitwise casting', like the 'union trick', already part of the C standard, will never be part of the C++ standard).
You make it sound like the committee is unaware of these shiny new C semantics. It isn't. This is deemed a bad idea for C++ along with VLAs, restrict, atomic as a qualifier, you name it...
The "Boost.Move transition into well-defined semantics" at one point had to be started by a typical ugly-hack-workaround probing into not so well defined semantics (i.e. std::auto_ptr)...
You make it sound like Boost.Move invented move semantics. It didn't. Its motivation is to write "portable" C++ code, emulating the "C++0x" move semantics feature in C++03 compilers. I would not oppose to transitionally using the 'union trick' in those implementations where it is known to "just work" until they catch up with `memcpy`, as long as there are technical grounds for it.
That compilers like GCC, that otherwise adhere very closely to the C++ language standard in general and aliasing rules in particular, offer explicit, documented, defined ways to work around some difficulties concerning the aliasing rules (e.g. union type punning, may_alias attribute) speaks volumes... For example 'standard' GCC and Clang headers for SIMD intrinsics use the may_alias attribute to implement the vector types and pretty much all SIMD extension vendors define intrinsics exactly for bitwise casts between vector types. Obviously if they think that memcpy isn't the hammer for every nail there has to be something to it (if nothing else then debug build performance - I really want debug builds of my math code to be at least fast enough so that I can test them in real time).
These compilers you mention where this is guaranteed also happen to ship `memcpy` as a builtin. Other compilers where this isn't guaranteed also happen to generate "bogus" code (according to your expectations), as stated by members of the community with experience in this particular field. Regards, -- Agustín K-ballo Bergé.- http://talesofcpp.fusionfenix.com
On Wed, 25 Nov 2015 01:50:40 +0530, Andrey Semashev
On 2015-11-24 19:29, Domagoj Šarić wrote:
On Tue, 17 Nov 2015 06:24:37 +0530, Andrey Semashev
wrote: Personally, I'm in favor of adding these: BOOST_OVERRIDE, BOOST_FINAL. Although their implementation should be similar to other C++11 macros - they should be based on BOOST_NO_CXX11_FINAL and BOOST_NO_CXX11_OVERRIDE.
I agree, but what if you don't have final but do have sealed (with a less recent MSVC)?
As far as I understand, sealed can be used only with C++/CLR, is that right? If so then I'd rather not add a macro for it.
If on the other hand sealed can be used equivalently to final in all contexts, then you could use it to implement BOOST_FINAL.
It is available in C++ but technically it is not _the_ C++ keyword but an extension with the same purpose so 'purists' might mind that BOOST_NO_CXX11_FINAL is not defined even when BOOST_FINAL is defined to sealed instead of final...
In release builds asserts are expanded to something like (void)0. Technically, that's nothing, but who knows if it affects optimization.
It doesn't ;) (otherwise plain asserts would also affect optimisation)
Dependency on Boost.Assert is technically only there if you use the 'checked' macros...I agree that it is still 'ugly' (and the user would have to separately/explicitly include boost/assert.hpp to avoid a circular dependency) but so is, to me, the idea of having to manually duplicate/prefix all assumes with asserts (since I like all my assumes verified and this would add so much extra verbosity)...
You can add the checked versions to Boost.Assert with a separate PR.
True, didn't think of that (although that'd still be more verbose than I'd like...BOOST_ASSUME_CHECKED, BOOST_ASSUMED_ASSERT or something like that)... + I'd add a configuration macro to Boost.Assert that would make the regular BOOST_ASSERT use BOOST_ASSUME instead of BOOST_LIKELY...
I would have liked BOOST_HAS_CXX_RESTRICT to indicate that the compiler has support for the C99 keyword 'restrict' (or an equivalent) in C++ (the CXX in the macro name emphasizes that the feature is available in C++, not C). The BOOST_RESTRICT macro would be defined to that keyword or empty if there is no support.
Sure I can add the detection macro but for which 'feature set' (already for minimal - only pointers, or only for full - pointers, refs and this)?
That's a good question. I'm leaning towards full support, although that will probably not make MSVC users happy.
Yeah that'd make me pretty unhappy so no can do :P As it is: BOOST_RESTRICT is defined only if there is full support while _PTR, _REF and _THIS are always defined to whatever the compiler offers...so that's also a possibility (make the 'subfeature' macros always offer what they can and define the HAS macro only for 'all features')...
I don't see much use in BOOST_ATTRIBUTES and related macros - you can achieve the same results with feature specific-macros (e.g. by using BOOST_NORETURN instead of BOOST_ATTRIBUTES(BOOST_DOES_NOT_RETURN)).
Yes, I may change those...I was however 'forward thinking' WRT attributes standardization (so that the BOOST_ATTRIBUTES(BOOST_DOES_NOT_RETURN) macros look like 'one day' [[noreturn]])
That still doesn't improve over BOOST_NORETURN. If there's a reason to, we could even define BOOST_NORETURN to [[noreturn]].
Well if shorter prefixes (than BOOST) were used/allowed for attributes (eg. BFA_ as Boost Function Attribute) then the BOOST_ATTRIBUTES syntax may be less verbose when many attributes are used...but OK I'll have to rethink this...
I don't see the benefit of BOOST_NOTHROW_LITE.
It's a nothrow attribute that does not insert runtime checks to call std::terminate...and it is unfortunately not offered by Boost.Config...
Do you have measurments of the possible benefits compared to noexcept? I mean, noexcept was advertised as the more efficient version of throw() already.
What more measurements beyond the disassembly window which clearly shows unnecessary EH codegen (i.e. bloat) are necessary? (I already talked about this when Beman Dawes was adding the *THROW* macros but was ignored...I'd actually like it more if BOOST_NOTHROW_LITE would simply replace BOOST_NOTHROW instead of adding the new more verbose macro...)
(I need this and the *ALIAS* macros for a rewrite/expansion of Boost.Cast, that includes 'bitwise_cast', a sort of generic, safe&optimal reinterpret_cast)...
Again, it looks like this macro would have a rather specialized use.
You're right - these *ALIAS* related macros may start their life in Boost.Cast and then later be moved to Config if libraries (or client code) start depending on Cast only because of them...
I don't think BOOST_OVERRIDABLE_SYMBOL is a good idea, given that the same effect can be achieved in pure C++.
You mean creating a class template with a single dummy template argument and a static data member just so that you can define a global variable in a header w/o linker errors?
Slightly better:
template< typename T, typename Tag = void >
struct singleton { static T instance; };

template< typename T, typename Tag >
T singleton< T, Tag >::instance;
That's what I meant...and it is really verbose (and slower to compile than a compiler-specific attribute)...
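For comparison, this is roughly what the attribute such a macro would wrap looks like on the two major toolsets (macro name hypothetical; on other compilers one would fall back to the template workaround above):

```cpp
#include <cassert>

// Hypothetical sketch of an 'overridable symbol' macro: MSVC's selectany
// and the GNU weak-symbol support both let a global be *defined* in a
// header without multiple-definition linker errors - the linker merges
// or picks one of the duplicate definitions.
#if defined(_MSC_VER)
#   define MY_OVERRIDABLE_SYMBOL __declspec(selectany)
#elif defined(__GNUC__) || defined(__clang__)
#   define MY_OVERRIDABLE_SYMBOL __attribute__((weak))
#else
#   define MY_OVERRIDABLE_SYMBOL  // fall back to the singleton<> template
#endif

// One line, with a proper name - vs the singleton< T, Tag > dance:
MY_OVERRIDABLE_SYMBOL int global_flag = 42;
```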
Also, some compilers offer this functionality only as a pragma.
You mean in a way that would require a _BEGIN and _END macro pair?
Maybe for some compilers. I meant this:
https://docs.oracle.com/cd/E19205-01/819-5267/bkbkr/index.html
Oh...so a function macro would be required...
There's just no point in these compiler-specific workarounds when there's a portable solution.
Except maybe when the 'portable solution' is also a 'hack' (i.e. 'abusing' existing language functionality), and a verbose one at that...which may point to the need for specific/dedicated language functionality...
Calling conventions macros are probably too specialized to functional libraries, I don't think there's much use for these. I would rather not have them in Boost.Config to avoid spreading their use to other Boost libraries.
That's kind of self-contradicting, if there is a 'danger' of them being used in other libraries that would imply there is a 'danger' from them being useful...
What I mean is that having these macros in Boost.Config might encourage people to use them where they would normally not.
The same as above...I don't see a problem? If they are useful - great, if not and people still use them - 'we have bigger problems'... These may be moved to Functional but that would eventually make many more libraries depend on Functional just for one or two macros...
In any case, I agree that most of those would mostly be used only in functional libraries but for HPC and math libraries especially, the *REG*/'fastcall' conventions are useful when they cannot (e.g. on ABI boundaries) or do not want to rely on the compiler (IPO, LTCG etc.) to automatically choose the optimal/custom calling convention...Admittedly this is mostly useful on targets with 'bad default' conventions, like 32bit x86 and MS x64, but these are still widely used ABIs :/
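On the targets mentioned, such a convention macro would be a thin wrapper over the toolset keywords. A minimal sketch (macro name hypothetical, not the PR's spelling):

```cpp
#include <cassert>

// Hypothetical sketch of a 'pass in registers' convention macro: on
// 32-bit MSVC __fastcall passes the first arguments in ECX/EDX instead
// of pushing them on the stack; on ABIs whose default convention is
// already register-based (e.g. SysV x64) it collapses to nothing.
#if defined(_MSC_VER) && defined(_M_IX86)
#   define MY_CC_REG __fastcall
#else
#   define MY_CC_REG
#endif

// A small hot function of the kind that benefits on 'bad default' ABIs:
int MY_CC_REG dot2(int ax, int ay, int bx, int by)
{
    return ax * bx + ay * by;
}
```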
Non-standard calling conventions give enough headache for users to avoid them as much as possible.
There is no 'standard' calling convention, just the 'default' one...and what headache can a non-default c.convention in an API cause (e.g. the whole Win32 and NativeNT APIs use the non-default stdcall convention)?
You might use them in library internals but there I think it's better to avoid the call at all - by forcing the hot code inline.
Enter bloatware... A statically dispatched call to a 'near' function has near zero overhead for any function with half-a-dozen instructions _if_ it (i.e. the ABI/c.convention) does not force the parameters to ping-pong through the stack... Forceinlining is just a primitive bruteforce method in such cases...which eventually makes things even worse (as this 'bloatware ignoring' way of thinking is certainly a major factor why the dual-core 1GB RAM netbook I'm typing on now slows down to a crawl from paging when I open gmail and 3 more tabs...). For dynamically dispatched calls (virtual functions) choosing the appropriate c.convention and decorating the function with as many relevant attributes is even more important (as the dynamic dispatch is a firewall for the optimiser and it has to assume that the function 'accesses&throws the whole universe')...
Function optimization macros are probably too compiler and case-specific. Your choice of what is considered fast, small code, acceptable math optimizations may not fit others.
If the indisputable goal (definition of 'good codegen') is to have fast and small code/binaries then 'I have to disagree'. For example a library dev can certainly know that certain code will never be part of a hot block (assuming correct usage of the library), for example initialisation, cleanup or error/failure related code and should thus be optimised for size (because that is actually optimising for real world speed - reducing unnecessary bloat - IO and CPU cache thrashing).
If that code is unimportant then why do you care?
Already explained above - precisely because it is unimportant it is important that it be compiled for size (and possibly moved to the 'cold' section of the binary) to minimise its impact on the performance of the code that does matter; loading speed of the binary; virtual memory; disk space, fragmentation and IO...
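The 'cold'/'minsize' markers being argued for map onto existing compiler attributes; a sketch under hypothetical names (the exact PR spellings may differ):

```cpp
#include <cassert>
#include <cstring>

// Hypothetical sketch of cold/for-size markers: GCC's cold attribute
// moves a function into the .text.unlikely section and optimises it for
// size, Clang additionally offers minsize; no-ops elsewhere.
#if defined(__clang__)
#   define MY_COLD    __attribute__((cold))
#   define MY_MINSIZE __attribute__((minsize))
#elif defined(__GNUC__)
#   define MY_COLD    __attribute__((cold))
#   define MY_MINSIZE __attribute__((optimize("Os")))
#else
#   define MY_COLD
#   define MY_MINSIZE
#endif

// Error-reporting paths are the typical target: never hot under correct
// usage, so compile them for size and keep them out of the hot section.
MY_COLD MY_MINSIZE const char * describe_failure(int code)
{
    return code == 0 ? "none" : "failure";
}
```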
Simply organizing code into functions properly and using BOOST_LIKELY/UNLIKELY where needed will do the thing.
No it will not (at least not w/o PGO) as the compiler cannot deduce these things (except for simple scenarios like assuming all noreturn functions are cold)...and saying that we can/should then help it with BOOST_LIKELY while arguing that we shouldn't help it with BOOST_COLD/MINSIZE/OPTIMIZE_FOR_* is 'beyond self contradicting'...
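For reference, the BOOST_LIKELY/UNLIKELY hints under discussion are the usual `__builtin_expect` wrappers, which only steer branch layout inside a function; spelled out (with hypothetical names) to show why they cannot substitute for a whole-function cold/minsize marker:

```cpp
#include <cassert>

// Sketch of branch-probability hints: a per-branch layout hint, nothing
// more - they cannot mark an entire function as cold or for-size.
#if defined(__GNUC__) || defined(__clang__)
#   define MY_LIKELY(x)   __builtin_expect(!!(x), 1)
#   define MY_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#   define MY_LIKELY(x)   (x)
#   define MY_UNLIKELY(x) (x)
#endif

int checked_div(int a, int b)
{
    if (MY_UNLIKELY(b == 0))
        return 0;       // rare error path, laid out out-of-line
    return a / b;       // hot path keeps the fall-through layout
}
```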
Also, things like these should have a very limited use, as the user has to have the ultimate control over the build options.
I'm 'more than all for ultimate control' - as explained above this can actually give more control to the user (+ Boost Build was already a major pain when it came to user control over changing compiler optimisation options in preexisting variants)...
What I was saying is that it's the user who has to decide whether to build your code for size or for speed or for debug. That includes the parts of the code that you, the library author, consider performance critical or otherwise.
I'm sorry I fail to take this as anything else than just pointless nagging for the sake of nagging (and we are not talking about debug builds here). That's tantamount to saying that the user has to decide which parts of my library I'll tweak and optimise and which not... As already explained, properly marking hot and cold parts of code gives the user _more_ freedom to use the more coarse/global compiler options w/o fear that it will negatively impact the library code.
You may want to restrict his range of choices, e.g. when a certain optimization breaks your code.
More strawman 'ivory towering'...how exactly am I restricting anyone's choices? A real world example please?
I guess, you could try to spell these restrictions with these macros, but frankly I doubt it's worth the effort.
Based on what do you doubt? Obviously I am using all this and find it worth the effort...You could throw this generic naysaying just as well at likely, noexcept, restrict, rvalue refs...
I mean, there are so many possibilities on different compilers.
With some you have many possibilities (GCC), with others only a few (Clang, MSVC)...in any case - what difference does that make, isn't that part of the Boost story - abstraction-on-path-to-standardisation?
One legitimate reason to use these macros that comes to mind is changing the target instruction set for a set of functions that require that (e.g. when a function is optimized for AVX in an application that is supposed to run in the absence of this extension). But then this only seems necessary with gcc, which again makes it a rather specific workaround.
GCC and Clang (and possibly others) but that's a whole different story/set of macros (i.e. unrelated to this)...
On 2015-12-01 09:18, Domagoj Šarić wrote:
On Wed, 25 Nov 2015 01:50:40 +0530, Andrey Semashev
wrote: As far as I understand, sealed can be used only with C++/CLR, is that right? If so then I'd rather not add a macro for it.
If on the other hand sealed can be used equivalently to final in all contexts, then you could use it to implement BOOST_FINAL.
It is available in C++ but technically it is not _the_ C++ keyword but an extension with the same purpose so 'purists' might mind that BOOST_NO_CXX11_FINAL is not defined even when BOOST_FINAL is defined to sealed instead of final...
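The mapping being discussed could look something like this sketch (macro name hypothetical; the version cutoff shown is an assumption, since `final` became available in later MSVC releases while older ones only offered the `sealed` extension):

```cpp
#include <cassert>

// Hypothetical sketch of a BOOST_FINAL-style macro: plain C++11 'final'
// where available, MSVC's equivalent 'sealed' extension on old MSVC.
#if defined(_MSC_VER) && _MSC_VER < 1700
#   define MY_FINAL sealed
#else
#   define MY_FINAL final
#endif

struct base
{
    virtual ~base() {}
    virtual int id() const { return 0; }
};

// Marking the class final/sealed lets the compiler devirtualise calls
// made through pointers/references of type leaf.
struct leaf MY_FINAL : base
{
    int id() const override { return 1; }
};
```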
Again, if sealed is equivalent to final in all contexts then I don't mind BOOST_FINAL expanding to sealed. Otherwise think of a separate macro for sealed.
I don't see the benefit of BOOST_NOTHROW_LITE.
It's a nothrow attribute that does not insert runtime checks to call std::terminate...and it is unfortunately not offered by Boost.Config...
Do you have measurements of the possible benefits compared to noexcept? I mean, noexcept was advertised as the more efficient version of throw() already.
What more measurements beyond the disassembly window which clearly shows unnecessary EH codegen (i.e. bloat) are necessary?
I'll reiterate, what are the practical benefits? I don't care about a couple instructions there or not there - I will never see them in performance numbers or binary size.
I don't think BOOST_OVERRIDABLE_SYMBOL is a good idea, given that the same effect can be achieved in pure C++.
You mean creating a class template with a single dummy template argument and a static data member just so that you can define a global variable in a header w/o linker errors?
Slightly better:
template< typename T, typename Tag = void >
struct singleton { static T instance; };

template< typename T, typename Tag >
T singleton< T, Tag >::instance;
That's what I meant...and it is really verbose (and slower to compile than a compiler-specific attribute)...
I won't argue about compilation speeds, although I doubt that the difference (in either favor) is measurable. As for verbosity, the above code needs to be written only once.
Calling conventions macros are probably too specialized to functional libraries, I don't think there's much use for these. I would rather not have them in Boost.Config to avoid spreading their use to other Boost libraries.
That's kind of self-contradicting, if there is a 'danger' of them being used in other libraries that would imply there is a 'danger' from them being useful...
What I mean is that having these macros in Boost.Config might encourage people to use them where they would normally not.
The same as above...I don't see a problem? If they are useful - great, if not and people still use them - 'we have bigger problems'...
[snip]
There is no 'standard' calling convention, just the 'default' one...and what headache can a non-default c.convention in an API cause (e.g. the whole Win32 and NativeNT APIs use the non-default stdcall convention)?
By using non-default calling conventions you're forcing your users out of the standard C++ land. E.g. the user won't be able to store an address of your function without resorting to compiler-specific keywords or macros to specify the calling convention. It complicates integration of your library with other code. I'd rather strictly ban non-default calling conventions on API level at all.
You might use them in library internals but there I think it's better to avoid the call at all - by forcing the hot code inline.
Enter bloatware... A statically dispatched call to a 'near' function has near zero overhead for any function with half-a-dozen instructions _if_ it (i.e. the ABI/c.convention) does not force the parameters to ping-pong through the stack... Forceinlining is just a primitive bruteforce method in such cases...which eventually makes things even worse (as this 'bloatware ignoring' way of thinking is certainly a major factor why the dual-core 1GB RAM netbook I'm typing on now slows down to a crawl from paging when I open gmail and 3 more tabs...).
There are different kinds of bloat. Force-inlining critical functions of your program will hardly make a significant difference on the total binary size, unless used unwisely or you're in hardcore embedded world where every byte counts.
For dynamically dispatched calls (virtual functions) choosing the appropriate c.convention and decorating the function with as many relevant attributes is even more important (as the dynamic dispatch is a firewall for the optimiser and it has to assume that the function 'accesses&throws the whole universe')...
My point was that one should avoid dynamic dispatch in hot code in the first place. Otherwise you're beating a dead horse. Argument passing has little effect compared to a failure to predict the jump target. Even when the target is known statically (i.e. a non-virtual function call) the effect of the call can be significant if it's on the hot path - regardless of the calling convention.
If that code is unimportant then why do you care?
Already explained above - precisely because it is unimportant it is important that it be compiled for size (and possibly moved to the 'cold' section of the binary) to minimise its impact on the performance of the code that does matter; loading speed of the binary; virtual memory; disk space, fragmentation and IO...
I think, you're reaching here. Modern OSs don't 'load' binaries, but map them into address space. The pages are loaded on demand, and the typical page size is 4k - you'd have to save at least 4k of code to measure the difference, let alone feel it. Virtual address space is not an issue, unless you're on a 32-bit system, which is only wide spread in the embedded area. The disk space consumption by data exceeds code by magnitudes, which in turn shows on IO, memory and other related stuff. And the net effect of these optimization attributes on a real program is yet to be seen.
Simply organizing code into functions properly and using BOOST_LIKELY/UNLIKELY where needed will do the thing.
No it will not (at least not w/o PGO)
These hints don't require PGO; they work without it.
as the compiler cannot deduce these things (except for simple scenarios like assuming all noreturn functions are cold)...and saying that we can/should then help it with BOOST_LIKELY while arguing that we shouldn't help it with BOOST_COLD/MINSIZE/OPTIMIZE_FOR_* is 'beyond self contradicting'...
The difference is the amount of effort you have to put into it and the resulting portability and effect. The other difference is in the amount of control the user has over the resulting code compilation. This important point you seem to disregard.
What I was saying is that it's the user who has to decide whether to build your code for size or for speed or for debug. That includes the parts of the code that you, the library author, consider performance critical or otherwise.
I'm sorry I fail to take this as anything else than just pointless nagging for the sake of nagging (and we are not talking about debug builds here).
I am talking about debug builds in particular. If I build a debug binary, I want to be able to step through every piece of code, including the ones you marked for speed. If I build for binary size, I want to minimize the size of all code, including the code you marked. I don't care for speed in either of these cases.
You may want to restrict his range of choices, e.g. when a certain optimization breaks your code.
More strawman 'ivory towering'...how exactly am I restrictring anyones choices? A real world example please?
Read that quote again, please. For example, if your code relies on strict IEEE 754 you may want to mark the function with -fno-fast-math. Or if your library is broken with LTO on gcc older than 5.1 (like Boost.Log, for instance) you might want to add -fno-lto to your library build scripts. Thing is there are so many things that may potentially break the code, most of which you and I are simply unaware of that this kind of defensive practice just isn't practical.
On Tue, 01 Dec 2015 16:40:33 +0530, Andrey Semashev
On 2015-12-01 09:18, Domagoj Šarić wrote:
On Wed, 25 Nov 2015 01:50:40 +0530, Andrey Semashev
wrote: As far as I understand, sealed can be used only with C++/CLR, is that right? If so then I'd rather not add a macro for it.
If on the other hand sealed can be used equivalently to final in all contexts, then you could use it to implement BOOST_FINAL.
It is available in C++ but technically it is not _the_ C++ keyword but an extension with the same purpose so 'purists' might mind that BOOST_NO_CXX11_FINAL is not defined even when BOOST_FINAL is defined to sealed instead of final...
Again, if sealed is equivalent to final in all contexts then I don't mind BOOST_FINAL expanding to sealed. Otherwise think of a separate macro for sealed.
The question was about BOOST_NO_CXX11_FINAL: "'purists' might mind that BOOST_NO_CXX11_FINAL is not defined even when BOOST_FINAL is defined to sealed instead of final"...
I don't see the benefit of BOOST_NOTHROW_LITE.
It's a nothrow attribute that does not insert runtime checks to call std::terminate...and it is unfortunately not offered by Boost.Config...
Do you have measurements of the possible benefits compared to noexcept? I mean, noexcept was advertised as the more efficient version of throw() already.
What more measurements beyond the disassembly window which clearly shows unnecessary EH codegen (i.e. bloat) are necessary?
I'll reiterate, what are the practical benefits? I don't care about a couple instructions there or not there - I will never see them in performance numbers or binary size.
I guess then you were also against noexcept with the same presumptive 'a couple of instructions (compared to throw())' rationale? What is the N in the "N * a-couple-instructions" expression at which you start to care? What kind of an argument is that anyway, i.e. why should anyone care that you don't care? How does it relate to whether or not BOOST_NOTHROW should be changed (or at least BOOST_NOTHROW_LITE added) to use the nothrow attribute where available instead of noexcept (especially since the macro itself cannot guarantee C++11 noexcept semantics anyway)?

You ask for practical benefits and then give a subjective/off-the-cuff/question-begging rationale for dismissing them... You may not mind that the biggest library in Boost is a logging library of all things, while some on the other hand would like to see plain C finally retired and C++ (and its standard library) be used (usable) in OS kernels[1] as well as the tiniest devices from the darkest corners of the embedded world. Some 'wise men' say the free lunch is over...

[1] https://channel9.msdn.com/Events/Build/2014/9-015 An example discussion of exactly that - at ~0:17:00 they explicitly mention drivers - I don't know about you but drivers coded with the "I don't care about a couple instructions" mindset don't sound quite exciting (even though most are already even worse than that: nVidia display driver nvlddmkm.sys 12+MB, Realtek audio driver RTKVHD64.sys 4+MB...crazy...)
I don't think BOOST_OVERRIDABLE_SYMBOL is a good idea, given that the same effect can be achieved in pure C++.
You mean creating a class template with a single dummy template argument and a static data member just so that you can define a global variable in a header w/o linker errors?
Slightly better:
template< typename T, typename Tag = void >
struct singleton { static T instance; };

template< typename T, typename Tag >
T singleton< T, Tag >::instance;
That's what I meant...and it is really verbose (and slower to compile than a compiler-specific attribute)...
I won't argue about compilation speeds, although I doubt that the difference (in either favor) is measurable. As for verbosity, the above code needs to be written only once.
But what if you want a 'proper' name for the global variable? You have to name the tag type and then create some inline function, named like the desired variable, that returns singleton< Tag >::instance... + this does not work for static member variables or functions... All compilers are already forced to implement such an attribute internally, precisely to support code such as you wrote above - so this just asks that it be standardized and made public...
Calling conventions macros are probably too specialized to functional libraries, I don't think there's much use for these. I would rather not have them in Boost.Config to avoid spreading their use to other Boost libraries.
That's kind of self-contradicting, if there is a 'danger' of them being used in other libraries that would imply there is a 'danger' from them being useful...
What I mean is that having these macros in Boost.Config might encourage people to use them where they would normally not.
The same as above...I don't see a problem? If they are useful - great, if not and people still use them - 'we have bigger problems'...
[snip]
There is no 'standard' calling convention, just the 'default' one...and what headache can a non-default c.convention in an API cause (e.g. the whole Win32 and NativeNT APIs use the non-default stdcall convention)?
By using non-default calling conventions you're forcing your users out of the standard C++ land. E.g. the user won't be able to store an address of your function without resorting to compiler-specific keywords or macros to specify the calling convention. It complicates integration of your library with other code. I'd rather strictly ban non-default calling conventions on API level at all.
* no compiler-specific keywords, just a documented macro already used by the API in question
* the macro is only needed if you need to declare the pointer/function type yourself (instead of just passing the function address to an API, using auto, decltype, lambdas or template type deduction, or wrapping it in something like std::function, a signal/slot object...)
* explicit calling conventions in (cross platform) public APIs of libraries and even OSs are a pretty common thing in my experience
* "forcing users out of the standard C++ land" - that's just moot, i.e. isn't that part of what Boost is about? There is nothing stopping 'us' from standardizing the concept of calling conventions (e.g. to specify/handle the different architecture/ABI intricacies of 'evolving hardware' - soft/hard float, different GPR file sizes, 'levels' of SIMD units etc.)
* finally - you cannot just decide this from personal preference for all people and all libraries; IMO that's up to individual libraries (their devs and users) to decide - e.g. HPC, math, DSP, etc. libs should be free to decide that the performance benefits of explicit/specialized c.conventions outweigh the more than rare problem of slightly more verbose function pointer types...
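The function-pointer verbosity point above can be shown concretely. A sketch with a hypothetical macro name (mirroring the Win32 pattern of a documented, API-owned convention macro):

```cpp
#include <cassert>

// Hypothetical convention macro, as a library's documented API macro
// (the Win32 pattern: think WINAPI/CALLBACK wrapping stdcall).
#if defined(_MSC_VER) && defined(_M_IX86)
#   define MY_CC_FAST __fastcall
#else
#   define MY_CC_FAST
#endif

int MY_CC_FAST square(int x) { return x * x; }

// The macro only reappears when you spell the pointer type out yourself:
typedef int (MY_CC_FAST * square_fn)(int);

int call_it()
{
    square_fn f = &square;   // explicit type needs the macro...
    auto      g = &square;   // ...but auto/decltype deduce it for free
    return f(3) + g(4);
}
```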
You might use them in library internals but there I think it's better to avoid the call at all - by forcing the hot code inline.
Enter bloatware... A statically dispatched call to a 'near' function has near zero overhead for any function with half-a-dozen instructions _if_ it (i.e. the ABI/c.convention) does not force the parameters to ping-pong through the stack... Forceinlining is just a primitive bruteforce method in such cases...which eventually makes things even worse (as this 'bloatware ignoring' way of thinking is certainly a major factor why the dual-core 1GB RAM netbook I'm typing on now slows down to a crawl from paging when I open gmail and 3 more tabs...).
There are different kinds of bloat. Force-inlining critical functions of your program will hardly make a significant difference on the total binary size, unless used unwisely or you're in hardcore embedded world where every byte counts.
This assumption can only be true if the 'critical functions of a program' (force-inlined into every callsite!) comprise an insignificant portion of the entire program...which is in direct contradiction with presumptions you make elsewhere - such as that properly marking cold portions of code is just not worth it...

Suddenly you are OK with "restricting users" (those in the 'hardcore embedded world') as well as having/using keywords/macros (forceinline) that can be used 'wisely' and 'unwisely'?...

+ sometimes, still, even with everything inlined, compilers cannot handle even simpler C++ abstractions 'all by themselves': https://gist.github.com/rygorous/c6831e60f5366569d2e9
For dynamically dispatched calls (virtual functions) choosing the appropriate c.convention and decorating the function with as many relevant attributes is even more important (as the dynamic dispatch is a firewall for the optimiser and it has to assume that the function 'accesses&throws the whole universe')...
My point was that one should avoid dynamic dispatch in hot code in the first place.
AFAICT I first mentioned dynamically dispatched calls.
Otherwise you're healing a dead horse. Argument passing has little effect compared to a failure to predict the jump target.
Bald assertions (in addition to ignoring parts of what I said):
* take a look @ https://channel9.msdn.com/Events/GoingNative/2013/Compiler-Confidential (~19:15) on the mind-boggling mind-reading (branch prediction) capabilities of modern CPUs
* what I already said: it is not just about the direct speed impact but about the detrimental impact on the optimisation of code surrounding the callsites (creating bigger and slower code)...some attributes (like noalias, pure and const) can even allow a compiler to hoist a virtual call outside a loop...
Even when the target is known statically (i.e. non-virtual function call) the effect of the call can be significant if it's on the hot path - regardless of the calling convention.
A static call to a (cached/prefetched) function that does not touch the stack has pretty much the overhead of two simple instructions CALL and RET (and CPUs have had dedicated circuitry, RSBs, for exactly that for ages). Please give me an example of a function not automatically inlined (even at Os levels) where this is a 'significant effect' (moreover even if you could, that still wouldn't prove your point - all that is needed to disprove it is the existence of a function whose call overhead is made insignificant by using a better c.convention and appropriate attributes - trivial)...
If that code is unimportant then why do you care?
Already explained above - precisely because it is unimportant it is important that it be compiled for size (and possibly moved to the 'cold' section of the binary) to minimise its impact on the performance of the code that does matter; loading speed of the binary; virtual memory; disk space, fragmentation and IO...
I think, you're reaching here. Modern OSs don't 'load' binaries, but map them into address space. The pages are loaded on demand,
You don't say...obviously that's exactly what I meant. The pages have to be loaded eventually (otherwise my computer would "just map the OS into address space" when I turn it on ::roll eyes::) and if your binaries are just lazily 'completely compiled for speed' (w/o PGO) then the 'important' and 'unimportant' parts of your code will be interspersed (i.e. the 'unimportant' bits will have to be loaded along with the 'important' ones). Especially true with today's compilers which additionally try to 'fix lazy programming' with autovectorization, autoparallelization, loop unrolling, transformation...this not only explodes codesize but also slows down release builds - so properly marking 'unimportant' code has the additional benefit of faster builds.
and the typical page size is 4k - you'd have to save at least 4k of code to measure the difference, let alone feel it.
Let's try and see how hard it is to save something with Boost.Log:
- VS2015 U1
- combined sizes of both library DLLs
- bjam variant=release link=shared runtime-link=shared address-model=32 optimization=%1 cxxflags="/Ox%2 /GL /GS- /Gy" where (%1 = space, %2 = s) and (%1 = speed, %2 = t) for the for-size and for-speed optimised builds, respectively:
* for-speed build: 959kB
* for-size build: 730kB
-> that's a delta of 229kB or 31% (the relative difference is much larger if we compare only the text/code sections of the DLLs, because of all the strings, RTTI records etc...)

And according to your own assumption that hot code is insignificant in size, it follows that you can shave off 30% from Boost.Log exactly by having non-hot code compiled for size...
Virtual address space is not an issue, unless you're on a 32-bit system, which is only wide spread in the embedded area.
"Restricting user choices" and "reaching" perhaps? (ARMv7) mobile phone owners should just shut up? http://www.fool.com/investing/general/2013/01/19/mobile-overtakes-desktops-i... https://en.wikipedia.org/wiki/Usage_share_of_operating_systems http://searchenginewatch.com/sew/opinion/2353616/mobile-now-exceeds-pc-the-b...
The disk space consumption by data exceeds code by magnitudes, which in turn shows on IO, memory and other related stuff.
If you say so. For example:
* CMake 3.4.1 Win32 build:
  - ~40MB total size
  - of that ~24MB are binaries and the rest is mostly _documentation_ (i.e. not program data)
  - cmake-gui.exe, a single dialog application
* Git4Windows 2.6.3 Win64 build:
  - ~415MB (!?) total size
  - of that ~345MB are binaries
* or things like Windows and Visual Studio, which show even narrower ratios but on the gigabyte scale...

So, when you wait for Windows, Visual Studio, Android Studio, Eclipse, Adobe Acrobat, Photoshop... to "map into memory" on i7 machines with RAID0 and/or SSD drives, is that because of "data"?

This practical application of the 'premature optimization is the root of all evil' fallacy[1] came to one of its most (un)funny, reductio ad absurdum outcomes some years back when Adobe's Acrobat Reader became so fat and slow that they decided they had to do something about it. Of course, making it efficient was not an option (it would go against the dogma) so they created the classical "fast loader daemon" (that holds key parts of Acrobat Reader always in memory so that it would start faster) - and the joke was that FoxitReader came to the scene, could read PDFs just like Acrobat Reader, yet was smaller than even the "fast loader daemon" of the latter - i.e. it was what Acrobat Reader was supposed to be "prematurely optimized" into from the beginning...

Or what about the move to SSDs, where you keep your programs on the fast&expensive SSDs and "data" on conventional disks?

[1] which should read 'premature optimization _for speed_ is the root of all evil' (i.e. 'always optimise for size, and for speed only where and when needed')
And the net effect of these optimization attributes on a real program is yet to be seen.
So, all the compiler vendors added those (or even more complex things like PGO or App Thinning) simply because they had nothing better to do? No app is an island (except maybe desktop games:) - you may think it does not matter that you chose Qt (or some similar bloatware) to implement your single-dialog application which mostly sits in the tray, because you noticed no delays "on modern hardware" - but a user whose startup time is prolonged by N seconds because his systray is filled with a dozen such "oh it doesn't matter" utilities may feel quite differently...
Simply organizing code into functions properly and using BOOST_LIKELY/UNLIKELY where needed will do the thing.
No it will not (at least not w/o PGO)
These hints don't require PGO; they work without it.
Neither did I say they do - merely that those hints will _not_ do the thing (they simply do not serve that purpose - they are function-internal hints) while "organizing code into functions properly" _with_ PGO should do the thing (at least to a significant degree)...
as the compiler cannot deduce these things (except for simple scenarios like assuming all noreturn functions are cold)...and saying that we can/should then help it with BOOST_LIKELY while arguing that we shouldn't help it with BOOST_COLD/MINSIZE/OPTIMIZE_FOR_* is 'beyond self contradicting'...
The difference is the amount of effort you have to put into it and the resulting portability and effect.
Huh? The effort is less, portability the same (macros?) and the effect is better (simply because those are better tools for the job)??
The other difference is in the amount of control the user has over the resulting code compilation. This important point you seem to disregard.
I already answered this point:

* from my perspective (and experience), libs that correctly 'mark' their code give _me_ more freedom with _my_ code (and compiler options) w/o fear of a detrimental effect on their code
* I can't think of a situation where one would want to optimise cold code for speed - it would only make the whole thing run slower, or the same at best
* the only places where one would want even hot paths optimised for size are things like 'ultrahardcore embedded targets' (where the constraints are in kilobytes), bootloaders and/or 4kB/64kB demo competitions:
  - these are too specific to cripple all others because of them
  - they usually already use custom solutions (even for the most basic things, like the C runtime)
  - the problem (if we can call it that, and if it really exists) is only with the hot/optimise-for-speed hints - and these can easily be made 'disableable' (i.e. redefined to nothing)
* finally, why this objection is bogus: by the same rationale you could demand that the user have control over how much effort a lib dev puts into optimising various parts of the library...
What I was saying is that it's the user who has to decide whether to build your code for size or for speed or for debug. That includes the parts of the code that you, the library author, consider performance critical or otherwise.
I'm sorry, I fail to take this as anything other than pointless nagging for the sake of nagging (and we are not talking about debug builds here).
I am talking about debug builds in particular. If I build a debug binary, I want to be able to step through every piece of code, including the ones you marked for speed. If I build for binary size, I want to minimize the size of all code, including the code you marked. I don't care about speed in either of these cases.
Debug builds are a red herring - per function attributes like hot, cold, optsize... do not affect debug builds or debug information. Pragmas that mark whole blocks of code might affect them (depending on the compiler and particular pragma/macro) but it is trivial to have those defined empty in debug builds... A similar thing holds for opt-for-size builds (besides all things already said regarding that above): the per function attributes should have no effect there.
For example, if your code relies on strict IEEE 754 you may want to mark the function with -fno-fast-math.
Isn't that exactly (part of) the argument I put out for the fastmath macros?
You may want to restrict his range of choices, e.g. when a certain optimization breaks your code.
More strawman 'ivory towering'... how exactly am I restricting anyone's choices? A real-world example, please?
Read that quote again, please. [...] Or if your library is broken with LTO on gcc older than 5.1 (like Boost.Log, for instance), you might want to add -fno-lto to your library build scripts. [...] Thing is, there are so many things that may potentially break the code, most of which you and I are simply unaware of, that this kind of defensive practice just isn't practical.
That's a completely different story (compiler codegen bugs)... -- C++ >In The Kernel< Now!
On Wednesday, December 16, 2015 12:02:26 AM Domagoj Saric wrote:
On Tue, 01 Dec 2015 16:40:33 +0530, Andrey Semashev
The question was about BOOST_NO_CXX11_FINAL: "'purists' might mind that BOOST_NO_CXX11_FINAL is not defined even when BOOST_FINAL is defined to sealed instead of final"...
I'm not seeing those 'purists' in this discussion. And I'm not getting an impression that you're one of them. As far as I'm concerned, it doesn't matter how BOOST_FINAL is implemented as long as it conforms to the documented behavior (which, I assume, would be equivalent to the behavior of C++11 final).
I don't see the benefit of BOOST_NOTHROW_LITE.
It's a nothrow attribute that does not insert runtime checks to call std::terminate...and it is unfortunately not offered by Boost.Config...
Do you have measurements of the possible benefits compared to noexcept? I mean, noexcept was advertised as the more efficient version of throw() already.
What more measurements beyond the disassembly window which clearly shows unnecessary EH codegen (i.e. bloat) are necessary?
I'll reiterate, what are the practical benefits? I don't care about a couple instructions there or not there - I will never see them in performance numbers or binary size.
I guess then you were also against noexcept, with the same presumptive 'a couple of instructions (compared to throw())' rationale? What is the N in the "N * a-couple-of-instructions" expression at which you start to care?
That's just handwaving and I was interested in some practical evidence that BOOST_NOTHROW_LITE would be beneficial compared to noexcept. You haven't presented any so far.
What kind of an argument is that anyway, i.e. why should anyone care that you don't care?
Well, you asked for community opinion and I expressed mine. If you're not interested in it then say so and I won't waste everyone's time. This remark relates to the tone of the rest of your reply.
How does it relate to whether or not BOOST_NOTHROW should be changed (or at least BOOST_NOTHROW_LITE added) to use the nothrow attribute where available instead of noexcept (especially since the macro itself cannot guarantee C++11 noexcept semantics anyway)?
First, there is no BOOST_NOTHROW macro currently. There is BOOST_NOEXCEPT, and it does correspond to C++11 noexcept when it is supported by the compiler. It does not emulate noexcept semantics in C++03, but that was never implied. Second, there is BOOST_NOEXCEPT_OR_NOTHROW and it does switch between throw() in C++03 and noexcept in C++11. I'm not sure if you mean this macro by BOOST_NOTHROW but I don't think changing it to __attribute__((nothrow)) is a good idea because this is a breaking change (both in behavior and compilation). Third, I did not propose to change semantics of the existing macros, nor commented in relation to this idea. I'm sceptical about introducing a new one, BOOST_NOTHROW_LITE.
You ask for practical benefits and then give a subjective, off-the-cuff, question-begging rationale for dismissing them... You may not mind that the biggest library in Boost is, of all things, a logging library, while others would like to see plain C finally retired and C++ (and its standard library) become usable in OS kernels[1] as well as in the tiniest devices from the darkest corners of the embedded world. Some 'wise men' say the free lunch is over...
I'm not sure what you're getting at here.
[1] https://channel9.msdn.com/Events/Build/2014/9-015 An example discussion of exactly that - at ~0:17:00 they explicitly mention drivers - I don't know about you but drivers coded with the "I don't care about a couple instructions" mindset don't sound quite exciting (even though most are already even worse than that, nVidia display driver nvlddmkm.sys 12+MB, Realtek audio driver RTKVHD64.sys 4+MB...crazy...)
Given the complexity of modern hardware, I don't find these sizes crazy at all. That said, I have no experience in driver development, so maybe some part of the craziness is lost on me. And by the way, I don't think those sizes are relevant anyway, as drivers are most likely written in C and not C++. Not in C++ with exceptions, anyway.
I don't think BOOST_OVERRIDABLE_SYMBOL is a good idea, given that the same effect can be achieved in pure C++.
[snip]
But what if you want a 'proper' name for the global variable? You have to name the tag type and then create some inline function named-like-the-desired variable that will return the singleton<Tag>::instance...
I don't see a problem with that.
+ this does not work for static member variables or functions...
inline works for functions.
All compilers are already forced to implement such an attribute internally precisely to support code such as you wrote above - so this just asks that this be standardized and made public....
They have to implement it internally, but not necessarily make it public. You will have to write the portable code anyway, so what's the point in compiler-specific versions?
By using non-default calling conventions you're forcing your users out of the standard C++ land. E.g. the user won't be able to store an address of your function without resorting to compiler-specific keywords or macros to specify the calling convention. It complicates integration of your library with other code. I'd rather strictly ban non-default calling conventions on API level at all.
* no compiler-specific keywords, just a documented macro already used by the API in question
* the macro is only needed if you need to declare the pointer/function type yourself (instead of just passing the function address to an API, using auto, decltype, lambdas or template type deduction, or wrapping it in something like std::function, a signal/slot object...)
Not only that. As calling convention affects type, it also affects template specialization matching and overload resolution. Have a look at boost::mem_fn implementation for an example of overloads explosion caused by that. Thanks but no.
* explicit calling conventions in (cross platform) public APIs of libraries and even OSs are a pretty common thing in my experience
This is common on Windows. I'd say it's an exception rather than a rule, as I don't remember any POSIX-like system exposing its system API with a non-standard calling convention. As for libraries, I can't remember when I last saw an explicit calling convention in an API.
* "forcing users out of the standard C++ land" - that's just moot i.e. isn't that part of what Boost is about?
It's the opposite of what Boost is about (to my understanding). Boost makes non-standard and complicated things easy and in the spirit of standard C++. Imposing calling conventions on users is anything but that.
i.e. there is nothing stopping 'us' from standardizing the concept of calling conventions (e.g. to specify/handle the different architecture/ABI intricacies of 'evolving hardware' - soft/hard float, different GPR file sizes, 'levels' of SIMD units etc.)
ABI specs exist for that. Calling conventions are simply semi-legal deviations from the spec. While they may provide local benefits, I'm opposed to their creep into API level.
There are different kinds of bloat. Force-inlining critical functions of your program will hardly make a significant difference on the total binary size, unless used unwisely or you're in hardcore embedded world where every byte counts.
This assumption can only be true if the 'critical functions of a program' (force-inlined into every callsite!) comprise an insignificant portion of the entire program...
That is normally the case - the critical part of the program is typically much smaller than the whole program.
which is in direct contradiction with presumptions you make elsewhere - such as that properly marking cold portions of code is just not worth it...
I don't see the contradiction.
Suddenly you are OK with "restricting users" (those in the 'hardcore embedded world') as well as having/using keywords/macros (forceinline) that can be used 'wisely' and 'unwisely'?...
Forcing inline in general purpose libraries like Boost can be beneficial or detrimental, that is obvious. Finding a good balance is what makes the use of this feature wise. The balance will not be perfect for all environments - just as your choice of calling conventions or optimization flags.
+ sometimes, still, even with everything inlined, compilers cannot handle even simpler C++ abstractions all by themselves: https://gist.github.com/rygorous/c6831e60f5366569d2e9
Not sure what that's supposed to illustrate.
For dynamically dispatched calls (virtual functions), choosing the appropriate calling convention and decorating the function with as many relevant attributes as possible is even more important (as the dynamic dispatch is a firewall for the optimiser and it has to assume that the function 'accesses & throws the whole universe')...
My point was that one should avoid dynamic dispatch in hot code in the first place.
AFAICT I first mentioned dynamically dispatched calls.
Umm, so? Not sure I understand.
+ what I already said: it is not just about the direct speed impact but about the detrimental impact on the (optimisation) of code surrounding the callsites (creating bigger and slower code)...some attributes (like noalias, pure and const) can even allow a compiler to hoist a virtual call outside a loop...
If a function call can be moved outside of the loop, then why is it inside the loop? Especially, if you know the piece of code is performance-critical and the function is virtual?
Even when the target is known statically (i.e. non-virtual function call) the effect of the call can be significant if it's on the hot path - regardless of the calling convention.
A static call to a (cached/prefetched) function that does not touch the stack has pretty much the overhead of two simple instructions, CALL and RET (and CPUs have had dedicated circuitry, return stack buffers, for exactly that for ages).
Also the prologue and epilogue, unless the function is really trivial, at which point it can probably be inlined. The function call, as I'm sure you know, involves writing the return address to the stack anyway. And if the function has external linkage, the call will likely be made through a symbol table anyway. That increases the pressure on the TLB, which may affect the performance of your performance-critical function if it is memory-intensive and makes a few calls itself.
Please give me an example of a function not automatically inlined (even at -Os levels) where this is a 'significant effect' (moreover, even if you could, that still wouldn't prove your point - all that is needed to disprove it is the existence of a function whose call overhead is made insignificant by using a better calling convention and appropriate attributes - trivial)...
My experience in this area mostly comes from image processing algorithms, like scaling or color transform, for example. Each pixel (or a vector thereof) may have to be processed in a rather complex way, such that the functions that implement this often do not inline even at -O3. I experimented with various calling conventions, including __attribute__((regparm)), but eventually forcing inline gave the best results.
and the typical page size is 4k - you'd have to save at least 4k of code to measure the difference, let alone feel it.
Let's try and see how hard it is to save something with Boost.Log:
- VS2015 U1
- combined sizes of both library DLLs
- bjam variant=release link=shared runtime-link=shared address-model=32 optimization=%1 cxxflags="/Ox%2 /GL /GS- /Gy" where (%1 = space, %2 = s) and (%1 = speed, %2 = t) for the for-size and for-speed optimised builds, respectively:

* for-speed build: 959kB
* for-size build: 730kB

-> that's a delta of 229kB, or 31% (the relative difference is much larger if we compare only the text/code sections of the DLLs, because of all the strings, RTTI records etc.)...

And according to your own assumption that hot code is insignificant in size, it follows that you could shave off 30% from Boost.Log exactly by having the non-hot code compiled for size...
Thing is, I don't know what part of Boost.Log will be hot or cold in the actual application. (I mean, I could guess that some particular parts are most likely going to be cold, but I can't make any guesses about the rest of the code because its use is dependent on what the application uses of the library). Now let's assume I marked some parts hot and others cold - what happens when my guess is incorrect? Right, some cold parts are loaded, along with the really-cold parts, and some hot parts are not loaded. Back to square one. You could argue that you know for certain what parts will be hot in your library. Fair enough, such markup could be useful for you.
The disk space consumption by data exceeds code by magnitudes, which in turn shows on IO, memory and other related stuff.
If you say so; for example:
* CMake 3.4.1 Win32 build:
  - ~40MB total size
  - of that, ~24MB are binaries and the rest is mostly _documentation_ (i.e. not program data)
  - cmake-gui.exe, a single-dialog application
* Git4Windows 2.6.3 Win64 build:
  - ~415MB (!?) total size
  - of that, ~345MB are binaries
Not sure what you've downloaded, but the one I've found weighs about 29MB. https://git-scm.com/download/win Also, these numbers should be taken with a big grain of salt, as we don't know how much the actual code there is in the binaries. Often there is debug info or other data embedded into the binaries. Another source of code bloat is statically linked libraries. The point is that if you're fighting with code bloat, there are other areas you should first look into before you think about fine-tuning compiler options on per-function basis.
So, when you wait for Windows, Visual Studio, Android Studio, Eclipse, Adobe Acrobat, Photoshop.....to "map into memory" on i7 machines with RAID0 and/or SSD drives that's because of "data"?
There are many factors to performance. And mapping executables into memory, I'm sure, is by far not the most significant one of them.
as the compiler cannot deduce these things (except for simple scenarios like assuming all noreturn functions are cold)...and saying that we can/should then help it with BOOST_LIKELY while arguing that we shouldn't help it with BOOST_COLD/MINSIZE/OPTIMIZE_FOR_* is 'beyond self contradicting'...
The difference is the amount of effort you have to put into it and the resulting portability and effect.
Huh? The effort is less, portability the same (macros?) and the effect is better (simply because those are better tools for the job)??
Earlier I said that I don't think the general OPTIMIZE_FOR_SPEED/OPTIMIZE_FOR_SIZE/etc. macros will work for everyone and everywhere. And having to specify compiler options manually hampers portability and increases maintenance effort.
I am talking about debug builds in particular. If I build a debug binary, I want to be able to step through every piece of code, including the ones you marked for speed. If I build for binary size, I want to minimize the size of all code, including the code you marked. I don't care about speed in either of these cases.
Debug builds are a red herring - per function attributes like hot, cold, optsize... do not affect debug builds or debug information.
That's unexpected. Not the debug information, but the ability to step through code in the debugger, with data introspection, is directly affected by optimization levels. I don't believe that somehow magically specifying -O3 in code would produce a better debuggable binary than specifying -O3 on the compiler command line.
Sorry about necrobumping, but I've ignored this thread previously as the subject line didn't make it obvious this thread was of interest to me.
2015-11-24 17:20 GMT-03:00 Andrey Semashev
I don't think BOOST_OVERRIDABLE_SYMBOL is a good idea, given that the same effect can be achieved in pure C++.
You mean creating a class template with a single dummy template argument and a static data member just so that you can define a global variable in a header w/o linker errors?
Slightly better:
template< typename T, typename Tag = void >
struct singleton { static T instance; };

template< typename T, typename Tag >
T singleton< T, Tag >::instance;
I was looking for something similar to BOOST_OVERRIDABLE_SYMBOL, as I want my library header-only, and Boost.System requires the category to be a single static object; it'll even use address comparison to test whether the categories are the same: https://github.com/boostorg/system/blob/388b3497af4b205ff7e8c67ea306f57eea62...

Is it guaranteed the above code will return the same object even if this code was included in different libraries used by the same program?

--
Vinícius dos Santos Oliveira
https://vinipsmaker.github.io/
participants (8)
-
Agustín K-ballo Bergé
-
Alexander Lauser
-
Andrey Semashev
-
Domagoj Saric
-
Domagoj Šarić
-
Gavin Lambert
-
Joel FALCOU
-
Vinícius dos Santos Oliveira