On Wednesday, December 16, 2015 12:02:26 AM Domagoj Saric wrote:
On Tue, 01 Dec 2015 16:40:33 +0530, Andrey Semashev wrote:
The question was about BOOST_NO_CXX11_FINAL: "'purists' might mind that BOOST_NO_CXX11_FINAL is not defined even when BOOST_FINAL is defined to sealed instead of final"...
I'm not seeing those 'purists' in this discussion. And I'm not getting an impression that you're one of them. As far as I'm concerned, it doesn't matter how BOOST_FINAL is implemented as long as it conforms to the documented behavior (which, I assume, would be equivalent to the behavior of C++11 final).
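(For reference, a minimal sketch of what such a BOOST_FINAL macro could expand to - the exact #if conditions are an assumption, 'sealed' being the MSVC-specific spelling referred to above:)

    #if !defined( BOOST_NO_CXX11_FINAL )
    #   define BOOST_FINAL final
    #elif defined( _MSC_VER )
    #   define BOOST_FINAL sealed   // MSVC-specific keyword with the same effect
    #else
    #   define BOOST_FINAL
    #endif

    class file_mapping BOOST_FINAL { /* ... */ };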
I don't see the benefit of BOOST_NOTHROW_LITE.
It's a nothrow attribute that does not insert runtime checks to call std::terminate...and it is unfortunately not offered by Boost.Config...
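(A rough sketch of what such a macro could expand to - the name and the conditions are assumptions, not an existing Boost.Config macro:)

    #if defined( _MSC_VER )
    #   define BOOST_NOTHROW_LITE __declspec( nothrow )
    #elif defined( __GNUC__ )
    #   define BOOST_NOTHROW_LITE __attribute__(( nothrow ))
    #else
    #   define BOOST_NOTHROW_LITE
    #endif

    // a pure hint to the optimiser: unlike noexcept, no std::terminate()
    // checking code is generated around the callee or its callers
    BOOST_NOTHROW_LITE void fill_buffer( char * buffer, unsigned size );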
Do you have measurements of the possible benefits compared to noexcept? I mean, noexcept was advertised as the more efficient version of throw() already.
What more measurements beyond the disassembly window which clearly shows unnecessary EH codegen (i.e. bloat) are necessary?
I'll reiterate, what are the practical benefits? I don't care about a couple instructions there or not there - I will never see them in performance numbers or binary size.
I guess then you were also against noexcept with the same presumptive 'a couple of instructions (compared to throw())' rationale? What is the N in the "N * a-couple-instructions" expression at which you start to care?
That's just handwaving and I was interested in some practical evidence that BOOST_NOTHROW_LITE would be beneficial compared to noexcept. You haven't presented any so far.
What kind of an argument is that anyway, i.e. why should anyone care that you don't care?
Well, you asked for community opinion and I expressed mine. If you're not interested in it then say so and I won't waste everyone's time. This remark relates to the tone of the rest of your reply.
How does it relate to whether or not BOOST_NOTHROW should be changed (or at least BOOST_NOTHROW_LITE added) to use the nothrow attribute where available instead of noexcept (especially since the macro itself cannot guarantee C++11 noexcept semantics anyway)?
First, there is no BOOST_NOTHROW macro currently. There is BOOST_NOEXCEPT, and it does correspond to C++11 noexcept when it is supported by the compiler. It does not emulate noexcept semantics in C++03, but that was never implied. Second, there is BOOST_NOEXCEPT_OR_NOTHROW and it does switch between throw() in C++03 and noexcept in C++11. I'm not sure if you mean this macro by BOOST_NOTHROW but I don't think changing it to __attribute__((nothrow)) is a good idea because this is a breaking change (both in behavior and compilation). Third, I did not propose to change semantics of the existing macros, nor commented in relation to this idea. I'm sceptical about introducing a new one, BOOST_NOTHROW_LITE.
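(For reference, a simplified view of how the existing Boost.Config macros expand:)

    #if defined( BOOST_NO_CXX11_NOEXCEPT )
    #   define BOOST_NOEXCEPT
    #   define BOOST_NOEXCEPT_OR_NOTHROW throw()
    #else
    #   define BOOST_NOEXCEPT noexcept
    #   define BOOST_NOEXCEPT_OR_NOTHROW noexcept
    #endif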
You ask for practical benefits and then give a subjective/off-the-cuff/question-begging reasoning for dismissing them... You may not mind that the biggest library in Boost is, of all things, a logging library, while some, on the other hand, would like to see plain C finally retired and C++ (and its standard library) be used (usable) in OS kernels[1] as well as in the tiniest devices from the darkest corners of the embedded world. Some 'wise men' say the free lunch is over...
I'm not sure what you're getting at here.
[1] https://channel9.msdn.com/Events/Build/2014/9-015 An example discussion of exactly that - at ~0:17:00 they explicitly mention drivers - I don't know about you but drivers coded with the "I don't care about a couple instructions" mindset don't sound quite exciting (even though most are already even worse than that, nVidia display driver nvlddmkm.sys 12+MB, Realtek audio driver RTKVHD64.sys 4+MB...crazy...)
Given the complexity of modern hardware, I don't find these sizes crazy at all. That said, I have no experience in driver development, so maybe some part of craziness is lost on me. And by the way, I don't think those sizes are relevant anyway as drivers are most likely written in C and not C++. Not in C++ with exceptions anyway.
I don't think BOOST_OVERRIDABLE_SYMBOL is a good idea, given that the same effect can be achieved in pure C++.
[snip]
But what if you want a 'proper' name for the global variable? You have to name the tag type and then create some inline function named-like-the-desired variable that will return the singleton<Tag>::instance...
I don't see a problem with that.
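(A minimal sketch of the pure-C++ pattern in question, with illustrative names: a static data member of a class template may be defined in a header without ODR problems, and an inline accessor gives it a 'proper' name:)

    template< typename Tag >
    struct singleton
    {
        static int instance;
    };

    template< typename Tag >
    int singleton< Tag >::instance = 0;

    struct my_global_tag {};

    inline int & my_global() { return singleton< my_global_tag >::instance; }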
+ this does not work for static member variables or functions...
inline works for functions.
All compilers are already forced to implement such an attribute internally precisely to support code such as you wrote above - so this just asks that this be standardized and made public....
They have to implement it internally, but not necessarily make it public. You will have to write the portable code anyway, so what's the point in compiler-specific versions?
By using non-default calling conventions you're forcing your users out of the standard C++ land. E.g. the user won't be able to store an address of your function without resorting to compiler-specific keywords or macros to specify the calling convention. It complicates integration of your library with other code. I'd rather strictly ban non-default calling conventions on API level at all.
* no compiler-specific keywords, just a documented macro already used by the API in question
* the macro is only needed if you need to declare the pointer/function type yourself (instead of just passing the function address to an API, using auto, decltype, lambdas or template type deduction, or wrapping it in something like std::function, a signal/slot object...)
Not only that. As calling convention affects type, it also affects template specialization matching and overload resolution. Have a look at boost::mem_fn implementation for an example of overloads explosion caused by that. Thanks but no.
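(A minimal illustration, using MSVC/x86 spellings: because the calling convention is part of the function type, every function-pointer overload has to be repeated per convention - the same multiplication boost::mem_fn has to go through:)

    template< typename R, typename A >
    R invoke( R ( __cdecl * f )( A ), A a ) { return f( a ); }

    template< typename R, typename A >
    R invoke( R ( __stdcall * f )( A ), A a ) { return f( a ); }

    template< typename R, typename A >
    R invoke( R ( __fastcall * f )( A ), A a ) { return f( a ); }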
* explicit calling conventions in (cross platform) public APIs of libraries and even OSs are a pretty common thing in my experience
This is common on Windows. I'd say it's the exception rather than the rule, as I don't remember any POSIX-like system exposing its system API with a non-standard calling convention. As for libraries, I can't remember when I last saw an explicit calling convention in the API.
* "forcing users out of the standard C++ land" - that's just moot i.e. isn't that part of what Boost is about?
It's the opposite of what Boost is about (to my understanding). Boost makes non-standard and complicated things easy and in the spirit of the standard C++. Imposing calling conventions on users is anything but that.
i.e. there is nothing stopping 'us' from standardizing the concept of calling conventions (e.g. to specify/handle the different architecture/ABI intricacies of 'evolving hardware' - soft/hard float, different GPR file sizes, 'levels' of SIMD units etc.)
ABI specs exist for that. Calling conventions are simply semi-legal deviations from the spec. While they may provide local benefits, I'm opposed to their creep into API level.
There are different kinds of bloat. Force-inlining critical functions of your program will hardly make a significant difference on the total binary size, unless used unwisely or you're in hardcore embedded world where every byte counts.
This assumption can only be true if the 'critical functions of a program' (force-inlined into every callsite!) comprise an insignificant portion of the entire program...
That is normally the case - the critical part of the program is typically much smaller than the whole program.
which is in direct contradiction with presumptions you make elsewhere - such as that properly marking cold portions of code is just not worth it...
I don't see the contradiction.
Suddenly you are OK with "restricting users" (those in the 'hardcore embedded world') as well as having/using keywords/macros (forceinline) that can be used 'wisely' and 'unwisely'?...
Forcing inline in general purpose libraries like Boost can be beneficial or detrimental, that is obvious. Finding a good balance is what makes the use of this feature wise. The balance will not be perfect for all environments - just as your choice of calling conventions or optimization flags.
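(For the record, Boost.Config already provides the knob being discussed; roughly:)

    #include <boost/config.hpp>

    // BOOST_FORCEINLINE expands to __forceinline on MSVC, to
    // inline __attribute__((always_inline)) on GCC/Clang, and to plain
    // inline elsewhere
    BOOST_FORCEINLINE int clamp_to_byte( int v )
    {
        return v < 0 ? 0 : ( v > 255 ? 255 : v );
    }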
+ sometimes, even with everything inlined, compilers still cannot handle even simpler C++ abstractions 'all by themselves': https://gist.github.com/rygorous/c6831e60f5366569d2e9
Not sure what that's supposed to illustrate.
For dynamically dispatched calls (virtual functions), choosing the appropriate calling convention and decorating the function with as many relevant attributes as possible is even more important (as the dynamic dispatch is a firewall for the optimiser and it has to assume that the function 'accesses & throws the whole universe')...
My point was that one should avoid dynamic dispatch in hot code in the first place.
AFAICT I first mentioned dynamically dispatched calls.
Umm, so? Not sure I understand.
+ what I already said: it is not just about the direct speed impact but about the detrimental impact on the (optimisation) of code surrounding the callsites (creating bigger and slower code)...some attributes (like noalias, pure and const) can even allow a compiler to hoist a virtual call outside a loop...
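(A rough, non-virtual illustration of the idea - GCC/Clang spelling, illustrative names: marking a function __attribute__((const)) tells the optimiser its result depends only on its arguments and that it has no side effects, so a call with loop-invariant arguments becomes eligible for hoisting out of the loop:)

    extern int scale_factor( int quality ) __attribute__(( const ));

    void scale_all( int * values, int count, int quality )
    {
        for ( int i = 0; i != count; ++i )
            values[ i ] *= scale_factor( quality ); // hoistable: argument is loop-invariant
    }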
If a function call can be moved outside of the loop, then why is it inside the loop? Especially, if you know the piece of code is performance-critical and the function is virtual?
Even when the target is known statically (i.e. non-virtual function call) the effect of the call can be significant if it's on the hot path - regardless of the calling convention.
A static call to a (cached/prefetched) function that does not touch the stack has pretty much the overhead of two simple instructions, CALL and RET (and CPUs have had dedicated circuitry, return stack buffers, for exactly that for ages).
Also the prologue and epilogue, unless the function is really trivial, at which point it can probably be inlined. The function call, as I'm sure you know, involves writing the return address to the stack anyway. And if the function has external linkage the call will likely be made through a symbol table anyway. That increases the pressure on the TLB, which may affect the performance of your performance-critical function if it is memory-intensive and makes a few calls itself.
Please give me an example of a function not automatically inlined (even at Os levels) where this is a 'significant effect' (moreover, even if you could, that still wouldn't prove your point - all that is needed to disprove it is the existence of a function whose call overhead is made insignificant by using a better calling convention and appropriate attributes - trivial)...
My experience in this area mostly comes from image processing algorithms, like scaling or color transform, for example. Each pixel (or a vector thereof) may have to be processed in a rather complex way, such that the functions that implement this often do not inline even at -O3. I experimented with various calling conventions, including __attribute__((regparm)), but eventually forcing inline gave the best results.
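(For concreteness, the two approaches mentioned look roughly like this - the per-pixel transform is a placeholder and regparm is an x86-only GCC extension:)

    // pass up to three integer arguments in registers instead of on the stack (x86 only)
    __attribute__(( regparm( 3 ) ))
    unsigned char to_luma_regparm( unsigned char r, unsigned char g, unsigned char b );

    // avoid the call altogether (BOOST_FORCEINLINE from Boost.Config)
    BOOST_FORCEINLINE unsigned char to_luma( unsigned char r, unsigned char g, unsigned char b )
    {
        return static_cast< unsigned char >( ( 77 * r + 151 * g + 28 * b ) >> 8 );
    }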
and the typical page size is 4k - you'd have to save at least 4k of code to measure the difference, let alone feel it.
Let's try and see how hard it is to save something with Boost.Log:
- VS2015 U1
- combined sizes of both library DLLs
- bjam variant=release link=shared runtime-link=shared address-model=32 optimization=%1 cxxflags="/Ox%2 /GL /GS- /Gy" where (%1 = space, %2 = s) and (%1 = speed, %2 = t) for the for-size and for-speed optimised builds, respectively:
* for-speed build: 959kB
* for-size build: 730kB
-> that's a delta of 229kB or 31% (the relative difference is much larger if we compare only the text/code sections of the DLLs because of all the strings, RTTI records etc...)
And according to your own assumption that hot code is insignificant in size, it follows that you can shave off 30% from Boost.Log exactly by having non-hot code compiled for size...
Thing is, I don't know what part of Boost.Log will be hot or cold in the actual application. (I mean, I could guess that some particular parts are most likely going to be cold, but I can't make any guesses about the rest of the code because its use is dependent on what the application uses of the library). Now let's assume I marked some parts hot and others cold - what happens when my guess is incorrect? Right, some cold parts are loaded, along with the really-cold parts, and some hot parts are not loaded. Back to square one. You could argue that you know for certain what parts will be hot in your library. Fair enough, such markup could be useful for you.
The disk space consumption by data exceeds code by magnitudes, which in turn shows on IO, memory and other related stuff.
If you say so; for example:
* CMake 3.4.1 Win32 build:
  - ~40MB total size
  - of that, ~24MB are binaries and the rest is mostly _documentation_ (i.e. not program data)
  - cmake-gui.exe, a single dialog application
* Git4Windows 2.6.3 Win64 build:
  - ~415MB (!?) total size
  - of that, ~345MB are binaries
Not sure what you've downloaded, but the one I've found weighs about 29MB. https://git-scm.com/download/win Also, these numbers should be taken with a big grain of salt, as we don't know how much the actual code there is in the binaries. Often there is debug info or other data embedded into the binaries. Another source of code bloat is statically linked libraries. The point is that if you're fighting with code bloat, there are other areas you should first look into before you think about fine-tuning compiler options on per-function basis.
So, when you wait for Windows, Visual Studio, Android Studio, Eclipse, Adobe Acrobat, Photoshop.....to "map into memory" on i7 machines with RAID0 and/or SSD drives that's because of "data"?
There are many factors to performance. And mapping executables into memory, I'm sure, is by far not the most significant one of them.
as the compiler cannot deduce these things (except for simple scenarios like assuming all noreturn functions are cold)...and saying that we can/should then help it with BOOST_LIKELY while arguing that we shouldn't help it with BOOST_COLD/MINSIZE/OPTIMIZE_FOR_* is 'beyond self-contradicting'...
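(Roughly what is being asked for, using GCC spellings - the BOOST_* names here are the proposed macros, not existing Boost.Config ones:)

    #if defined( __GNUC__ )
    #   define BOOST_COLD    __attribute__(( cold ))
    #   define BOOST_MINSIZE __attribute__(( optimize( "Os" ) ))  // GCC-only; Clang ignores it
    #else
    #   define BOOST_COLD
    #   define BOOST_MINSIZE
    #endif

    // a rarely taken path: keep it small and out of the hot code sections
    BOOST_COLD BOOST_MINSIZE void report_bad_header( char const * file_name );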
The difference is the amount of effort you have to put into it and the resulting portability and effect.
Huh? The effort is less, portability the same (macros?) and the effect is better (simply because those are better tools for the job)??
Earlier I said that I don't think that the general OPTIMIZE_FOR_SPEED/ OPTIMIZE_FOR_SIZE/etc. macros will work for everyone and everywhere. And having to specify compiler options manually hampers portability and increases maintenance effort.
I am talking about debug builds in particular. If I build a debug binary, I want to be able to step through every piece of code, including the ones you marked for speed. If I build for binary size, I want to minimize the size of all code, including the one you marked. I don't care for speed in either of these cases.
Debug builds are a red herring - per function attributes like hot, cold, optsize... do not affect debug builds or debug information.
That's unexpected. Not the debug information, but the ability to step in the debugger through code, with data introspection, is directly affected by optimization levels. I don't believe that somehow magically specifying -O3 in code would provide a better debuggable binary than specifying -O3 in the compiler command line.