I think the test you presented is rather optimistic, in that it consists of a single translation unit.
I deliberately did not use -fwhole-program, so the compiler only knew it was compiling an executable and not a shared library, but nothing more.
I think that in real applications the following are more common:
- The error category is often implemented in a separate translation unit from the code that sets or tests for error codes with that category. This follows from the existing practice of declaring the category instance as a function-local static, where the function is defined in a separate TU.
- The code that sets the error code is often in a separate TU from the code that tests for errors. This follows from the typical separation between a library and its users.
Given the above, unless LTO is used, I think the compiler will most often not be able to optimize the virtual function call.
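To make the pattern in the first point concrete: the category instance typically sits behind a function-local static whose definition lives in its own TU. A minimal sketch (the names `my_errc`, `my_category`, and the file names are illustrative, not from this thread; the TU boundary is marked with comments):

```cpp
// my_error.hpp -- shared header
#include <string>
#include <system_error>

enum class my_errc { ok = 0, bad_input = 1 };

const std::error_category& my_category() noexcept;  // defined in a separate TU

inline std::error_code make_error_code(my_errc e) noexcept {
    return {static_cast<int>(e), my_category()};
}

// my_error.cpp -- separate translation unit; callers never see the type
namespace {
class my_category_impl final : public std::error_category {
public:
    const char* name() const noexcept override { return "my_category"; }
    std::string message(int ev) const override {
        return ev == 1 ? "bad input" : "ok";
    }
};
}  // namespace

const std::error_category& my_category() noexcept {
    static my_category_impl instance;  // function-local static
    return instance;
}
```

Because the concrete category type is hidden inside its TU, code in other TUs calling virtuals through the returned `std::error_category&` has no static type information to devirtualize with, absent LTO.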
As you noticed later in your reply, if a virtual function is marked final, the compiler inserts a check to see whether the object's vtable matches the expected implementation and, if so, uses an inlined version.
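Written out by hand, that speculative-devirtualization transform looks roughly like this. Real compilers compare vtable pointers directly; `typeid` is used below as a portable stand-in, and all names are illustrative:

```cpp
#include <typeinfo>

struct base {
    virtual int f() const { return 1; }
    virtual ~base() = default;
};

struct derived final : base {      // `final`: no further overriders possible
    int f() const override { return 2; }
};

int call_f(const base& b) {
    // Roughly what the optimizer may emit for b.f() once it guesses
    // the dynamic type is `derived`:
    if (typeid(b) == typeid(derived))                    // vtable check
        return static_cast<const derived&>(b).f();       // devirtualized, inlinable
    return b.f();                                        // fallback: normal virtual call
}
```

The fast path is branch-predictable and inlinable, but it is extra code on every call site, which is the bloat complained about below.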
I've converted your code into a synthetic benchmark consisting of one header and two translation units (one containing the test itself, the other defining the error category). The test still does not separate the code that produces the error code from the code that examines it, so in that regard it is still a bit optimistic.
I must protest at "optimistic". Your benchmark is actually the most pessimistic it can be. You need to do some real work in there and let the out-of-order, speculative execution do its thing. I usually throw in an FNV-1a hash of 64 bytes; it's not much work, but it is actual real work.
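For reference, a minimal 64-bit FNV-1a of the kind suggested here (the algorithm and its constants are standard; the function name is mine):

```cpp
#include <cstddef>
#include <cstdint>

// FNV-1a, 64-bit variant: xor each byte into the hash, then multiply
// by the FNV prime. Feeding it 64 bytes per iteration gives the
// benchmark loop a small amount of genuine data-dependent work.
std::uint64_t fnv1a_64(const unsigned char* data, std::size_t len) noexcept {
    std::uint64_t h = 14695981039346656037ull;   // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 1099511628211ull;                   // FNV prime
    }
    return h;
}
```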
Do I think this overhead is significant enough? Difficult to tell. Certainly I'm not happy about it, but I could probably live with the 1.5x overhead. However, it still results in code bloat, and there is no guarantee this optimization will be performed by the compiler (or that it will be effective if, e.g., my code always overrides `error_category::failure`). The thing is, every bit of overhead makes me more and more likely to consider dropping `error_code` in favor of direct use of error codes. `error_code` already carries the additional baggage of a pointer to the error category.
You might be interested to learn that in a real-world code base I tested, padding error_code's size to 64 bytes produced no statistically observable slowdown on Intel Ivy Bridge. As soon as it tipped over 64 bytes, though, it became very noticeable: about 5%. That was for error_code_extended in Outcome v1; I wanted to see how much payload I could cram in there. So when you say error_code comes with baggage over C error codes, I'll claim that you could in fact add lots more baggage still and probably see little effect.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/
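As a footnote, the padding experiment described above could be sketched as follows. This is illustrative only, not Outcome's actual error_code_extended layout: the idea is to bolt payload onto an `error_code` until the whole struct fills one 64-byte cache line.

```cpp
#include <cstddef>
#include <system_error>

// An error_code plus extra payload, padded to exactly one 64-byte
// cache line (the threshold past which the ~5% slowdown appeared).
struct padded_error_code {
    std::error_code ec;                                   // value + category pointer
    unsigned char payload[64 - sizeof(std::error_code)];  // fill to 64 bytes
};
static_assert(sizeof(padded_error_code) == 64, "must fit one cache line");
```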