19 May 2017, 3:54 p.m.
The hack is the above. We cache the address of the canonical singleton, and the noinline seems to make the optimiser disregard the thread fence, so it no longer gives up early. The generated assembler is greatly improved on MSVC: a single result<T> shrinks from ~260 opcodes to fewer than 5.
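For anyone reading this message on its own, here is a minimal sketch of the general pattern, not the library's actual code. It assumes the singleton in question is std::generic_category(); the names SKETCH_NOINLINE, fetch_generic_category, cached_generic_category and is_generic are hypothetical and exist only for illustration.

```cpp
#include <system_error>

// Hypothetical portability macro for "never inline this function".
#if defined(_MSC_VER)
#define SKETCH_NOINLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#define SKETCH_NOINLINE __attribute__((noinline))
#else
#define SKETCH_NOINLINE
#endif

namespace sketch
{
  // Out-of-line fetch of the canonical singleton. Keeping it noinline keeps
  // whatever the standard library does to initialise the singleton safely
  // out of every call site that merely wants the address.
  SKETCH_NOINLINE static const std::error_category *fetch_generic_category() noexcept
  {
    return &std::generic_category();
  }

  // Address cached once per translation unit during static initialisation.
  static const std::error_category *const cached_generic_category = fetch_generic_category();

  // Hot path: a plain pointer comparison against the cached address.
  inline bool is_generic(const std::error_code &ec) noexcept
  {
    return &ec.category() == cached_generic_category;
  }
}
```

Whether this shrinks the generated code at all depends on the compiler and standard library in use, so the disassembly is worth checking before adopting anything like it.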
As I use error categories extensively in my library, a blog post or article focusing on this technique would be useful (if it doesn't already exist).
It's a micro optimisation which currently only affects code bloat on one particular compiler. I didn't, and still don't, think it worth writing up.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/ http://ie.linkedin.com/in/nialldouglas/