On 27/01/2015 15:08, Niall Douglas wrote:
On 27 Jan 2015 at 10:58, Gavin Lambert wrote:
There are negative performance consequences to copying a shared_ptr (i.e. incrementing and later decrementing its refcount). *Most* applications don't need to care about this (the cost is very small), but sometimes it's worth noting, and there's no harm in avoiding copies in silly places (which is why I thwack people who pass a shared_ptr as a value parameter).
As food for thought: AFIO, which uses shared_ptr very heavily indeed to avoid any locking at all, passes them around by value everywhere. It was bugging me whether this was costing me performance, so I tried replacing the lot with reference semantics.
Total effect on performance: ~0.1%.
As I said, it's not a big difference (atomic ops are typically ~1us, and that was on the previous CPU generation), but it's still one of my pet peeves: while there are many places where a shared_ptr does need to be copied for correctness, parameter passing is not one of them. (Performance gets worse if you end up passing the object through many layers as part of keeping methods short or similar "tidiness" or abstraction guidelines, and it wastes more stack too.) That said, you're going to have to make lots of copies anyway in an asynchronous library like AFIO, because binding an asynchronous callback is one of the places where you *do* need to copy a shared_ptr. So if a high percentage of your code is async (which is what I would expect from that sort of library), it's not going to make much difference either way.
The key is that AFIO very, very rarely has more than one thread touch a given shared_ptr at once. On Intel at least, that makes the atomic reference counting almost as cheap as non-atomic reference counting. Combine that with the compiler judiciously eliding copies where it can, and the overhead is irrelevant next to the benefits for debugging and maintenance.
Writing to a single shared_ptr instance from multiple threads requires even more overhead, from the extra spinlock behind the atomic_*(&sp, ...) family of functions. Though an uncontended spinlock basically only costs two atomic ops, so it's usually not too bad. (Those functions do mildly irritate me in that they also pass by value, but at least in that case they're inlined template functions, so the compiler will almost certainly elide the parameter copy. Another case where generic library code may "win" over application code.)

Multiple writers is one case where it may be better to create separate per-thread copies up front from some "safe" context, if you can (assuming you're OK with operating on stale data until some sync point). But again, to a certain extent async code patterns may already be doing these copies "for you". And if you're limiting yourself to WORM (write-once, read-many) access, you can skip the spinlock if you're careful.
Of course, I'm currently seeing an average of ~300k CPU cycles per op. shared_ptr overhead is tiny compared to that. At 10k CPU cycles per op I might care a bit more.
I'm probably biased the other way, because about half of the code I work on has sub-millisecond budgets. :)