[operators] What is the correct overload set for a binary operator?

Hi, after some more experiments, I convinced myself that I should explore the option of returning an rvalue (not an rvalue reference) in all cases when implementing operator+ based on operator+= (as an example). But I wonder what the most efficient overload set would look like. With this post I'll concentrate on same-type operations (T+T). Given a class T with operator+= like this: T& T::operator+=( const T& ); T& T::operator+=( T&& ); // if useful the class provides it and it should be used if applicable I think the following is currently the best overload set for operators: T operator+( T lhs, const T& rhs ) { lhs += rhs; return lhs; // no std::move needed } T operator+( T&& lhs, T&& rhs ) { lhs += std::move( rhs ); return std::move( lhs ); } T operator+( const T& lhs, T&& rhs ) { #ifdef COMMUTATIVE rhs += lhs; return std::move( rhs ); #else T nrv( lhs ); nrv += std::move( rhs ); return nrv; #endif } I tested this with the following expressions with both COMMUTATIVE defined and not defined: T a, b, c; // initialized to something T r1 = a + b + c; T r2 = a + ( b + c ); T r3 = ( a + b ) + ( b + c ); and with both GCC 4.7.3 and Clang 3.2. Could someone please test this with VC++? (complete code below) Generally: Any thoughts? Examples of expressions where and how it could be improved? Theoretical improvements are also welcome, but please mark them as such and always test your code to prevent pessimizations in practice! One remarks about the proposed full version of operators2.hpp: Andrew, I really appreciate your work and enthusiasm, but please understand that IMHO this needs to be done step-by-step and much more slowly than you probably hope for. Let's figure out the proper overload set for T+T first, then T+U/U+T (again both commutative and non-commutative), noexpect-specifier for all overloads, etc. After that, we need to decide on whether or not to improve the existing operator library or to provide an independent V2. This might include rethinking the grouped templates. Using Boost.Move might help with the implementation or not, depending on what overload set we want/need. It all needs tests and documentation and it will be quite some work… hope you and others are still with me on this :) Best regards, Daniel Here's the complete test code I used: #include <iostream> #include <utility> struct MyClass { MyClass() { std::cout << "MyClass::MyClass()" << std::endl; } MyClass( const MyClass& ) { std::cout << "MyClass::MyClass( const MyClass& ) COPY" << std::endl; } MyClass( MyClass&& ) { std::cout << "MyClass::MyClass( MyClass&& ) move" << std::endl; } ~MyClass() { std::cout << "MyClass::~MyClass()" << std::endl; } MyClass& operator+=( const MyClass& ) { std::cout << "MyClass::operator+=( const MyClass& ) +=" << std::endl; return *this; } MyClass& operator+=( MyClass&& ) { std::cout << "MyClass::operator+=( MyClass&& ) += &&" << std::endl; return *this; } }; MyClass operator+( MyClass lhs, const MyClass& rhs ) { lhs += rhs; return lhs; } MyClass operator+( MyClass&& lhs, MyClass&& rhs ) { lhs += std::move( rhs ); return std::move( lhs ); } // #define COMMUTATIVE MyClass operator+( const MyClass& lhs, MyClass&& rhs ) { #ifdef COMMUTATIVE rhs += lhs; return std::move( rhs ); #else MyClass nrv( lhs ); nrv += std::move( rhs ); return nrv; #endif } int main() { MyClass a, b, c; std::cout << "--- r1" << std::endl; MyClass r1 = a + b + c; std::cout << "--- r2" << std::endl; MyClass r2 = a + ( b + c ); std::cout << "--- r3" << std::endl; MyClass r3 = ( a + b ) + ( b + c ); std::cout << "--- done" << std::endl; }

On Sun, 28 Apr 2013, Daniel Frey wrote:
after some more experiments, I convinced myself that I should explore the option of returning an rvalue (not an rvalue reference) in all cases when implementing operator+ based on operator+= (as an example). But I wonder what the most efficient overload set would look like. With this post I'll concentrate on same-type operations (T+T).
Given a class T with operator+= like this:
T& T::operator+=( const T& ); T& T::operator+=( T&& ); // if useful the class provides it and it should be used if applicable
I think the following is currently the best overload set for operators:
T operator+( T lhs, const T& rhs ) { lhs += rhs; return lhs; // no std::move needed }
Splitting this into const& and && overloads of lhs would save a move for: T r = std::move(a) + b; (passing an xvalue and not a prvalue) In practice, it would also save a move for T r=a+b, although in theory it shouldn't (it should cost one extra move instead), but compilers are bad at eliding the copy between an argument and the return (it would be a cross-function optimization in the front-end). Sometimes you would want to overload on lvalue vs xvalue vs prvalue for optimal results, but the language will not allow it.
T operator+( T&& lhs, T&& rhs ) { lhs += std::move( rhs ); return std::move( lhs ); }
T operator+( const T& lhs, T&& rhs ) { #ifdef COMMUTATIVE rhs += lhs; return std::move( rhs ); #else T nrv( lhs ); nrv += std::move( rhs ); return nrv; #endif }
-- Marc Glisse

On Sun, 28 Apr 2013, Marc Glisse wrote:
On Sun, 28 Apr 2013, Daniel Frey wrote:
after some more experiments, I convinced myself that I should explore the option of returning an rvalue (not an rvalue reference) in all cases when implementing operator+ based on operator+= (as an example). But I wonder what the most efficient overload set would look like. With this post I'll concentrate on same-type operations (T+T).
Given a class T with operator+= like this:
T& T::operator+=( const T& ); T& T::operator+=( T&& ); // if useful the class provides it and it should be used if applicable
I think the following is currently the best overload set for operators:
T operator+( T lhs, const T& rhs ) { lhs += rhs; return lhs; // no std::move needed }
Splitting this into const& and && overloads of lhs would save a move for: T r = std::move(a) + b; (passing an xvalue and not a prvalue)
In practice, it would also save a move for T r=a+b, although in theory it shouldn't (it should cost one extra move instead),
Forgot to mention that this extra move would be for a+b+c, not a+b, sorry.
but compilers are bad at eliding the copy between an argument and the return (it would be a cross-function optimization in the front-end).
Sometimes you would want to overload on lvalue vs xvalue vs prvalue for optimal results, but the language will not allow it.
-- Marc Glisse

On 28.04.2013, at 17:52, Marc Glisse
Splitting this into const& and && overloads of lhs would save a move for: T r = std::move(a) + b; (passing an xvalue and not a prvalue)
Thanks Marc, it seems that we still need the usual 4 overloads on const lvalue reference and rvalue reference for the parameters and just return an rvalue in all cases. With todays compilers I fail to find a case for pass-by-value. Currently, this works best for me: T operator+( const T& lhs, const T& rhs ) { T nrv( lhs ); nrv += rhs; return nrv; } T operator+( const T& lhs, T&& rhs ) { #ifdef COMMUTATIVE rhs += lhs; return std::move( rhs ); #else T nrv( lhs ); nrv += std::move( rhs ); return nrv; #endif } T operator+( T&& lhs, const T& rhs ) { lhs += rhs; return std::move( lhs ); } T operator+( T&& lhs, T&& rhs ) { lhs += std::move( rhs ); return std::move( lhs ); } I'll write a larger test program when I have some more time which allows better comparison of different overload sets and we can then try to come up with expressions to show the benefits and the drawbacks of each technique, including if possible cases that benefit from pass-by-value. BR, Daniel

One remarks about the proposed full version of operators2.hpp: Andrew, I really appreciate your work and enthusiasm, but please understand that IMHO this needs to be done step-by- step and much more slowly than you probably hope for. Let's figure out the proper overload set for T+T first, then T+U/U+T (again both commutative and non-commutative), noexpect-specifier for all overloads, etc. After that, we need to decide on whether or not to improve the existing operator library or to
Daniel Frey
might include rethinking the grouped templates. Using Boost.Move might help with the implementation or not, depending on what overload set we want/need. It all needs tests and documentation and it will be quite some work… hope you and others are still with me on this :)
That's fine, I was using that code as a test-bed to try and understand how different features would fit together, and it really wasn't much work once I had a cleaned-up base (I can re-use the same base with any BOOST_BINARY_OPERATOR macro definition quite easily, some other changes may be relatively easy to scale). I figured I might as well share my results to demonstrate my findings (for example, I was curious as to how Boost Move would fit in).
Could someone please test this with VC++?
Yeah, I did some testing with VS2012 (/O2 optimizations) using your original set and with Marc's set. Both work, but Marc's set does result in fewer temporaries all-together. To be specific, all internal moves seem to be optimized away, only move required is moving into the result variable. I think this provides all of the benefits of your previous rvalue-ref code without the unsafe behavior. I got the same results testing with MinGW (GCC 4.7.2). One interesting behavior I noticed while testing: 1) MyClass operator+( const MyClass& lhs, const MyClass& rhs ) { MyClass nrv(lhs); nrv += rhs; return nrv; } 2) MyClass operator+( const MyClass& lhs, const MyClass& rhs ) { return std::move(MyClass(lhs) += rhs); } 1) and 2) do not compile equivalently. 2) for some reason does not allow the compiler to perform return value optimizations (at least the compilers I tested with), potentially resulting in extra unnecessary temporaries.

On 28.04.2013, at 21:47, Andrew Ho
Yeah, I did some testing with VS2012 (/O2 optimizations) using your original set and with Marc's set. Both work, but Marc's set does result in fewer
Which set is "Mark's set"? The last one I posted ~100 minutes ago using the signatures mentioned by Mark and including an implementation?
temporaries all-together. To be specific, all internal moves seem to be optimized away, only move required is moving into the result variable. I think this provides all of the benefits of your previous rvalue-ref code without the unsafe behavior.
It's still slower depending on the cost of the move, only for some classes the compiler can optimize that cost away. Anyways, let's explore the best "return rvalues" version. I think the last version I posted should be really close, except if someone can come up with a pass-by-value version and expressions where it works better in practice. I think that for now we might start with the code I posted and if future compilers are able to support and optimize pass-by-value to allow even better code, we might detect these compilers and use a different implementation.

Which set is "Mark's set"? The last one I posted ~100 minutes ago using
the signatures mentioned by Mark and
including an implementation?
It's still slower depending on the cost of the move, only for some classes
Yeah, when I did the tests your post wasn't up yet, but we have essentially the exact same implementation of Mark's idea. the compiler can optimize that
cost away.
I take it you're comparing the return rvalue vs. return rvalue ref? If so, quite possibly yes. I'll see if I can't identify under what practical situations this solution performs worse compared to returning r-value refs.
Anyways, let's explore the best "return rvalues" version. I think the last version I posted should be really close, except if someone can come up with a pass-by-value version and expressions where it works better in practice.
I think that for now we might start with the code I posted and if future compilers are able to support and optimize pass-by-value to allow even better code, we might detect these compilers and use a different implementation.
Sounds good to me. FWIW, I extended this solution to the other cases. All other cases have 4 overloads except for commutative T + U which has 8. It should be a copy/paste replacement that should work with the operators2.hpp test-bed. Link: http://codepad.org/HdII3G0b

On Sun, 28 Apr 2013, Andrew Ho wrote:
Yeah, I did some testing with VS2012 (/O2 optimizations) using your original set and with Marc's set. Both work, but Marc's set does result in fewer temporaries all-together. To be specific, all internal moves seem to be optimized away, only move required is moving into the result variable. I think this provides all of the benefits of your previous rvalue-ref code without the unsafe behavior.
Er, no, there are many unnecessary moves in this version that would go away if we returned a reference.
One interesting behavior I noticed while testing:
1)
MyClass operator+( const MyClass& lhs, const MyClass& rhs ) { MyClass nrv(lhs); nrv += rhs; return nrv; }
2)
MyClass operator+( const MyClass& lhs, const MyClass& rhs ) { return std::move(MyClass(lhs) += rhs); }
1) and 2) do not compile equivalently. 2) for some reason does not allow the compiler to perform return value optimizations (at least the compilers I tested with), potentially resulting in extra unnecessary temporaries.
Copy elision is extremely limited. The committee basically wrote 3 examples and said it was allowed for all 3 and nothing else. One trick to help reason about it: std::move involves a reference, and the elision is for values only (the return value of += is also a reference). -- Marc Glisse

Er, no, there are many unnecessary moves in this version that would go away if we returned a reference.
Did some testing, yeah, you're right, though the original test cases presented by Daniel didn't cause them to show up in VS2012 or GCC 4.7.2. Additional test cases: T a, b, c, d; T r4 = a + b + c + d; // extra temporary created for + d if no return rvalue ref r4 = a + b + c; // extra temporary created for assignment operator if no return rvalue ref
participants (3)
-
Andrew Ho
-
Daniel Frey
-
Marc Glisse