Error handling benchmarks: LEAF, Boost Outcome
I wrote a benchmark program for LEAF and Boost Outcome. Here are the results: https://github.com/zajo/leaf/blob/master/benchmark/benchmark.md
On 27/11/2019 00:01, Emil Dotchevski via Boost wrote:
I wrote a benchmark program for LEAF and Boost Outcome. Here are the results:
https://github.com/zajo/leaf/blob/master/benchmark/benchmark.md
These results seem reasonable if you are calling 32 functions deep, and those functions do no work. Returning large sized objects where you cause RVO to be disabled is by definition a benchmark of memcpy().

In real world code, the C++ compiler works very hard to avoid calling deep stacks of small functions. It's rare in the real world. Returning large sized objects from functions also tends to be rare in the real world.

Outcome has known poor codegen. I am working on a new implementation where, if both T and E are trivially copyable or move relocatable, you get union-based storage and much improved codegen. This should narrow the gap considerably over the null case.

But in truth, when I test the new implementation in an existing Outcome-based codebase, I find no statistically observable difference. If you're doing any real work at all, Outcome can be 10x less efficient and it gets lost by other more dominant work.

I thank Emil for doing this work, and for sharing the results with me before he published them. I concur with the benchmarks observed for what was tested.

Niall
On Wed, Nov 27, 2019 at 4:47 AM Niall Douglas via Boost <boost@lists.boost.org> wrote:
On 27/11/2019 00:01, Emil Dotchevski via Boost wrote:
I wrote a benchmark program for LEAF and Boost Outcome. Here are the results:
https://github.com/zajo/leaf/blob/master/benchmark/benchmark.md
These results seem reasonable if you are calling 32 functions deep, and those functions do no work.
Returning large sized objects where you cause RVO to be disabled is by definition a benchmark of memcpy().
In real world code, the C++ compiler works very hard to avoid calling deep stacks of small functions. It's rare in the real world. Returning large sized objects from functions also tends to be rare in the real world.

Interestingly, some of the feedback I got is that the call to rand() contaminates the results, since it isn't free. I tend to agree with that, since the point of a benchmark is to amplify the impact of a system so its performance can be evaluated.

Communicating large sized error objects does not cause RVO to be disabled with LEAF. It is designed with a strong bias towards the most common use case, where callers check for, but do not handle, errors.

If the caller is only going to check for failures and forward them to its caller, moving error objects one stack frame at a time adds overhead. Besides, even though large sized objects are not common, the need to communicate several error objects is. It makes no sense to try to bundle all that in a return value and hope for the best from the optimizer, given that most likely the immediate caller does not handle errors and therefore will not access anything other than the discriminant.

To clarify, LEAF also needs to move error objects, including large sized error objects, up the call chain, but they are moved only to (and between) error-handling stack frames, skipping all intermediate check-only levels. The benchmark is actually a bit unfair to LEAF in this regard, since the "handle some" case includes handling errors at every 4th function call, which is excessive in my experience (the "check-only" case does handle errors at the top).

It is true that compilers avoid calling deep stacks of small functions, which is why the benchmark includes the inline vs. no_inline dimension. The simplicity of leaf::result<T> makes it extremely friendly to the optimizer, including when inlining is possible. I updated the benchmark paper to show generated code: https://github.com/zajo/leaf/blob/master/benchmark/benchmark.md#show-me-the-... .
But in truth, when I test the new implementation in an existing Outcome-based codebase, I find no statistically observable difference. If you're doing any real work at all, Outcome can be 10x less efficient and it gets lost by other more dominant work.
This is true for error handling in general. The most important function of an error handling library is to allow users to easily communicate any and all error objects to error handling contexts where they're needed. That it can be done efficiently is an added bonus.

Emil
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
That's a good point. I further simplified the benchmark, added outcome::result.
Hi,

Maybe I'm missing the point here, but this benchmark is using outcome::std_outcome in a no-exceptions scenario, in which case it will always carry around a std::exception_ptr for no good reason. This is, in fact, the main source of the slowdown between the two cases in the short error category. Applying the following patch to deep_stack_outcome.cpp
-#include <outcome/std_outcome.hpp>
+#include <outcome/std_result.hpp>
 template <class T>
-using result = outcome::std_outcome<T>;
+using result = outcome::std_result<T>;

and increasing the iteration count to 100000, to drown out the noise a little more, I get (with `clang++-9 -O3 -fno-exceptions -DNDEBUG`):
$ ./a.out
100000 iterations, call depth 32, sizeof(e_heavy_payload) = 4096
LEAF
                |                    | Function | Error | Elapsed
Error type      | At each level      | inlining | rate  | (μs)
----------------|--------------------|----------|-------|--------
e_error_code    | LEAF_AUTO          | Disabled |    2% |   14253
e_error_code    | LEAF_AUTO          | Enabled  |    2% |    1826
e_error_code    | try_handle_some    | Disabled |    2% |   18333
e_error_code    | try_handle_some    | Enabled  |    2% |    9499
e_error_code    | LEAF_AUTO          | Disabled |   50% |   14645
e_error_code    | LEAF_AUTO          | Enabled  |   50% |    3179
e_error_code    | try_handle_some    | Disabled |   50% |   21092
e_error_code    | try_handle_some    | Enabled  |   50% |   12741
e_error_code    | LEAF_AUTO          | Disabled |   98% |   14353
e_error_code    | LEAF_AUTO          | Enabled  |   98% |    3345
e_error_code    | try_handle_some    | Disabled |   98% |   23413
e_error_code    | try_handle_some    | Enabled  |   98% |   13574
----------------|--------------------|----------|-------|--------
e_system_error  | LEAF_AUTO          | Disabled |    2% |   13347
e_system_error  | LEAF_AUTO          | Enabled  |    2% |    2198
e_system_error  | try_handle_some    | Disabled |    2% |   18138
e_system_error  | try_handle_some    | Enabled  |    2% |   11177
e_system_error  | LEAF_AUTO          | Disabled |   50% |   13788
e_system_error  | LEAF_AUTO          | Enabled  |   50% |    3580
e_system_error  | try_handle_some    | Disabled |   50% |   22136
e_system_error  | try_handle_some    | Enabled  |   50% |   14479
e_system_error  | LEAF_AUTO          | Disabled |   98% |   13178
e_system_error  | LEAF_AUTO          | Enabled  |   98% |    3664
e_system_error  | try_handle_some    | Disabled |   98% |   23790
e_system_error  | try_handle_some    | Enabled  |   98% |   15512
----------------|--------------------|----------|-------|--------
e_heavy_payload | LEAF_AUTO          | Disabled |    2% |   14862
e_heavy_payload | LEAF_AUTO          | Enabled  |    2% |    1941
e_heavy_payload | try_handle_some    | Disabled |    2% |   18502
e_heavy_payload | try_handle_some    | Enabled  |    2% |   11352
e_heavy_payload | LEAF_AUTO          | Disabled |   50% |   20644
e_heavy_payload | LEAF_AUTO          | Enabled  |   50% |   11350
e_heavy_payload | try_handle_some    | Disabled |   50% |   27156
e_heavy_payload | try_handle_some    | Enabled  |   50% |   19173
e_heavy_payload | LEAF_AUTO          | Disabled |   98% |   25337
e_heavy_payload | LEAF_AUTO          | Enabled  |   98% |   18961
e_heavy_payload | try_handle_some    | Disabled |   98% |   34474
e_heavy_payload | try_handle_some    | Enabled  |   98% |   26510
$ ./a.out
100000 iterations, call depth 32, sizeof(e_heavy_payload) = 4096
Outcome
                |                    | Function | Error | Elapsed
Error type      | At each level      | inlining | rate  | (μs)
----------------|--------------------|----------|-------|--------
e_error_code    | OUTCOME_TRY        | Disabled |    2% |    9501
e_error_code    | OUTCOME_TRY        | Enabled  |    2% |     890
e_error_code    | Handle some errors | Disabled |    2% |    9873
e_error_code    | Handle some errors | Enabled  |    2% |     885
e_error_code    | OUTCOME_TRY        | Disabled |   50% |    9965
e_error_code    | OUTCOME_TRY        | Enabled  |   50% |    2063
e_error_code    | Handle some errors | Disabled |   50% |   11080
e_error_code    | Handle some errors | Enabled  |   50% |    2490
e_error_code    | OUTCOME_TRY        | Disabled |   98% |    7456
e_error_code    | OUTCOME_TRY        | Enabled  |   98% |    1669
e_error_code    | Handle some errors | Disabled |   98% |   10620
e_error_code    | Handle some errors | Enabled  |   98% |    2465
----------------|--------------------|----------|-------|--------
e_system_error  | OUTCOME_TRY        | Disabled |    2% |   15769
e_system_error  | OUTCOME_TRY        | Enabled  |    2% |   14504
e_system_error  | Handle some errors | Disabled |    2% |   16339
e_system_error  | Handle some errors | Enabled  |    2% |   14581
e_system_error  | OUTCOME_TRY        | Disabled |   50% |   20917
e_system_error  | OUTCOME_TRY        | Enabled  |   50% |   17556
e_system_error  | Handle some errors | Disabled |   50% |   21019
e_system_error  | Handle some errors | Enabled  |   50% |   18024
e_system_error  | OUTCOME_TRY        | Disabled |   98% |   24544
e_system_error  | OUTCOME_TRY        | Enabled  |   98% |   19690
e_system_error  | Handle some errors | Disabled |   98% |   23214
e_system_error  | Handle some errors | Enabled  |   98% |   20386
----------------|--------------------|----------|-------|--------
e_heavy_payload | OUTCOME_TRY        | Disabled |    2% |  415474
e_heavy_payload | OUTCOME_TRY        | Enabled  |    2% |  125406
e_heavy_payload | Handle some errors | Disabled |    2% |  416026
e_heavy_payload | Handle some errors | Enabled  |    2% |  126080
e_heavy_payload | OUTCOME_TRY        | Disabled |   50% |  385746
e_heavy_payload | OUTCOME_TRY        | Enabled  |   50% |  142900
e_heavy_payload | Handle some errors | Disabled |   50% |  385494
e_heavy_payload | Handle some errors | Enabled  |   50% |  140713
e_heavy_payload | OUTCOME_TRY        | Disabled |   98% |  348241
e_heavy_payload | OUTCOME_TRY        | Enabled  |   98% |  157487
e_heavy_payload | Handle some errors | Disabled |   98% |  350570
e_heavy_payload | Handle some errors | Enabled  |   98% |  156954
participants (3)
- Emil Dotchevski
- Niall Douglas
- Samuel Neves