On 6/27/15 10:48 PM, Louis Dionne wrote:
Joel de Guzman
writes: [...] Your tests focused on a mere subset of Fusion which is the least optimized: fusion::vector. Your tests do not cover other areas such as views, and other containers.
Frankly, I thought fusion::vector would be the most efficient container. Isn't vector documented as such? Quoting from the documentation:
vector is the simplest of the Fusion sequence container (a vector with N elements is just a struct with N members), and in many cases the most efficient.
That was true a decade ago in a purely c++03 world. The C++ world has changed since then and fusion has evolved since then. I'm sorry for the confusion. Anyway, Fusion is undergoing transformation again following advances in C++. The documentation will have to be updated. Anyway, it's not just containers I am referring to. And you know now that I am referring to views, and on runtime performance.
I can imagine a case, when say you push_back or insert where views should shine. It's the difference between copy by value and pass by reference. Try doing that with large tuple elements (those without resources and are not efficiently moved). Actually, have you tried reverse with large tuples like bitmaps and stuff?
Hmmmm, how did you test reverse in Fusion BTW? Fusion reverse should return a view with ZERO(!) copy. Did you create another vector from the reverse_view? I guess so, since your test subject is Fusion vector. In that case, you are not doing it right! You don't copy to another vector. You should use the results as-is. Lazily.
For that matter, the same is true with transform! Fusion returns transform_view. You should get zero overhead with those because they are lazy. If you are copying back to a fusion vector, then you are not doing it right. Have you tried accessing, say only the Nth element instead of (presumably) copying back to a vector before accessing the Nth element? or how about transforming only a range within the container?
It seems you are doing your tests unfairly using eager evaluation (Hana) instead of lazy evaluation (Fusion).
There seems to be a larger issue here. That issue is: How to benchmark lazy computations and eager computations? To illustrate the problem, consider the following "benchmark":
transforming a vector in C++:
#include <algorithm> #include <vector>
int main() { std::vector<int> xs(1000); std::vector<int> ys; ys.reserve(xs.size()); std::transform(xs.begin(), xs.end(), std::back_inserter(ys), [](int x) { return x + 1; }); }
transforming a list in Haskell:
import Data.List
main = do let xs = take 1000 (repeat 0) let ys = fmap (+1) xs return ()
Written this way, the Haskell code will always be faster because it does not do anything, since it is lazy. However, that does not tell us much about what we're interested in. What we'd like to know is how the above code behaves when I _actually_ access the elements of the lazy list. So, of course I copy the result into vectors when benchmarking Fusion algorithms, since otherwise there's nothing to benchmark!
And so you are doing it wrong! What you are getting is the extra construction. The result *IS* already a sequence that can be traversed. That extra copy is not necessary. If you've traversed over the elements of the sequence instead, you should see a gain with Fusion because Hana will have to create a new tuple while Fusion will not.
I think this ought to be explained in the documentation, but I don't think it invalidates Hana's claims. For equivalent functionality (i.e. when accessing all the members of the resulting container, which is assumed to usually be the case), Hana will perform as good as Fusion.
Try getting a single element, or a range, for example and, how about trying multiple concatenated algorithms, or how about push_back and insert multiple times? I can think of a LOT of examples where views will blow eager evaluation out the water. I'm sure you've seen Eric's range V3 presentation? Imagine doing all that computation eagerly and you'll know what I mean! Bottom line, what you did in your tests is force an apple to be an orange and test it against an orange. That said, I'm still surprised that Fusion withstood the tests and performed admirably despite the added construction step. The test on reverse deserves investigation and optimization though, so I'll take the liberty of using your words, changing "Hana" to "Fusion": Performance is important to us: if you ever encounter a scenario where Fusion causes bad code to be generated (and the fault is not on the compiler), please open an issue so the problem can be addressed. Regards, -- Joel de Guzman http://www.ciere.com http://boost-spirit.com http://www.cycfi.com/