Zach Laine
On Thu, Jun 18, 2015 at 4:27 PM, Louis Dionne wrote:
[...]
To get there, I'd like to make sure I understand exactly what operation you're trying to avoid. Let's assume you wrote the following instead of a transform_mutate equivalent:

    hana::tuple xs{...};
    hana::tuple ys;
    ys = hana::transform(xs, f);

This will first apply f() to each element of xs, and then store the temporary values in a (temporary) tuple by moving them into place. This will then move-assign each element from the temporary tuple into ys.
__Is the first move what you are trying to avoid?__
No, I'm trying to get rid of the temporary altogether. We all know that copies of temporaries get RVO'd out of code like the above a lot of the time, *but not always*. I want a guarantee that I don't need to rely on RVO in a particular case, if efficiency is critical.
Sorry I'm being so slow, but do you mean get rid of the temporary tuple or the temporary value? Regarding the temporary value, I think there just isn't a way to get rid of it. When you write

    T y = f(x);

there is a temporary object created by f(x) and then moved into y, right? Similarly, if you have

    T y;
    y = f(x);

there's a temporary created by f(x) that gets move-assigned to y. In all cases, there's a temporary value created, and you're relying on the optimizer to elide it. Am I misunderstanding something fundamental about C++, or just being thick?

I'll take it that you want to get rid of the temporary tuple. In this case, it is true that using a mutating algorithm will avoid the creation of a temporary tuple. To achieve this, I see three main solutions.

The first one is to provide mutating algorithms. I don't like that, but it solves your problem.

The second one is to provide lazy views a la Fusion that would compute the results on the fly. When you assign a view to a sequence, each element would be computed and then assigned directly, without creating a temporary tuple. I like this better, but it might have a non-trivial impact on the design of the library and it also represents a lot of work.

The third one is to consider this as a corner case, pretend the optimizer does its job properly most of the time, and to let performance freaks write

    for_each(range(int_<0>, int_<n>), [&](auto i) {
        output[i] = f(input[i]);
    });

I'm not sure which one is the best resolution.

Regards,
Louis
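
For readers who want to see the difference between the temporary-tuple version and the element-wise loop discussed above, here is a minimal sketch. It uses std::tuple rather than Hana so that it stands alone, and every name in it (Counted, Twice, tuple_transform, assign_transformed, via_temporary_tuple, via_direct_assignment) is hypothetical and not part of any library. The instrumented element type simply counts move operations; the exact numbers depend on the compiler's copy elision, so treat the counts as an illustration rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>

// Hypothetical instrumented element type: counts its move operations.
struct Counted {
    int value;
    static int move_ctors;
    static int move_assigns;
    Counted() : value(0) {}
    explicit Counted(int v) : value(v) {}
    Counted(const Counted&) = default;
    Counted& operator=(const Counted&) = default;
    Counted(Counted&& o) noexcept : value(o.value) { ++move_ctors; }
    Counted& operator=(Counted&& o) noexcept {
        value = o.value;
        ++move_assigns;
        return *this;
    }
};
int Counted::move_ctors = 0;
int Counted::move_assigns = 0;

// Hypothetical stand-in for the function f in the discussion.
struct Twice {
    Counted operator()(const Counted& c) const { return Counted(c.value * 2); }
};

// Non-mutating transform: builds and returns a new tuple of results.
template <typename Tuple, typename F, std::size_t... I>
auto tuple_transform_impl(const Tuple& xs, F f, std::index_sequence<I...>) {
    return std::make_tuple(f(std::get<I>(xs))...);
}
template <typename... Ts, typename F>
auto tuple_transform(const std::tuple<Ts...>& xs, F f) {
    return tuple_transform_impl(xs, f, std::index_sequence_for<Ts...>{});
}

// Element-wise variant: assigns each result straight into its slot in ys,
// so no intermediate tuple is ever built.
template <typename Out, typename In, typename F, std::size_t... I>
void assign_transformed_impl(Out& ys, const In& xs, F f, std::index_sequence<I...>) {
    int expand[] = {0, (void(std::get<I>(ys) = f(std::get<I>(xs))), 0)...};
    (void)expand;
}
template <typename... Us, typename... Ts, typename F>
void assign_transformed(std::tuple<Us...>& ys, const std::tuple<Ts...>& xs, F f) {
    static_assert(sizeof...(Us) == sizeof...(Ts), "tuples must have equal size");
    assign_transformed_impl(ys, xs, f, std::index_sequence_for<Ts...>{});
}

struct Counts { int move_ctors; int move_assigns; };
using Pair = std::tuple<Counted, Counted>;

// ys = transform(xs, f): results are moved into a temporary tuple,
// then move-assigned from that tuple into ys.
Counts via_temporary_tuple(Pair& ys, const Pair& xs) {
    Counted::move_ctors = Counted::move_assigns = 0;
    ys = tuple_transform(xs, Twice{});
    return {Counted::move_ctors, Counted::move_assigns};
}

// Element-wise assignment: each result is move-assigned directly into ys.
Counts via_direct_assignment(Pair& ys, const Pair& xs) {
    Counted::move_ctors = Counted::move_assigns = 0;
    assign_transformed(ys, xs, Twice{});
    assert(std::get<0>(ys).value == 2 * std::get<0>(xs).value);
    return {Counted::move_ctors, Counted::move_assigns};
}
```

Calling both helpers on the same input and comparing the returned Counts shows where the extra moves come from: both versions move-assign each element once, while the temporary-tuple version additionally move-constructs each element into the intermediate tuple.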