Re: [boost] [Boost] [Hana] Formal review for Hana

20 Jun 2015

Zach Laine <whatwasthataddress <at> gmail.com> writes:
...
On Thu, Jun 18, 2015 at 4:27 PM, Louis Dionne <ldionne.2 <at> gmail.com> wrote:
...
[...]
To get there, I'd like to make sure I understand exactly what operation
you're trying to avoid. Let's assume you wrote the following instead of
a transform_mutate equivalent:

    hana::tuple<T...> xs{...};
    hana::tuple<U...> ys;
    ys = hana::transform(xs, f);

This will first apply f() to each element of xs, and then store the
temporary values in a (temporary) tuple by moving them into place. This
will then move-assign each element from the temporary tuple into ys.
__Is the first move what you are trying to avoid?__
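To make the sequence of operations above concrete, here is a small sketch. It uses `std::tuple` in place of `hana::tuple`, a hypothetical `tuple_transform` standing in for `hana::transform`, and a hypothetical `Tracked` element type that counts its own moves; none of these names come from Hana. For three elements one would expect three move constructions into the temporary tuple, then three move-assignments from it into `ys`, assuming the returned tuple itself is elided (mainstream compilers do this by default, and C++17 requires it for these prvalue returns).

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>

// Global counters so the moves performed below can be observed.
int move_ctors = 0;
int move_assigns = 0;

// Hypothetical element type that records every move construction/assignment.
struct Tracked {
    int value = 0;
    Tracked() = default;
    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked&) = default;
    Tracked(Tracked&& o) noexcept : value(o.value) { ++move_ctors; }
    Tracked& operator=(const Tracked&) = default;
    Tracked& operator=(Tracked&& o) noexcept {
        value = o.value;
        ++move_assigns;
        return *this;
    }
};

// Hypothetical stand-in for hana::transform on std::tuple: builds and
// returns a brand-new tuple holding f applied to each element of xs.
template <typename Tuple, typename F, std::size_t... I>
auto tuple_transform_impl(const Tuple& xs, F& f, std::index_sequence<I...>) {
    return std::make_tuple(f(std::get<I>(xs))...);
}

template <typename... T, typename F>
auto tuple_transform(const std::tuple<T...>& xs, F f) {
    return tuple_transform_impl(xs, f, std::index_sequence_for<T...>{});
}

struct Report {
    int move_ctors;
    int move_assigns;
    int values[3];
};

// Runs `ys = tuple_transform(xs, f)` and records how many moves it cost.
Report transform_then_assign() {
    const std::tuple<int, int, int> xs{1, 2, 3};
    std::tuple<Tracked, Tracked, Tracked> ys;
    move_ctors = move_assigns = 0;
    ys = tuple_transform(xs, [](int x) { return Tracked(x * 10); });
    assert(std::get<0>(ys).value == 10);
    return {move_ctors, move_assigns,
            {std::get<0>(ys).value, std::get<1>(ys).value, std::get<2>(ys).value}};
}
```

The first three moves build the temporary tuple; the three move-assignments copy its contents into `ys`. The temporary tuple is the object under discussion in the replies below.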
No, I'm trying to get rid of the temporary altogether.  We all know that
copies of temporaries get RVO'd out of code like the above a lot of the
time, *but not always*.  I want a guarantee that I don't need to rely on
RVO in a particular case, if efficiency is critical.
Sorry I'm being so slow, but do you mean getting rid of the temporary tuple or
the temporary value? Regarding the temporary value, I think there just isn't
a way to get rid of it. When you write

    T y = f(x);

there is a temporary object created by f(x) and then moved into y, right?
Similarly, if you have

    T y;
    y = f(x);

there's a temporary created by f(x) that gets move-assigned to y. In both
cases, there's a temporary value created, and you're relying on the optimizer
to elide it. Am I misunderstanding something fundamental about C++, or just
being thick?
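The two forms can be compared with an instrumented type; `T`, `f`, and the counters below are hypothetical. In the initialization form the move may be elided entirely (since C++17 this is guaranteed when the initializer is a prvalue such as `f(41)`), whereas the assignment form always creates a temporary that is then move-assigned into `y`, because copy elision applies only to initialization, not to assignment.

```cpp
#include <cassert>

int move_ctors = 0;    // number of move constructions of T
int move_assigns = 0;  // number of move assignments of T

// Hypothetical value type that counts its moves.
struct T {
    int v = 0;
    T() = default;
    explicit T(int x) : v(x) {}
    T(const T&) = default;
    T(T&& o) noexcept : v(o.v) { ++move_ctors; }
    T& operator=(const T&) = default;
    T& operator=(T&& o) noexcept {
        v = o.v;
        ++move_assigns;
        return *this;
    }
};

// Returns a prvalue, so the caller decides where the result is constructed.
T f(int x) { return T(x + 1); }

struct Counts {
    int move_ctors;
    int move_assigns;
};

// The form `T y = f(x);`: initialization, eligible for elision.
Counts by_initialization() {
    move_ctors = move_assigns = 0;
    T y = f(41);
    assert(y.v == 42);
    return {move_ctors, move_assigns};
}

// The form `T y; y = f(x);`: the result is a temporary that is move-assigned.
Counts by_assignment() {
    move_ctors = move_assigns = 0;
    T y;
    y = f(41);
    assert(y.v == 42);
    return {move_ctors, move_assigns};
}
```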

I'll assume that you want to get rid of the temporary tuple. In that case, it
is true that using a mutating algorithm will avoid the creation of a temporary
tuple.

To achieve this, I see three main solutions. The first one is to provide
mutating algorithms. I don't like that, but it solves your problem.
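A mutating algorithm of the first kind could look roughly like the sketch below. It works on `std::tuple` rather than `hana::tuple`, and `transform_mutate` is a hypothetical name borrowed from earlier in the thread, not a Hana function. Each result of `f` is assigned straight into the matching element of `ys`, so no temporary tuple is built; it relies on C++17 fold expressions.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>

// Assigns f(get<I>(xs)) directly into get<I>(ys) for every index I.
template <typename... T, typename... U, typename F, std::size_t... I>
void transform_mutate_impl(const std::tuple<T...>& xs, std::tuple<U...>& ys,
                           F& f, std::index_sequence<I...>) {
    // Fold over the comma operator: one assignment per element, in order.
    ((std::get<I>(ys) = f(std::get<I>(xs))), ...);
}

// Hypothetical mutating counterpart of transform: writes into ys in place.
template <typename... T, typename... U, typename F>
void transform_mutate(const std::tuple<T...>& xs, std::tuple<U...>& ys, F f) {
    static_assert(sizeof...(T) == sizeof...(U), "tuples must have the same length");
    transform_mutate_impl(xs, ys, f, std::index_sequence_for<T...>{});
}

// Example: ys is filled in place from xs with heterogeneous element types.
std::tuple<long, double> demo_transform_mutate() {
    const std::tuple<int, double> xs{1, 2.5};
    std::tuple<long, double> ys;
    transform_mutate(xs, ys, [](auto x) { return x * 2; });
    assert(std::get<0>(ys) == 2);
    return ys;
}
```

The cost Louis alludes to is visible in the signature: the caller must supply an already-constructed `ys`, which is a different style from the value-returning algorithms.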

The second one is to provide lazy views à la Fusion that would compute the
results on the fly. When you assign a view to a sequence, each element would
be computed and then assigned directly, without creating a temporary tuple.
I like this better, but it might have a non-trivial impact on the design of
the library, and it also represents a lot of work.
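A minimal sketch of the lazy-view idea, again over `std::tuple`. The names `transform_view`, `make_transform_view`, and `assign_from_view` are hypothetical (they are not Fusion's or Hana's actual interfaces). The view only stores a reference to the source and the function; elements are computed one at a time during assignment and written directly into the destination.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>

// Hypothetical lazy view: nothing is computed until an element is requested.
template <typename Tuple, typename F>
struct transform_view {
    const Tuple& source;  // the view does not own the source tuple
    F f;

    // Computes the I-th transformed element on demand.
    template <std::size_t I>
    auto at() const { return f(std::get<I>(source)); }
};

template <typename Tuple, typename F>
transform_view<Tuple, F> make_transform_view(const Tuple& xs, F f) {
    return {xs, std::move(f)};
}

// Assigning a view computes each element and assigns it straight into dst.
template <typename... U, typename Tuple, typename F, std::size_t... I>
void assign_from_view_impl(std::tuple<U...>& dst,
                           const transform_view<Tuple, F>& view,
                           std::index_sequence<I...>) {
    ((std::get<I>(dst) = view.template at<I>()), ...);
}

template <typename... U, typename Tuple, typename F>
void assign_from_view(std::tuple<U...>& dst, const transform_view<Tuple, F>& view) {
    static_assert(sizeof...(U) == std::tuple_size<Tuple>::value, "size mismatch");
    assign_from_view_impl(dst, view, std::index_sequence_for<U...>{});
}

std::tuple<int, int, int> demo_view() {
    const std::tuple<int, int, int> xs{1, 2, 3};
    std::tuple<int, int, int> ys;
    auto view = make_transform_view(xs, [](int x) { return x * x; });
    assign_from_view(ys, view);  // f runs here, once per element
    assert(std::get<2>(ys) == 9);
    return ys;
}
```

Note that the view holds a reference to `xs`, so it must not outlive it; managing such lifetimes is one example of the design questions that views bring with them.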

The third one is to consider this a corner case, to assume the optimizer
does its job properly most of the time, and to let performance freaks write
    for_each(range(int_<0>, int_<n>), [&](auto i) {
        output[i] = f(input[i]);
    });
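For readers without Hana at hand, the same hand-written loop can be sketched over `std::tuple`. Here `for_each_index` is a hypothetical helper playing the role of `for_each(range(int_<0>, int_<n>), ...)`: it calls the body with a `std::integral_constant` for each index, so the index is usable as a template argument.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Calls body(std::integral_constant<std::size_t, I>{}) for I = 0, ..., N-1.
template <typename Body, std::size_t... I>
void for_each_index_impl(Body& body, std::index_sequence<I...>) {
    (body(std::integral_constant<std::size_t, I>{}), ...);
}

template <std::size_t N, typename Body>
void for_each_index(Body body) {
    for_each_index_impl(body, std::make_index_sequence<N>{});
}

std::tuple<double, long> demo_loop() {
    const std::tuple<float, int> input{1.5f, 4};
    std::tuple<double, long> output;
    auto f = [](auto x) { return x + 1; };
    for_each_index<2>([&](auto i) {
        // decltype(i)::value is a compile-time index, usable with std::get.
        constexpr std::size_t I = decltype(i)::value;
        std::get<I>(output) = f(std::get<I>(input));
    });
    assert(std::get<1>(output) == 5);
    return output;
}
```

Since `std::tuple` has no subscript operator, `output[i]` from the Hana snippet becomes `std::get<I>(output)` here.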

I'm not sure which one is the best resolution.

Regards,
Louis