Re: [boost] [Boost] [Hana] Announcing Hana's formal review next week (June 10th)

9 Jun 2015

      Oleg Grunin <ogrunin <at> yahoo.com> writes:
...
Glen Fernandes <glen.fernandes <at> gmail.com> writes:
...
Dear Boost community,
The formal review of Louis Dionne's Hana library starts Monday, 10th
June and ends on 24th June.
After playing for a few of hours with Hana I take my hat off to Louis
for the impressive feat he accomplished with the library and the
excellent documentation. MPL unlocked a whole new world of C++
programming that few, at the time, suspected existed. I can only hope
that Hana will do the same. But I am not yet certain of it. Maybe it is
due to the heavy paradigm shift which I haven't yet made. In MPL and
Fusion there was a clear distinction between the compile and run-time
worlds (In the former pretty much everything, save for_each, was
compile-time, and in latter all the compile time stuff lived in
result_of namespace ). With Hana that line is fuzzy. I understand that
Louis considers it its big achievement and it may be so but I am not yet
convinced. To me type manipulations (metaprogramming) were always
something that went on between the programmer and the compiler and were
entirely gone from the resulting binary.
They should be entirely gone from the resulting binary, and this is the
case with basically everything I've tried so far. The optimizer has all
the necessary information to remove this code entirely. The example you
are presenting is worthy of filing a bug against Clang, which I will do
once I'm done reducing the test case. See below for details.
...
Here is a small example that uses an mpl type map and the corresponding
code I put together (likely sub-optimally) using hana:
[...]
Indeed, you are using a Map when what you really want is a Tuple, since
your keys are integers. Here's the version I would have written:

    #include <boost/hana/integral_constant.hpp>
    #include <boost/hana/range.hpp>
    #include <boost/hana/tuple.hpp>
    using namespace boost::hana;

    static constexpr int map_size = 100;

    void run() {
      constexpr auto hana_range = range(int_<0>, int_<map_size>);
      auto hana_tuple = unpack(hana_range, [](auto ...n) {
        return tuple_t<std::array<char, decltype(n)::value>...>;
      });

      auto Array = hana_tuple[int_<69>];

      // instantiate the array
      constexpr decltype(Array)::type arr = {{0}};
      static_assert(sizeof(arr) == 69, "");
    }

However, I can easily imagine another example where you actually need the
functionality of a Map. See below for performance comments.
...
Despite Louis's claims of a significant compile-time speed up with Hana,
I get the following with clang 3.6.0:
$time clang++ -O3  -D_USE_HANA -stdlib=libc++ -std=c++14 hana.cpp  -o hana
real    0m55.233s
user    0m54.190s
sys     0m0.631s
$time clang++ -O3 -stdlib=libc++ -std=c++14 hana.cpp  -o mpl
real    0m2.162s
user    0m1.632s
sys     0m0.108s
There are several reasons for the above timings.
In order of importance:

1. Map and Set (the associative containers) are naively implemented right
   now. Optimizing and improving those data structures is the subject of
   my GSoC project of this summer, which I will focus on more seriously
   after the review (which is time consuming). I am aware of several
   possible optimizations that can be applied to heterogeneous
   associative structures, some of which would make your use case
   much more efficient.

   As you may have noticed, I never actually show benchmarks for
   associative data structures in the tutorial, but only for Tuple,
   because I know the performance of the associative containers to be
   bad (but not as bad as you shown, frankly). The compile-time
   performance claims made in the tutorial may be a bit bold for the
   time being, so I added a note about the associative data structures
   until they are optimized.

2. I think the compiler is incorrectly losing time for generating and
   optimizing code that should not be in the executable at all. Like I
   said, I consider such bad codegen to be a compiler bug.

3. You're using the wrong data structure for the task. This should
   have some impact on the performance, but never as much as here.

I've reimplemented your example in several different ways. You can find
the full code in the Gist at [1]. Here is a chart showing the compilation
time of the different solutions as a function of the number of
`std::array<char, n>`s in the container:

    http://i.imgur.com/gVDA632.png

As you can see, the implementations using Map + insert are really bad,
but the others are all OK. Also note how using decltype with Map + insert
cuts down the compilation time, which supports point (2) above.
...
And at run-time: (run a loop of 10000)
$ time ./hana 100000
real    0m2.834s
user    0m2.825s
sys     0m0.006s
$ time ./mpl 100000
real    0m0.021s
user    0m0.002s
sys     0m0.001s
Which clearly shows that the stuff intended to be purely compile-time
has leaked into the run-time.
There are several reasons for the above timings.
In order of importance (I think):

1. Clang generates horrible code for your example. I think the fault is
   principally on the compiler, since it generates perfect code for up
   to ~20 elements, and then starts emitting complete crap. I'll follow
   up with the Clang team about this.

2. The fault is also partially on the library, which does not compress
   the storage of empty types right now. Hence, your compile-time map,
   which contains (int_<...>, type<...>) pairs, actually has a huge
   sizeof, whereas it should have sizeof(char) since all those pairs
   are empty.

   Indeed, I was able to make the runtime performance issue go away by
   compressing the storage of empty types in Hana pairs. I opened an
   issue [2] to track the progress of this feature. By the way, I
   consider this a high priority feature.

Here is a chart showing the execution time for the examples in the Gist:

    http://i.imgur.com/CWUoDtB.png

Here is a chart showing the executable size for the examples in the Gist:

    http://i.imgur.com/uCThWXb.png

As you can see, only your original Map example is bad, and it starts
sudenly at about 30 elements, which reinforces my thinking that it's
a compiler bug or quality of implementation issue. As you can see, my
version using Hana Tuple is indeed 0 overhead, just like the MPL. I am
confident that compressing empty types and making the Map implementation
more clever should resolve the issue you presented.
...
Also it appears that Hana lets the compiler swallow all kinds of code
that doesn't appear to have any practical meaning:
I understand (maybe erroneously) that a hana::map can have a useful
application only if its keys are compile-time constructs, and yet I was
able to compile:
make_map(make_pair(1, "one"), make_pair(2, "two"));
You are correct when you state that Hana Maps require the keys to be
comparable at compile-time, i.e. the comparison of two keys must return
an IntegralConstant.

The above usage, which is generally meaningless, is impossible to catch
at compile-time in general, because the type of the object returned by
the comparison of 1 with something else may depend on that something
else. In other words, we can only flag the above usage if there exists
no 'x' (of any type) such that '1 == x' returns an IntegralConstant.
However there is no way to assess that in general.

However, if you tried to access an element of your map, then
you would see a compile-time failure:

    auto map = make_map(make_pair(1, "one"), make_pair(2, "two"));
    std::string one = map[1];

    // error:
    error: static_assert failed
        "hana::value<T>() requires T to be a Constant"
            static_assert(_models<Constant, C>{},
            ^             ~~~~~~~~~~~~~~~~~~~~~~
    note: in instantiation of function template specialization
            'boost::hana::value<bool>'
          requested here
                hana::if_(hana::value<decltype(
                                ^

As you can probably guess, this failure is caused by the fact that
the comparison of 1 and 1 returns a `bool`, yet Hana is expecting a
`Constant`.
...
and even
make_map(1,2,3);
This one is easy to catch, because make_map requires Products as
arguments. This was fixed and you should now get the following
pretty compile-time error:

    error: static_assert failed
    "hana::make<Map>(pairs...) requires all the 'pairs' to be Products"
                static_assert(hana::all(are_pairs),
                ^             ~~~~~~~~~~~~~~~~~~~~
    note: in instantiation of function template specialization
          'hana::make_impl<hana::Map, void>::apply<int, int, int>'
            requested here
                return make_impl<Datatype>::apply(static_cast<X&&>(x)...);
                                            ^
    note: in instantiation of function template specialization
          'hana::_make<hana::Map>::operator()<int, int, int>'
          requested here
    auto map = make_map(1,2,3);
                       ^

Thanks a lot for trying the library out and providing feedback. One
thing Hana sorely needs is to see more usage so that some non-obvious
problems can come out and be addressed.

Regards,
Louis

[1]: https://goo.gl/PdjhTb
[2]: https://github.com/ldionne/hana/issues/89