[boost] [compute] Review

29 Dec 2014

      Here's my review.

1. What is your evaluation of the design?

The design of building on OpenCL seems sound, for the reasons Kyle has
explained.  The fact that it has real users is very encouraging.

I don't really like the name (I was unable to guess what the library was
about from the name), but it's not worth changing at this point.

I am worried about the compile-time configuration with macros.  I would
hope that BOOST_COMPUTE_HAVE_THREAD_LOCAL could be read from
Boost.Config.  For the two features which require linking against other
Boost libraries, my instinct would be to have them on by default, rather
than off by default (that should be easier if and when Boost.Compute is
part of the Boost distribution).

It would be very nice to have range versions of all the algorithms.  I
find myself using the boost::range algorithms much more than the std::
ones these days.

Similarly, it would be good to provide the safer 4-iterator versions of
equal, is_permutation, mismatch (and any I've forgotten) as were added
to C++14.

Some minor things:

- Consider renaming boost::compute::lambda to be consistent with one of
the other placeholder-containing namespaces from std or Boost.  (You can
keep the old name as an alias for backwards-compatibility).

- Why the default of .25 as the bernoulli_distribution parameter?

- Did you consider using Boost.Predef for your version macro?

2. What is your evaluation of the implementation?

Only looked at a couple of things while trying to understand the docs.

3. What is your evaluation of the documentation?

Overall, not too bad.

More tutorial would be good.  In particular, something using some of the
standard algorithms.

Some concerns:

- All the header links off
<http://kylelutz.github.io/compute/boost_compute/reference.html> are broken.

- All or most of the things in "See also" sections look like they ought
to be links but aren't.

- For all your function parameter documentation, the list of parameters
seems to be sorted alphabetically by parameter name rather than in the
order they are passed to the function.  This is really weird...

- As others have mentioned, some kind of complexity estimate or promise
for the algorithms is very important.  In particular it would be good to
understand how much parallelism they take advantage of and how they
ought to scale with device compute capability.

- The warning about accumulate being slow for floats should be much more
prominent.

- <http://kylelutz.github.io/compute/boost/compute/copy.html> mentions
copying from e.g. a std::list to a compute::vector.  It would be nice if
it gave some kind of indication as to how efficient we can expect such
an operation to be.  In particular, I guess it ought to do some kind of
buffering on the host?

- The way I normally copy into vectors is call v.reserve(...) and then
copy to std::back_inserter(v).  I tried that with a compute::vector and
unsurprisingly it was very inefficient.  I see you already warned about
this in the push_back docs, but perhaps it should also be highlighted in
the documentation of compute::copy.

- Is there no version of inclusive/exclusive_scan or iota using an
arbitrary associative operator?

- <http://kylelutz.github.io/compute/boost/compute/includes.html> it's
unclear whether the subrange has to be contiguous.

- min_element and max_element should cross-reference minmax_element.

- What source of randomness does random_shuffle use?

- Why does reduce return its result through an OutputIterator?

- <http://kylelutz.github.io/compute/boost/compute/remove.html> Does
this really remove things or use std::remove semantics?  You should
clarify and document the return value.  (Same with remove_if, unique)

- I can see how sort_by_key might be useful, and I'm slightly surprised
that there's no stable_sort_by_key (which seems like it would be more
useful than stable_sort).  Any particular reason?

- The boost/compute/types headers docs seem to be broken or missing.  In
particular I couldn't find reference documentation for float4_ et al.

- I was interested to see allocators, but the docs seem uninformative:
<http://kylelutz.github.io/compute/boost/compute/pinned_allocator.html>

- Given the confusion with std::future, you might want to clarify
whether compute::future's destructor blocks.

- Also, <http://kylelutz.github.io/compute/BOOST_COMPUTE_CLOSURE.html>
doesn't explain what format the lists (arguments and capture) should be
in.  Preprocessor functions often take lists in other formats, such as a
Boost.PP Sequence
<http://www.boost.org/doc/libs/1_57_0/libs/preprocessor/doc/data/sequences.html>,
and your example only has one-element lists, so it's not enough to
clarify this point.  (To be clear: I'm not suggesting you should use
Boost.PP sequences, merely that you should make it clear in the docs
whether you are).

- <http://kylelutz.github.io/compute/BOOST_COMPUTE_CL_CALLBACK.html> is
empty, as is
<http://kylelutz.github.io/compute/BOOST_COMPUTE_MAX_ARITY.html>

- <http://kylelutz.github.io/compute/boost/compute/valarray.html> -
presumably compute::valarray has overloaded arithmetic operators?  Are
they mentioned anywhere?

- <http://kylelutz.github.io/compute/boost/compute/vector.html> the
unusual behaviour of vector's size_t constructor (not initializing the
elements) should be warned more prominently.

- <http://kylelutz.github.io/compute/boost/compute/device.html> the
documentation of the enum type in the synopsis seems to be messed up.
Similarly memory_object.

- <http://kylelutz.github.io/compute/boost/compute/function.html> I'm
puzzled by this interface.  What's the name for?  Would one ever use
this constructor rather than e.g. make_function_from_source?

- <http://kylelutz.github.io/compute/boost/compute/field.html> My
reading of this is that when you get the type wrong you are guaranteed
some kind of meaningful error report, and not just undefined behaviour;
is that right?  Perhaps you should clarify.

- I can't guess what atomic_min and atomic_max do.

- The right arrow ('next') link from
<http://kylelutz.github.io/compute/boost/compute/atomic_xor.html> is
broken (looks like the docs for the placeholders don't work).

- The iterators should state what iterator category they have.

- I can't guess what svm_ptr is for.  Later I discovered svm_alloc and
svm_free.  Is there an RAII way to manage these things?

- What does default_device do if there are no devices?  Do the various
environment variables mentioned have any kind of precedence?  (Probably
linking to the corresponding OpenCL docs is sufficient for such
questions).  Do they have to be exact matches?  Are they case-sensitive?

- What does find_device do if the requested device does not exist?

- Can you give examples for type_definition?  My guess was that it did
the thing which type_name later turned out to do...

- BOOST_COMPUTE_ADAPT_STRUCT should state that it must be used at global
scope (i.e. not within a function or other namespace).

- Congratulations for getting compile-time detection of internal padding
into BOOST_COMPUTE_ADAPT_STRUCT :).  You might want to mention in the
docs that this will be detected, so people don't worry too much.

- It looks like the performance tests don't include the time for copying
data to (and, where applicable, from) the device.  You should mention
this on the Performance page.

- You should probably add C++AMP to your things compared with in the FAQ.

4. What is your evaluation of the potential usefulness of the
library?

High.  I've wanted something like this in the past and not found this.

5. Did you try to use the library? With what compiler? Did you have
any problems?

I used it with g++ 4.8.2 in C++1y mode.  Some quick tests seemed to work
OK, and indeed I was able to sort some large vectors of scalars faster
using boost::compute::sort than std::sort, which is nice.

I was pleased to see that 64-bit integers were handled fine.

6. How much effort did you put into your evaluation? A glance? A
quick reading? In-depth study?

Couple of hours reading and trying simple examples.

7. Are you knowledgeable about the problem domain?

I have done a couple of small projects with CUDA, and am generally
knowledgeable, but with no previous OpenCL experience.

8. Do you think the library should be accepted as a Boost library? Be
sure to say this explicitly so that your other comments don't obscure
your overall opinion.

Yes.  OpenCL seems to have fairly broad acceptance, but is rather
difficult to use safely and with C++-isms.  This library fills a useful
niche.  In an ideal world it will eventually be rendered obsolete by
things like parallelism entering the standard library, but I doubt that
will happen soon if at all.  This provides a good solution now, and we
shouldn't dismiss it simply because something sparkly is approaching,
which may or may not turn out to be awesome and widely adopted.

This library has clearly been a lot of work.  Congratulations on making
it this far, I hope the review is not too stressful :).

John Bytheway