Here's my review. 1. What is your evaluation of the design? The design of building on OpenCL seems sound, for the reasons Kyle has explained. The fact that it has real users is very encouraging. I don't really like the name (I was unable to guess what the library was about from the name), but it's not worth changing at this point. I am worried about the compile-time configuration with macros. I would hope that BOOST_COMPUTE_HAVE_THREAD_LOCAL could be read from Boost.Config. For the two features which require linking against other Boost libraries, my instinct would be to have them on by default, rather than off by default (that should be easier if and when Boost.Compute is part of the Boost distribution). It would be very nice to have range versions of all the algorithms. I find myself using the boost::range algorithms much more than the std:: ones these days. Similarly, it would be good to provide the safer 4-iterator versions of equal, is_permutation, mismatch (and any I've forgotten) as were added to C++14. Some minor things: - Consider renaming boost::compute::lambda to be consistent with one of the other placeholder-containing namespaces from std or Boost. (You can keep the old name as an alias for backwards-compatibility). - Why the default of .25 as the bernoulli_distribution parameter? - Did you consider using Boost.Predef for your version macro? 2. What is your evaluation of the implementation? Only looked at a couple of things while trying to understand the docs. 3. What is your evaluation of the documentation? Overall, not too bad. More tutorial would be good. In particular, something using some of the standard algorithms. Some concerns: - All the header links off http://kylelutz.github.io/compute/boost_compute/reference.html are broken. - All or most of the things in "See also" sections look like they ought to be links but aren't. - For all your function parameter documentation, the list of parameters seems to be sorted alphabetically by parameter name rather than in the order they are passed to the function. This is really weird... - As others have mentioned, some kind of complexity estimate or promise for the algorithms is very important. In particular it would be good to understand how much parallelism they take advantage of and how they ought to scale with device compute capability. - The warning about accumulate being slow for floats should be much more prominent. - http://kylelutz.github.io/compute/boost/compute/copy.html mentions copying from e.g. a std::list to a compute::vector. It would be nice if it gave some kind of indication as to how efficient we can expect such an operation to be. In particular, I guess it ought to do some kind of buffering on the host? - The way I normally copy into vectors is call v.reserve(...) and then copy to std::back_inserter(v). I tried that with a compute::vector and unsurprisingly it was very inefficient. I see you already warned about this in the push_back docs, but perhaps it should also be highlighted in the documentation of compute::copy. - Is there no version of inclusive/exclusive_scan or iota using an arbitrary associative operator? - http://kylelutz.github.io/compute/boost/compute/includes.html it's unclear whether the subrange has to be contiguous. - min_element and max_element should cross-reference minmax_element. - What source of randomness does random_shuffle use? - Why does reduce return its result through an OutputIterator? - http://kylelutz.github.io/compute/boost/compute/remove.html Does this really remove things or use std::remove semantics? You should clarify and document the return value. (Same with remove_if, unique) - I can see how sort_by_key might be useful, and I'm slightly surprised that there's no stable_sort_by_key (which seems like it would be more useful than stable_sort). Any particular reason? - The boost/compute/types headers docs seem to be broken or missing. In particular I couldn't find reference documentation for float4_ et al. - I was interested to see allocators, but the docs seem uninformative: http://kylelutz.github.io/compute/boost/compute/pinned_allocator.html - Given the confusion with std::future, you might want to clarify whether compute::future's destructor blocks. - Also, http://kylelutz.github.io/compute/BOOST_COMPUTE_CLOSURE.html doesn't explain what format the lists (arguments and capture) should be in. Preprocessor functions often take lists in other formats, such as a Boost.PP Sequence http://www.boost.org/doc/libs/1_57_0/libs/preprocessor/doc/data/sequences.ht..., and your example only has one-element lists, so it's not enough to clarify this point. (To be clear: I'm not suggesting you should use Boost.PP sequences, merely that you should make it clear in the docs whether you are). - http://kylelutz.github.io/compute/BOOST_COMPUTE_CL_CALLBACK.html is empty, as is http://kylelutz.github.io/compute/BOOST_COMPUTE_MAX_ARITY.html - http://kylelutz.github.io/compute/boost/compute/valarray.html - presumably compute::valarray has overloaded arithmetic operators? Are they mentioned anywhere? - http://kylelutz.github.io/compute/boost/compute/vector.html the unusual behaviour of vector's size_t constructor (not initializing the elements) should be warned more prominently. - http://kylelutz.github.io/compute/boost/compute/device.html the documentation of the enum type in the synopsis seems to be messed up. Similarly memory_object. - http://kylelutz.github.io/compute/boost/compute/function.html I'm puzzled by this interface. What's the name for? Would one ever use this constructor rather than e.g. make_function_from_source? - http://kylelutz.github.io/compute/boost/compute/field.html My reading of this is that when you get the type wrong you are guaranteed some kind of meaningful error report, and not just undefined behaviour; is that right? Perhaps you should clarify. - I can't guess what atomic_min and atomic_max do. - The right arrow ('next') link from http://kylelutz.github.io/compute/boost/compute/atomic_xor.html is broken (looks like the docs for the placeholders don't work). - The iterators should state what iterator category they have. - I can't guess what svm_ptr is for. Later I discovered svm_alloc and svm_free. Is there an RAII way to manage these things? - What does default_device do if there are no devices? Do the various environment variables mentioned have any kind of precedence? (Probably linking to the corresponding OpenCL docs is sufficient for such questions). Do they have to be exact matches? Are they case-sensitive? - What does find_device do if the requested device does not exist? - Can you give examples for type_definition? My guess was that it did the thing which type_name later turned out to do... - BOOST_COMPUTE_ADAPT_STRUCT should state that it must be used at global scope (i.e. not within a function or other namespace). - Congratulations for getting compile-time detection of internal padding into BOOST_COMPUTE_ADAPT_STRUCT :). You might want to mention in the docs that this will be detected, so people don't worry too much. - It looks like the performance tests don't include the time for copying data to (and, where applicable, from) the device. You should mention this on the Performance page. - You should probably add C++AMP to your things compared with in the FAQ. 4. What is your evaluation of the potential usefulness of the library? High. I've wanted something like this in the past and not found this. 5. Did you try to use the library? With what compiler? Did you have any problems? I used it with g++ 4.8.2 in C++1y mode. Some quick tests seemed to work OK, and indeed I was able to sort some large vectors of scalars faster using boost::compute::sort than std::sort, which is nice. I was pleased to see that 64-bit integers were handled fine. 6. How much effort did you put into your evaluation? A glance? A quick reading? In-depth study? Couple of hours reading and trying simple examples. 7. Are you knowledgeable about the problem domain? I have done a couple of small projects with CUDA, and am generally knowledgeable, but with no previous OpenCL experience. 8. Do you think the library should be accepted as a Boost library? Be sure to say this explicitly so that your other comments don't obscure your overall opinion. Yes. OpenCL seems to have fairly broad acceptance, but is rather difficult to use safely and with C++-isms. This library fills a useful niche. In an ideal world it will eventually be rendered obsolete by things like parallelism entering the standard library, but I doubt that will happen soon if at all. This provides a good solution now, and we shouldn't dismiss it simply because something sparkly is approaching, which may or may not turn out to be awesome and widely adopted. This library has clearly been a lot of work. Congratulations on making it this far, I hope the review is not too stressful :). John Bytheway