On Tue, Dec 30, 2014 at 7:15 AM, Vicente J. Botet Escriba
Hi,
I have no time to do a complete review. Anyway here it is my minimal review
*1. What is your evaluation of the design?** *** I would prefer that the algorithms take as first parameter the queue (which could be abstracted to an asynchronous executor).
Any particular reason to prefer the first argument rather than the last? Currently, by using the last argument to specify the command queue, we can use a default argument of "system::default_queue()". For the common case of only wanting to use the default compute device on the system, all algorithms will work correctly and the user never has to bother with finding a device, setting up a context, creating a command queue and passing it to the algorithm function.
The performances let think that there is no interest in using the library for containers having less than 10^4/10^5 elements.
In general yes, GPUs/Accelerators are ill-suited for small problem sizes (unless the computation performed on each element is quite expensive). Sorting 1000 integers will nearly always be faster on the host rather than the device.
Even if this is considered a low level library, the documentation should show how a library writer could make use of the provided interface to provide a higher one.
I agree, there should be more documentation for this. Could you let me know what sort of higher-level interface you'd like to see demonstrated?
I would prefer that the library make use of std::future or boost::future class instead of adding an additional one. Do the author tried to make use of the existing futures? if yes, which problems were found?
The implementation of the boost::compute::future type is fairly different than that of the std::/boost::future types. Essentially, there is no operation akin to "promise.set_value()" for boost::compute::future. Instead, it is a simple wrapper around an OpenCL event object which can be used to wait for an operation on the device to complete (via calling the blocking clWaitForEvents() function from the OpenCL API). Please let me know if there are ways to better integrate with boost::future.
I'm missing a range interface to most of the algorithms.
The algorithms API is meant to mimic the STL algorithms API. Adding a range-based interface is definitely possible, but is a non-trivial amount of work. Would you be interested in contributing this?
I suspect that as this is a C++03 library it doesn't take advantages of some interesting C++11+/C++14 features.
It does take advantage of some C++11 features (e.g. variadic functions) where possible. I would be very open to integrating more C++11/14 features where appropriate.
It is not clear which operations are synchronous, which ones could block, ...
Could you be more specific about what operations you are referring to? I know there is already some documentation regarding this for the command_queue class as well as for the copy()/copy_async() functions. I'll work on adding more.
In order to get the native OpenCl handle I suggest to name the function get_native_handle instead of get.
*2. What is your evaluation of the implementation?* I've not see at the implementation.
*3. What is your evaluation of the documentation?*
The documentation lacks of some core tutorials. I was unable to have an idea of what a context is, what can be done with a command_queue, what is a Kernel, a pipe, ...? Does it mean that the user must know OpenCL?
The low-level API is a wrapper around OpenCL and so having some background knowledge there helps. I'll work on adding more documentation for these classes and how they fit together.
The reference documentation should follow he C++ standard way, including pre-conditions, effects, synchronization, throws clauses.
Fully agree, I'll continue to work on this.
I would appreciate a description of the algorithms used, how the parallelism is used? is this completely managed by OpenCL or do the library something else?
I'm not sure if I fully understand what you're asking here but I'll try to answer it. OpenCL provides an API for executing code on compute devices (e.g. GPUs), Boost.Compute provides algorithms which, when called, use the OpenCL API to create programs, compile them for the device, and execute them. Let me know if I can better explain this.
*4. What is your evaluation of the potential usefulness of the library?** *** This is essentially an optimization library. It should be quite useful, but the performances results don't probe it.
Could you clarify what you mean by "the performances results don't probe it"? Boost.Compute provides a number of benchmark programs which can be used to measure and evaluate the performance of the library on a particular system. Results from this are also published on the "Performance" page [1] in the documentation.
*5. Did you try to use the library? With what compiler? Did you have any problems?** * No.* *** *6. How much effort did you put into your evaluation? A glance? A quick reading? In-depth study?** *** A glance.
*7. Are you knowledgeable about the problem domain?*
Yes and not. I don't know about OpenCL, but I'm aware of the current parallel proposals.
*8. Do you think the library should be accepted as a Boost library? ** * As I have not taken too much time I'm unable to say that this library should not be included in Boost. So I vote to include it conditionally, subject to: * I would need some performances that probe the library is better than using standard STL containers in most of the cases.
Boost.Compute will never be universally better than the STL, and it was never intended to be. CPUs and GPUs make fundamentally different trade-offs which lead to very different performance characteristics. GPUs support performing computation on large amounts of data with very high throughput (in some cases 10x-100x faster than a CPU). The down-side to this is a larger overhead to get the GPU running as well as the cost of transferring data to device memory. Boost.Compute and the STL are different tools which each suit their own different use-cases. I don't think Boost.Compute should be required to supplant the STL in order to be an acceptable library.
* The documentation should be completed with more tutorials and a better reference documentation.
Fully agree, I'll definitely continue to work on improving the documentation. Thanks for the review. Let me know if I can explain anything more clearly. -kyle [1] http://kylelutz.github.io/compute/boost_compute/performance.html