On 2014-12-29 02:01, Kyle Lutz wrote:
On Sun, Dec 28, 2014 at 7:26 PM, John Bytheway
wrote: It would be very nice to have range versions of all the algorithms. I find myself using the boost::range algorithms much more than the std:: ones these days.
I fully agree, and I'm very much looking forward to having range algorithms standardized in C++17.
Do you mean that you are planning to add them to Boost.Compute?
3. What is your evaluation of the documentation?
Overall, not too bad.
More tutorial would be good. In particular, something using some of the standard algorithms.
I'll definitely work on this more. Any ideas of specific tutorials/examples you'd like to see would be very helpful.
Something using an algorithm which moves elements around might be nice. Maybe sort or partition...
Some concerns:
- All the header links off http://kylelutz.github.io/compute/boost_compute/reference.html are broken.
- All or most of the things in "See also" sections look like they ought to be links but aren't.
- For all your function parameter documentation, the list of parameters seems to be sorted alphabetically by parameter name rather than in the order they are passed to the function. This is really weird...
These are all, to the best of my knowledge, bugs in the Boost documentation toolchain. I reported these issues a while back on the boost-doc mailing list here [1] but never found a resolution. I would really like to get these to work, but am not knowledgeable enough about the Boost documentation pipeline (specifically the doxygen->quickbook infrastructure) to fix it. If anyone has any ideas or could help with this I would be very appreciative.
Shame :(. Hope you get some responses here.
- http://kylelutz.github.io/compute/boost/compute/copy.html mentions copying from e.g. a std::list to a compute::vector. It would be nice if it gave some kind of indication as to how efficient we can expect such an operation to be. In particular, I guess it ought to do some kind of buffering on the host?
Currently it will just buffer the entire range into a std::vector<T> and then copy that to the device. If this is a bottleneck it could also be improved to copy the data in smaller batches. Let me know if you encounter issues with this.
Does this also work with copy_async? I presume not, because there would be troubles with the lifetime of the temporary vector...
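A minimal host-side sketch of the batched staging Kyle describes (the function name and batch logic here are hypothetical illustrations, not Boost.Compute internals; in the real library each staged batch would be handed to a bulk OpenCL copy rather than appended to a vector):

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical sketch: stage a non-contiguous range (e.g. a std::list)
// into a contiguous std::vector in fixed-size batches. The vector
// `device_image` stands in for device memory; the insert at the end of
// the loop stands in for one bulk host-to-device copy per batch.
template <typename InputIt>
std::vector<int> buffer_in_batches(InputIt first, InputIt last,
                                   std::size_t batch_size)
{
    std::vector<int> device_image;  // stand-in for the device buffer
    std::vector<int> staging;
    staging.reserve(batch_size);
    while (first != last) {
        staging.clear();
        for (std::size_t i = 0; i < batch_size && first != last; ++i)
            staging.push_back(*first++);
        // the real code would issue one bulk device copy here
        device_image.insert(device_image.end(),
                            staging.begin(), staging.end());
    }
    return device_image;
}
```

Batching like this bounds the size of the temporary, which also hints at why an asynchronous variant is awkward: each staging buffer must stay alive until its copy completes.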
- The way I normally copy into vectors is call v.reserve(...) and then copy to std::back_inserter(v). I tried that with a compute::vector and unsurprisingly it was very inefficient. I see you already warned about this in the push_back docs, but perhaps it should also be highlighted in the documentation of compute::copy.
Yeah, this idiom is not well suited to copying into device memory. You should prefer to do "bulky" operations using a single call to boost::compute::copy() (which internally will call the OpenCL function for copying large chunks of memory).
- http://kylelutz.github.io/compute/boost/compute/includes.html it's unclear whether the subrange has to be contiguous.
It should have the same semantics as std::includes, which, IIRC, does not require the subrange to be contiguous.
Then perhaps rephrasing would make it clearer. The wording on http://en.cppreference.com/w/cpp/algorithm/includes seems clear.
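For comparison, std::includes only asks whether the second sorted range is a subsequence of the first; its elements need not be adjacent in the first range. A small standard-library illustration (the wrapper function is mine, for testability):

```cpp
#include <algorithm>
#include <vector>

// std::includes returns true when every element of `needle` appears in
// `hay` (both ranges sorted); {2, 4, 6} is "included" in {1,2,3,4,5,6}
// even though its elements are not contiguous there.
inline bool sorted_contains(const std::vector<int>& hay,
                            const std::vector<int>& needle)
{
    return std::includes(hay.begin(), hay.end(),
                         needle.begin(), needle.end());
}
```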
- What source of randomness does random_shuffle use?
Currently it simply calls std::random_shuffle to generate the shuffled index sequence and then uses boost::compute::scatter() to actually re-arrange the values on the device. However, I am planning on deprecating it and instead providing an implementation of the C++11 shuffle() algorithm, which will use a random number generator provided by the user.
Sounds good.
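A host-side sketch of the index-sequence-plus-scatter approach Kyle describes, with std::shuffle standing in for the randomness source and a plain loop standing in for boost::compute::scatter() (the function itself is a hypothetical illustration, not library code):

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Shuffle `values` by first shuffling an index sequence on the host,
// then scattering: out[indices[i]] = values[i]. In Boost.Compute the
// scatter step is what would actually run on the device.
inline std::vector<int> shuffle_via_scatter(const std::vector<int>& values,
                                            std::mt19937& rng)
{
    std::vector<std::size_t> indices(values.size());
    std::iota(indices.begin(), indices.end(), std::size_t(0));
    std::shuffle(indices.begin(), indices.end(), rng);

    std::vector<int> out(values.size());
    for (std::size_t i = 0; i < values.size(); ++i)
        out[indices[i]] = values[i];  // the "scatter"
    return out;
}
```

Replacing std::shuffle's engine with a user-supplied generator is essentially what the proposed C++11-style shuffle() interface would allow.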
- Why does reduce return its result through an OutputIterator?
In order to allow the algorithm to store the result in device memory and avoid a host-device synchronization point (which would be necessary if it simply returned the reduced result).
That makes sense, but it seems inconsistent with accumulate, and to a lesser extent with the algorithms that return bool (binary_search, equal, etc.). Are they all introducing a synchronization point? What about the ones returning iterators?
- The boost/compute/types headers docs seem to be broken or missing. In particular I couldn't find reference documentation for float4_ et al.
Yeah, I wasn't sure of the best way to document all of these (IIRC, 10 different built-in scalar types times four different vector versions of each). I'll add some information with an overview of the provided fundamental types.
At least something documenting the supported means of initialization and element access would be good :).
- http://kylelutz.github.io/compute/boost/compute/function.html I'm puzzled by this interface. What's the name for? Would one ever use this constructor rather than e.g. make_function_from_source?
There are a few legitimate use-cases. One is for specifying the names of pre-existing or built-in functions (e.g. function("sqrt") would call the built-in sqrt() function).
So constructing a function by name points it at some existing named function. That makes sense, but might be unexpected, so should probably be mentioned in the constructor documentation.
- http://kylelutz.github.io/compute/boost/compute/field.html My reading of this is that when you get the type wrong you are guaranteed some kind of meaningful error report, and not just undefined behaviour; is that right? Perhaps you should clarify.
Any comment here?
- The iterators should state what iterator category they have.
Currently Boost.Compute doesn't really have different iterator categories, all iterators are random-access. I'll add this information to the documentation.
Now that you mention this I realize I was confused by what function_input_iterator does. I had assumed it was an InputIterator which supported stateful functions, like http://www.boost.org/doc/libs/1_57_0/libs/iterator/doc/function_input_iterat.... But I guess in fact it has no guarantees about order of evaluation. You might want to clarify this for folks accustomed to Boost.Iterator.
- What does default_device do if there are no devices? Do the various environment variables mentioned have any kind of precedence? (Probably linking to the corresponding OpenCL docs is sufficient for such questions). Do they have to be exact matches? Are they case-sensitive?
There is a precedence, but it is currently not explained in the documentation. In most cases only one of the default-device variables is meant to be used. Currently they match substrings and are case-sensitive. I'll work on improving the documentation for these and give some example usages.
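The case-sensitive substring rule described here can be sketched as follows (the function is a hypothetical illustration of the matching behaviour, not Boost.Compute's actual selection code; BOOST_COMPUTE_DEFAULT_DEVICE is mentioned only as an example of where such a filter string might come from):

```cpp
#include <string>

// A device name matches a filter (e.g. the value of an environment
// variable such as BOOST_COMPUTE_DEFAULT_DEVICE) if the filter occurs
// as a case-sensitive substring of the device name.
inline bool device_name_matches(const std::string& device_name,
                                const std::string& filter)
{
    return device_name.find(filter) != std::string::npos;
}
```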
- What does find_device do if the requested device does not exist?
Currently it will return a null (default-constructed) device object. I have considered updating this (and the default_device() function) to throw an exception when no valid device is found so that returned device objects are always valid.
I think that would be better. In particular, I experienced what happens if the default device does not initialize properly (in my case because of a missing kernel module) and that was an exception. Missing device should probably be handled similarly.
- Congratulations for getting compile-time detection of internal padding into BOOST_COMPUTE_ADAPT_STRUCT :). You might want to mention in the docs that this will be detected, so people don't worry too much.
Will do. I also hope in the future to transparently support padded structs, so they can be correctly copied to/from the device without issue.
That would be nice.
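The internal padding BOOST_COMPUTE_ADAPT_STRUCT has to detect can be checked at compile time by comparing sizeof the struct against the sum of its members' sizes. A minimal stand-alone illustration of that idea (not the macro's actual implementation; it assumes the usual 4-byte alignment of std::int32_t):

```cpp
#include <cstdint>

struct padded {
    char c;          // 1 byte
    std::int32_t i;  // 4 bytes, typically aligned to a 4-byte boundary,
                     // so 3 bytes of padding are inserted after `c`
};

struct unpadded {
    std::int32_t a;
    std::int32_t b;  // same alignment as `a`: no padding needed
};

// sizeof(T) equals the sum of the member sizes only when the struct has
// no internal (or trailing) padding.
constexpr bool padded_has_padding =
    sizeof(padded) != sizeof(char) + sizeof(std::int32_t);
constexpr bool unpadded_has_padding =
    sizeof(unpadded) != sizeof(std::int32_t) + sizeof(std::int32_t);
```

A static_assert on such an expression is enough to reject mismatched structs at compile time instead of silently corrupting data on the device.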
- It looks like the performance tests don't include the time for copying data to (and, where applicable, from) the device. You should mention this on the Performance page.
I'll add this, though it is very device/system specific. Also, there is a performance benchmark available in Boost.Compute ("perf_copy_to_device") which could be used to measure this on your own system.
I think I was not clear. I mean that, for example, the sort benchmark doesn't include the time taken to copy data before and after the sort. The text on the performance page should explain this, so people can interpret the times properly. John