On Tue, Dec 30, 2014 at 4:12 AM, Asbjørn wrote:
On 29.12.2014 01:42, Kyle Lutz wrote:
On Sun, Dec 28, 2014 at 1:40 PM, Asbjørn wrote:
2) I did miss async versions of the algorithms, so it's possible to chain together multiple calls. Even though all the data sits on the compute device, the overhead of waiting for each operation to finish before queuing the next can make the compute gains completely irrelevant.
Can you let me know what chain of functions you're calling? Many algorithms should already execute asynchronously and provide the behavior you expect.
Seems I had missed this crucial part. Given that copy() is synchronous and there's a special async version of it, it's easy to forget, if you don't recall the details, that the other operations are enqueued non-blocking. For example, the reference page for transform()[1] makes no mention of it being asynchronous. It makes sense once you assume the default in-order command queue execution, but I think it should be stated more explicitly, to make it harder to forget late at night :)
I agree, I'll work on documenting this behavior better.
3) I think the relevant calls should have a non-throwing form that returns an error code, à la Boost.Asio.
This could be implemented, but would be a large amount of work (essentially doubling the size of the public API). Can you let me know more about your use-case and why the current exception-based API is not suitable?
Fair enough. In my experience, the main errors one can sensibly handle are insufficient memory when allocating a buffer (one can try smaller buffers, reduce the dataset, or use an alternate algorithm) and kernels failing to compile or run due to a lack of registers or similar resources (one can fall back to an alternate algorithm/kernel).
For example, using a single large buffer may be significantly faster, but if the maximum buffer size is too small, one can switch to a ping-pong algorithm instead.
Yeah, I would be very interested in exploring error handling strategies like these. -kyle