In my "just for fun" Delphi.Compute library (written in Delphi, inspired by Boost.Compute) I made Copy() and Transform() return futures of the output buffers, as well as accept futures as parameters. Note that Delphi doesn't have iterators like C++, so my routines operate directly on buffers.
So when Transform(), say, got a Future<Buffer> instead of a Buffer as a parameter, it would add the future's associated event to the wait list passed to clEnqueueNDRangeKernel (technically the Buffer type has an implicit conversion operator to an "immediate" Future<Buffer>).
This made it pretty seamless to queue up everything and then just wait for the final read (the default "copy device buffer to host array and return it" call is blocking). The code looks sequential but would only block on that last read.

Asbjørn
This idea is pretty nifty, and I've been pondering this exact way of implementing asynchrony in my library as well. Another method I also like would be to just let the buffer store the futures. This is how Joel's/Numscale's NT2 works [0]; he gave an awesome talk about it at this year's Meeting C++ [1]. But this might be a higher-level interface, more applicable to expression trees than to STL-like functions.

A third way would be to return a future<void> from all meta-functions and allow meta-functions to take a future<void>. This would map directly to the stream/command_queue, but the interface is maybe not that meaningful any more.

Does your implementation implicitly synchronize two command_queues if you issue a binary function with two different futures?

Sebastian

[0] https://github.com/MetaScale/nt2
[1] http://www.slideshare.net/joelfalcou/automatic-taskbased-code-generation-for...