On Tue, Dec 30, 2014 at 6:54 PM, Gruenke, Matt wrote:
I assumed there was some behind-the-scenes mechanism to maintain the order of higher-level operations, but I hadn't dug around to find it. I thought maybe there was an event object embedded in the device memory containers, used to track any pending writes to them, that could be added to the wait list of subsequent operations. But I don't see anything like this.
So, will enabling out-of-order execution break the higher-level operations? Or am I missing something?
Unfortunately I have done very little testing with out-of-order command queues (none of the GPU devices I own support them). My best guess is that any of the high-level algorithms which enqueue multiple sub-operations will run into issues. This should be fixable, though, by storing an event for each sub-operation and passing it along to the next enqueue_* method to maintain the proper dependency ordering. Alternatively, and more simply, we could just block off each sub-operation with a call to enqueue_barrier() to force serialization.

This is something I'd like to support cleanly, but it just hasn't been a very high priority. If you're interested in writing up some test cases for out-of-order command queues and fixing some of the algorithms that break, that would be very helpful.

-kyle
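For what it's worth, here is a minimal sketch of the two approaches described above (explicit event chaining, and the coarser enqueue_barrier() fix), using the Boost.Compute command_queue API. The buffer sizes and the write-then-copy sequence are illustrative assumptions, not code from the library's algorithms, and actually running it requires an OpenCL device that supports out-of-order queues:

```cpp
#include <vector>

#include <boost/compute/core.hpp>

namespace compute = boost::compute;

int main()
{
    compute::device gpu = compute::system::default_device();
    compute::context ctx(gpu);

    // Create an out-of-order queue; this flag is only honored if the
    // device supports out-of-order execution.
    compute::command_queue queue(
        ctx, gpu, compute::command_queue::enable_out_of_order_execution);

    std::vector<float> host(1024, 1.0f);
    const size_t bytes = host.size() * sizeof(float);

    compute::buffer src(ctx, bytes);
    compute::buffer dst(ctx, bytes);

    // Approach 1: store the event for the first sub-operation...
    compute::event write_done =
        queue.enqueue_write_buffer_async(src, 0, bytes, host.data());

    // ...and pass it in the wait list of the dependent sub-operation,
    // so the copy cannot begin until the write has completed, even on
    // an out-of-order queue.
    compute::wait_list deps;
    deps.insert(write_done);
    queue.enqueue_copy_buffer(src, dst, 0, 0, bytes, deps);

    // Approach 2 (simpler, coarser): a barrier forces everything
    // enqueued so far to finish before anything enqueued afterwards
    // may start, serializing the sub-operations.
    queue.enqueue_barrier();

    queue.finish();
    return 0;
}
```

The event-chaining version preserves whatever parallelism the out-of-order queue offers between genuinely independent operations, while the barrier version trades that away for simplicity.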