[compute] CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE

Gruenke,Matt

31 Dec 2014

2:54 a.m.

I assumed there was some behind-the-scenes mechanism to maintain order of higher level operations, but I hadn't dug around to find them. I thought maybe there was an event object embedded in the device memory containers, used to track any pending writes to them, that could be added to the wait list of subsequent operations. But I don't see anything like this.

So, will enabling out-of-order execution break the higher level operations? Or am I missing something?

Matt

________________________________

This e-mail contains privileged and confidential information intended for the use of the addressees named above. If you are not the intended recipient of this e-mail, you are hereby notified that you must not disseminate, copy or take any action in respect of any information contained in it. If you have received this e-mail in error, please notify the sender immediately by e-mail and immediately destroy this e-mail and its attachments.


Kyle Lutz

31 Dec

4:27 a.m.

On Tue, Dec 30, 2014 at 6:54 PM, Gruenke,Matt <mgruenke@tycoint.com> wrote:

> I assumed there was some behind-the-scenes mechanism to maintain order of higher level operations, but I hadn't dug around to find them. I thought maybe there was an event object embedded in the device memory containers, used to track any pending writes to them, that could be added to the wait list of subsequent operations. But I don't see anything like this.
>
> So, will enabling out-of-order execution break the higher level operations? Or am I missing something?

Unfortunately I have done very little testing with out-of-order command queues (none of the GPU devices I own support them). My best guess is that any of the high-level algorithms which enqueue multiple sub-operations will run into issues. This should be fixable though by storing events for each sub-operation and passing them along to the next enqueue_* method to maintain the proper dependency ordering. Alternatively, and more simply, we could just block off each sub-operation with a call to enqueue_barrier() to force serialization.

This is something I'd like to support cleanly, however it just hasn't been a very high priority. If you're interested in writing up some test cases for out-of-order command queues and fixing some of the algorithms which break, that would be very helpful.

-kyle

Gruenke,Matt

5:29 a.m.

-----Original Message-----
From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Kyle Lutz
Sent: Tuesday, December 30, 2014 23:27
To: boost@lists.boost.org List
Subject: Re: [boost] [compute] CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE

> On Tue, Dec 30, 2014 at 6:54 PM, Gruenke,Matt wrote:
>> So, will enabling out-of-order execution break the higher level operations? Or am I missing something?
>
> My best guess is that any of the high-level algorithms which enqueue multiple sub-operations will run into issues. This should be fixable though by storing events for each sub-operation and passing them along to the next enqueue_* method to maintain the proper dependency ordering. Alternatively, and more simply, we could just block off each sub-operation with a call to enqueue_barrier() to force serialization.

I agree that maintaining order of sub-operations should be easily fixable, in the current design. My primary concern is maintaining order *between* higher level operations, since that might require design changes that could break API compatibility.

> This is something I'd like to support cleanly, however it just hasn't been a very high priority. If you're interested in writing up some test cases for out-of-order command queues and fixing some of the algorithms which break, that would be very helpful.

Having test cases seems like the least of the problems, given no confidence that they'll pass and no ready way to test them.

If we can't find a backend which implements out-of-order, perhaps we could write an alternate command_queue that intentionally shuffles command order (within the specified constraints), to use in the tests.

Matt
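To make the shuffling idea concrete, here is a minimal sketch of such a test queue in plain C++. It is not Boost.Compute or OpenCL code: `shuffling_queue`, `enqueue`, `enqueue_barrier`, `finish`, `ordering` and `fill_scale_sum` are all hypothetical names, and the "commands" are ordinary host callables. The queue runs pending commands in a seeded random order, constrained only by each command's wait list and by barriers, which is roughly the freedom an out-of-order queue is allowed. The driver models a high-level algorithm made of three sub-operations, ordered either by chained events or by barriers, the two fixes discussed earlier in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <utility>
#include <vector>

// Toy model of an out-of-order queue. Hypothetical names throughout;
// this is not the Boost.Compute or OpenCL API.
class shuffling_queue {
public:
    using event = std::size_t;  // index of an enqueued command

    explicit shuffling_queue(unsigned seed) : rng_(seed) {}

    // Enqueue a command that may start only after every event in wait_list.
    event enqueue(std::function<void()> fn, std::vector<event> wait_list = {}) {
        for (event e : wait_list) assert(e < cmds_.size());
        cmds_.push_back({std::move(fn), std::move(wait_list), barrier_});
        return cmds_.size() - 1;
    }

    // Commands enqueued later wait for everything enqueued so far.
    void enqueue_barrier() { barrier_ = cmds_.size(); }

    // Run all commands, picking randomly among those whose constraints are met.
    void finish() {
        std::vector<bool> done(cmds_.size(), false);
        for (std::size_t left = cmds_.size(); left > 0; --left) {
            std::vector<std::size_t> ready;
            for (std::size_t i = 0; i < cmds_.size(); ++i)
                if (!done[i] && is_ready(i, done)) ready.push_back(i);
            // Dependencies only point backwards, so `ready` is never empty.
            std::uniform_int_distribution<std::size_t> pick(0, ready.size() - 1);
            const std::size_t chosen = ready[pick(rng_)];
            cmds_[chosen].fn();
            done[chosen] = true;
        }
        cmds_.clear();
        barrier_ = 0;
    }

private:
    struct command {
        std::function<void()> fn;
        std::vector<event> wait_list;
        std::size_t barrier;  // all commands with index < barrier must finish first
    };

    bool is_ready(std::size_t i, const std::vector<bool>& done) const {
        for (std::size_t j = 0; j < cmds_[i].barrier; ++j)
            if (!done[j]) return false;
        for (event e : cmds_[i].wait_list)
            if (!done[e]) return false;
        return true;
    }

    std::vector<command> cmds_;
    std::size_t barrier_ = 0;
    std::mt19937 rng_;
};

enum class ordering { none, events, barriers };

// Stand-in for a high-level algorithm made of three dependent sub-operations.
int fill_scale_sum(unsigned seed, ordering mode) {
    shuffling_queue q(seed);
    std::vector<int> data(4, 0);
    int sum = -1;
    std::vector<shuffling_queue::event> wait;  // event of the previous step

    auto step = [&](std::function<void()> fn) {
        if (mode != ordering::events) wait.clear();
        const shuffling_queue::event e = q.enqueue(std::move(fn), wait);
        wait = {e};
        if (mode == ordering::barriers) q.enqueue_barrier();
    };
    step([&] { for (int& x : data) x = 1; });             // fill
    step([&] { for (int& x : data) x *= 3; });            // scale
    step([&] { sum = 0; for (int x : data) sum += x; });  // sum
    q.finish();
    return sum;  // 12 when the three steps ran in order
}
```

With `ordering::events` or `ordering::barriers` the three steps are forced into sequence, so `fill_scale_sum` returns 12 for any seed. With `ordering::none` the result depends on the shuffle, which is the failure mode the tests would be trying to expose.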

Kyle Lutz

6:05 a.m.

On Tue, Dec 30, 2014 at 9:29 PM, Gruenke,Matt <mgruenke@tycoint.com> wrote:

> -----Original Message-----
> From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Kyle Lutz
> Sent: Tuesday, December 30, 2014 23:27
> To: boost@lists.boost.org List
> Subject: Re: [boost] [compute] CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
>
>> On Tue, Dec 30, 2014 at 6:54 PM, Gruenke,Matt wrote:
>>> So, will enabling out-of-order execution break the higher level operations? Or am I missing something?
>>
>> My best guess is that any of the high-level algorithms which enqueue multiple sub-operations will run into issues. This should be fixable though by storing events for each sub-operation and passing them along to the next enqueue_* method to maintain the proper dependency ordering. Alternatively, and more simply, we could just block off each sub-operation with a call to enqueue_barrier() to force serialization.
>
> I agree that maintaining order of sub-operations should be easily fixable, in the current design. My primary concern is maintaining order *between* higher level operations, since that might require design changes that could break API compatibility.

True, with the current algorithms API this would have to be managed explicitly with barriers. But looking forward more, I see out-of-order command queues being more useful with a high-level task-graph/pipelines type API which would allow the user to define a chain of high-level operations and dependencies between them and then let Boost.Compute figure out how best to split up the work and submit it for execution on a command queue (out-of-order if available otherwise an in-order queue).
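As a rough illustration of the task-graph idea, the sketch below records high-level operations and the dependencies between them, then derives what each kind of queue would need: a per-task wait list for an out-of-order queue, or a single valid submission order for an in-order queue. Everything here is hypothetical (`task_graph`, `add`, `depends_on`, `wait_list`, `in_order_schedule`, `make_example`); no such API exists in Boost.Compute, and tasks are just names rather than real device work.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical task-graph sketch; none of these names exist in Boost.Compute.
class task_graph {
public:
    using task_id = std::size_t;

    task_id add(std::string name) {
        tasks_.push_back({std::move(name), {}});
        return tasks_.size() - 1;
    }

    // Declare that `task` must not start before `prerequisite` has finished.
    void depends_on(task_id task, task_id prerequisite) {
        assert(task < tasks_.size() && prerequisite < tasks_.size());
        tasks_[task].deps.push_back(prerequisite);
    }

    const std::string& name(task_id t) const { return tasks_.at(t).name; }

    // Out-of-order queue: the tasks whose events `t` would put in its wait list.
    const std::vector<task_id>& wait_list(task_id t) const {
        return tasks_.at(t).deps;
    }

    // In-order queue: one submission order that honors every dependency
    // (Kahn's algorithm). Throws std::logic_error if there is a cycle.
    std::vector<task_id> in_order_schedule() const {
        const std::size_t n = tasks_.size();
        std::vector<std::size_t> unmet(n);
        std::vector<std::vector<task_id>> dependents(n);
        std::queue<task_id> ready;
        for (task_id t = 0; t < n; ++t) {
            unmet[t] = tasks_[t].deps.size();
            for (task_id d : tasks_[t].deps) dependents[d].push_back(t);
            if (unmet[t] == 0) ready.push(t);
        }
        std::vector<task_id> order;
        while (!ready.empty()) {
            const task_id t = ready.front();
            ready.pop();
            order.push_back(t);
            for (task_id u : dependents[t])
                if (--unmet[u] == 0) ready.push(u);
        }
        if (order.size() != n) throw std::logic_error("dependency cycle");
        return order;
    }

private:
    struct task {
        std::string name;
        std::vector<task_id> deps;
    };
    std::vector<task> tasks_;
};

// Two independent uploads feed a transform, whose output feeds a reduce.
// The reduce is declared first on purpose: declaration order is not run order.
task_graph make_example() {
    task_graph g;
    const auto reduce = g.add("reduce");
    const auto upload_a = g.add("upload_a");
    const auto upload_b = g.add("upload_b");
    const auto transform = g.add("transform");
    g.depends_on(transform, upload_a);
    g.depends_on(transform, upload_b);
    g.depends_on(reduce, transform);
    return g;
}
```

For the example graph, `in_order_schedule()` yields upload_a, upload_b, transform, reduce, while on an out-of-order queue the two uploads carry empty wait lists and would be free to overlap.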

>> This is something I'd like to support cleanly, however it just hasn't been a very high priority. If you're interested in writing up some test cases for out-of-order command queues and fixing some of the algorithms which break, that would be very helpful.
>
> Having test cases seems like the least of the problems, given no confidence that they'll pass and no ready way to test them.
>
> If we can't find a backend which implements out-of-order, perhaps we could write an alternate command_queue that intentionally shuffles command order (within the specified constraints), to use in the tests.

If I recall correctly, POCL [1] supports out-of-order command queues and could be used for testing this.

-kyle

[1] http://portablecl.org

Thomas M

7:32 a.m.

On 31/12/2014 07:05, Kyle Lutz wrote:

>> Having test cases seems like the least of the problems, given no confidence that they'll pass and no ready way to test them.
>>
>> If we can't find a backend which implements out-of-order, perhaps we could write an alternate command_queue that intentionally shuffles command order (within the specified constraints), to use in the tests.
>
> If I recall correctly, POCL [1] supports out-of-order command queues and could be used for testing this.

IIRC Intel's OpenCL CPU implementation supports out-of-order queues; I don't know about their iGPUs. Intel surely knows (they have an OpenCL forum).

Take care when testing: AFAIK there is no guarantee that the execution order (including parallel execution of commands) is identical across runs, so if a test "passes" this may mean extremely little in general.

Thomas
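The caveat about passing tests can be illustrated with a small host-side model in plain C++. This is only a sketch under stated assumptions: the two "commands" are ordinary callables with no dependency declared between them, and a seeded shuffle stands in for whatever order a real out-of-order queue might choose. The names `unordered_run_passes`, `tally` and `run_many` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// Two commands with NO dependency between them, run in a seeded random order.
// Returns true when the result happens to match the in-order answer.
bool unordered_run_passes(unsigned seed) {
    int value = 0;
    std::vector<std::function<void()>> commands = {
        [&value] { value = 5; },   // write
        [&value] { value *= 2; },  // update, meant to follow the write
    };
    std::mt19937 rng(seed);
    std::shuffle(commands.begin(), commands.end(), rng);
    for (auto& command : commands) command();
    return value == 10;
}

struct tally {
    std::size_t passed = 0;
    std::size_t failed = 0;
};

// Repeat the same "test" under many orderings and count both outcomes.
tally run_many(unsigned runs) {
    assert(runs > 0);
    tally t;
    for (unsigned seed = 0; seed < runs; ++seed)
        (unordered_run_passes(seed) ? t.passed : t.failed) += 1;
    return t;
}
```

The same unsynchronized pair passes under some orderings and fails under others, so a single green run says little. On a real device the order is chosen by the implementation rather than by a seed, which makes a missing dependency even harder to reproduce.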

Gruenke,Matt

8:19 a.m.

-----Original Message-----
From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Thomas M
Sent: Wednesday, December 31, 2014 2:33
To: boost@lists.boost.org
Subject: Re: [boost] [compute] CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE

> On 31/12/2014 07:05, Kyle Lutz wrote:
>>> If we can't find a backend which implements out-of-order, perhaps we could write an alternate command_queue that intentionally shuffles command order (within the specified constraints), to use in the tests.
>>
>> If I recall correctly, POCL [1] supports out-of-order command queues and could be used for testing this.
>
> IIRC Intel's OpenCL CPU implementation supports out-of-order queues; I don't know about their iGPUs. Intel surely knows (they have an OpenCL forum).

Indeed, this forum post strongly suggests their CPU backend does. I assume it was the CPU backend, since the poster is using a Sandy Bridge, which I think lacked OpenCL support on their GPUs:

https://software.intel.com/en-us/forums/topic/279352

Matt

Thomas M

8:33 a.m.

On 31/12/2014 09:19, Gruenke,Matt wrote:

> Indeed, this forum post strongly suggests their CPU backend does. I assume it was the CPU backend, since the poster is using a Sandy Bridge, which I think lacked OpenCL support on their GPUs:
>
> https://software.intel.com/en-us/forums/topic/279352

That's correct. Intel introduced [IIRC - quite sure though] OpenCL support for its iGPUs with Ivy Bridge; Sandy Bridge surely doesn't have it. I know virtually nothing about their current OpenCL iGPU implementations.

Thomas
