On Tue, Dec 23, 2014 at 12:55 PM, Andrey Semashev wrote:
On Tue, Dec 23, 2014 at 7:29 PM, Kyle Lutz wrote:
On Tue, Dec 23, 2014 at 1:20 AM, Andrey Semashev wrote:
1. When you define a kernel (e.g. with the BOOST_COMPUTE_FUNCTION macro), is this kernel supposed to be in C? Can it reference global (namespace scope) objects and other functions? Other kernels?
Yes, the source code for OpenCL kernels and functions is specified in OpenCL C which is a dialect of C99 with extensions for vectorized operations.
Does this mean that the compiler has to support OpenCL in order to be able to use Boost.Compute? Or its specific features? If yes, can this be mentioned in the docs (with the list of the affected features, if possible)?
No, Boost.Compute does not require any special compiler or compiler extensions. It will work with all standards-conforming C++03 and later compilers.
Also, I don't quite understand how the kernel source code I supply to BOOST_COMPUTE_FUNCTION is then compiled into a kernel. Is this source code just stringized and not actually compiled when the application is built?
Yes, the source argument for BOOST_COMPUTE_FUNCTION() is stringized and then inserted into an OpenCL program when "invoked" by an algorithm. And you're right, the function source is not compiled by the host compiler, though the function signature itself is, which gives us some degree of type-safety.
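For example, a minimal (untested) sketch of defining and using such a function; the vector setup and names here are just illustrative:

#include <boost/compute/core.hpp>
#include <boost/compute/function.hpp>
#include <boost/compute/container/vector.hpp>
#include <boost/compute/algorithm/fill.hpp>
#include <boost/compute/algorithm/transform.hpp>

namespace compute = boost::compute;

// the body in braces is stringized and compiled as OpenCL C at run-time
BOOST_COMPUTE_FUNCTION(int, add_four, (int x),
{
    return x + 4;
});

int main()
{
    compute::device gpu = compute::system::default_device();
    compute::context context(gpu);
    compute::command_queue queue(context, gpu);

    compute::vector<int> vec(8, context);
    compute::fill(vec.begin(), vec.end(), 1, queue);

    // the stringized source for add_four is compiled into an OpenCL
    // program the first time the algorithm is invoked
    compute::transform(vec.begin(), vec.end(), vec.begin(), add_four, queue);
    queue.finish();
    return 0;
}

The body in braces never goes through the host compiler; only the declared signature does.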
There are a few ways to specify kernel functions which reference global C++ values. One is the BOOST_COMPUTE_CLOSURE() macro [1] which works similarly to BOOST_COMPUTE_FUNCTION(), but also allows a lambda-like capture list of C++ values.
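For instance, something along these lines (again an untested sketch with made-up names):

#include <boost/compute/core.hpp>
#include <boost/compute/closure.hpp>
#include <boost/compute/container/vector.hpp>
#include <boost/compute/algorithm/iota.hpp>
#include <boost/compute/algorithm/count_if.hpp>

namespace compute = boost::compute;

int main()
{
    compute::device gpu = compute::system::default_device();
    compute::context context(gpu);
    compute::command_queue queue(context, gpu);

    compute::vector<int> vec(10, context);
    compute::iota(vec.begin(), vec.end(), 0, queue); // 0, 1, ..., 9

    int threshold = 5; // ordinary host-side C++ value

    // 'threshold' is captured and made available inside the OpenCL function
    BOOST_COMPUTE_CLOSURE(bool, below_threshold, (int x), (threshold),
    {
        return x < threshold;
    });

    size_t n = compute::count_if(vec.begin(), vec.end(), below_threshold, queue);
    (void)n; // expect 5
    return 0;
}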
2. When is the kernel compiled and uploaded to the device? Is it possible to cache and reuse the compiled kernel?
If writing a custom kernel, the kernel is built when the "program::build()" method is called. Internally, the higher-level algorithms compile programs when they're needed and store them in a global program cache.
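For reference, the explicit path looks roughly like this (untested sketch; the kernel source and names are made up):

#include <boost/compute/core.hpp>

namespace compute = boost::compute;

int main()
{
    compute::device gpu = compute::system::default_device();
    compute::context context(gpu);
    compute::command_queue queue(context, gpu);

    const char source[] =
        "__kernel void fill_ones(__global int *buf)\n"
        "{\n"
        "    buf[get_global_id(0)] = 1;\n"
        "}\n";

    compute::program program =
        compute::program::create_with_source(source, context);
    program.build(); // OpenCL compilation happens here

    compute::kernel kernel = program.create_kernel("fill_ones");

    compute::buffer buf(context, 64 * sizeof(int));
    kernel.set_arg(0, buf);
    queue.enqueue_1d_range_kernel(kernel, 0, 64, 0);
    queue.finish();
    return 0;
}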
And yes, compiled program and kernel objects can be stored and re-used (this is strongly recommended). Boost.Compute provides the program_cache class [3], which stores frequently used programs as compiled objects.
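Roughly like this, assuming the program_cache header and its get_global_cache()/get_or_build() interface (untested sketch; the cache key and kernel name are made up):

#include <boost/compute/core.hpp>
#include <boost/compute/utility/program_cache.hpp>
#include <boost/shared_ptr.hpp>

namespace compute = boost::compute;

int main()
{
    compute::context context = compute::system::default_context();

    const char source[] = "__kernel void noop(void) { }\n";

    // process-wide cache associated with this context
    boost::shared_ptr<compute::program_cache> cache =
        compute::program_cache::get_global_cache(context);

    // compiled on the first call; subsequent calls with the same key
    // return the already-built program
    compute::program program =
        cache->get_or_build("example_noop", std::string(), source, context);

    compute::kernel kernel = program.create_kernel("noop");
    (void)kernel;
    return 0;
}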
So, e.g. a kernel defined with BOOST_COMPUTE_FUNCTION will be compiled when first used, and then saved in some global program_cache, is that correct? Also, captured arguments of BOOST_COMPUTE_CLOSURE will be evaluated only once, when the kernel is built?
Yeah, the algorithms in Boost.Compute will create a program with the function's source and then store it in the global program cache for later use. And captured values with BOOST_COMPUTE_CLOSURE() are stored by reference and are updated if the corresponding C++ values change. Currently, changing captured values will cause a kernel re-compilation. I'm working on improving this to avoid the re-compilation and simply pass the new values to the kernel.
3. Why is the library not thread-safe by default? I'd say, we're long past single-threaded systems now, and having to always define the config macro is a nuisance.
I would very much like to have it thread-safe by default. However, this conflicts with keeping the library header-only and usable with C++03 compilers. The BOOST_COMPUTE_THREAD_SAFE macro basically just instructs Boost.Compute to use the C++11 "thread_local" specifier for global objects instead of "static". With C++03 compilers, this will use boost::thread_specific_ptr<>, which then requires users to also link to Boost.Thread.
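Conceptually, the storage selection amounts to something like this (an illustrative sketch only, not the actual Boost.Compute internals; the cache type is a stand-in):

#include <string>
#include <map>

// stand-in for the global state Boost.Compute keeps (program cache, etc.)
struct cache_type { std::map<std::string, std::string> entries; };

#if defined(BOOST_COMPUTE_THREAD_SAFE) && __cplusplus >= 201103L
    // C++11: every thread gets its own instance
    thread_local cache_type global_cache;
#elif defined(BOOST_COMPUTE_THREAD_SAFE)
    // C++03 fallback: per-thread storage via Boost.Thread (adds a link dependency)
    #include <boost/thread/tss.hpp>
    boost::thread_specific_ptr<cache_type> global_cache;
#else
    // default: a single instance shared by all threads (not thread-safe)
    static cache_type global_cache;
#endif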
That said, I still don't think it's ideal and I am very open to ideas/patches which improve this.
Personally, I see no big problem with a dependency on Boost.Thread in C++03. However, it is quite possible to use the system API to implement TLS in a header-only library.
On POSIX systems it is quite trivial with the pthread_once and pthread_key* APIs. On Windows you can use the Interlocked* functions or Boost.Atomic to implement something similar to pthread_once, and the Tls* functions for the TLS itself.

The tricky part is the TLS cleanup, which can be done with the help of the Windows thread pool. You can use RegisterWaitForSingleObject to schedule a wait operation on the handle of the thread that sets the thread-local value. When the thread exits, the pool will invoke the callback you passed to RegisterWaitForSingleObject, where you can clean up the TLS value. The important difference from thread_local and Boost.Thread is that the callback is called in a thread different from the one that initialized the TLS value, but for various cleanup routines this should not matter.
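The POSIX side boils down to something like this (just an untested sketch with made-up names, not code from Boost.Compute or Boost.Sync; the Windows side needs the thread-pool trick described above):

#include <pthread.h>

namespace detail {

inline pthread_key_t &tls_key()
{
    static pthread_key_t key; // zero-initialized at load time
    return key;
}

inline void tls_cleanup(void *p)
{
    // called automatically when a thread that set the value exits
    delete static_cast<int *>(p);
}

inline void tls_make_key()
{
    pthread_key_create(&tls_key(), &tls_cleanup);
}

inline int &tls_value()
{
    static pthread_once_t once = PTHREAD_ONCE_INIT;
    pthread_once(&once, &tls_make_key); // key is created exactly once

    void *p = pthread_getspecific(tls_key());
    if (!p) {
        p = new int(0); // lazily create this thread's slot
        pthread_setspecific(tls_key(), p);
    }
    return *static_cast<int *>(p);
}

} // namespace detail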
You can see how it's done in Boost.Sync:
https://github.com/boostorg/sync/blob/develop/include/boost/sync/detail/wait...
I personally don't see an issue with depending on Boost.Thread either, but this does prevent the library from being header-only. I'll take a look at your example and see if that can be worked into Boost.Compute. Thanks! -kyle