That does help, and works like a charm. It seems the C-lowering of the device code is a nasty business, and it's hard to pinpoint what is busted. But this workaround should be sufficient for many use cases, I think.
I ran into one more problem with passing a proto expression to code running on the GPU. Once I make a proto::deep_copy(), the expression gets copied faithfully to the device. The problem is in evaluating the expression on the GPU. The "canonical" way of evaluating a lambda-like language in proto is to stash the parameters of the operator() function in a fusion::vector and pass it down as state to the evaluator. But the at_c functions of fusion vector don't have the __device__ attribute.

Ok, so this needs some background on CUDA: in CUDA, all the functions that you intend to execute on the GPU device have to be "decorated" with the __device__ attribute, like so: __device__ void foo(int a) { ... } This is so that the CUDA compiler can rip them out into a separate file and compile them for the device. Obviously, a __device__ function can only call other __device__ functions.

Now, coming back to evaluating a proto expression: I can easily mark the operator() function of the expression as __device__. But if I use fusion vectors for passing around parameters, then I can't call the at_c functions to access the parameters, because they don't have the __device__ attribute. Well, technically, I can modify the Boost headers and add the __device__ attribute everywhere, but that is intrusive and not desirable. So, is there some other way to evaluate a lambda-like language in proto?

Also, one more problem I can see in proto itself is access to the values stored in terminals. The proto::left() and proto::child_c<0>() functions don't have the __device__ attribute either, so I think I have to resort to direct access of the fields of a proto expression.

So, the basic question is: how can I write an evaluator in such a way that I don't call internal proto or other Boost functions during any of the intermediate steps?

Manjunath
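
To make the constraint concrete, here is a minimal sketch of what a "decorated" parameter pack and accessor could look like. This is not Proto or Fusion code: args2, arg0, arg1 and add_scaled are hypothetical names made up for illustration, and the macro fallback at the top is only an assumption so that the same file also builds with a plain C++ compiler.

```cpp
// Under nvcc, __host__ and __device__ are real qualifiers. For a plain
// C++ compiler, define them away so this sketch still builds on the host.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical stand-in for a two-element fusion::vector.
template <typename A0, typename A1>
struct args2 {
    A0 a0;
    A1 a1;
};

// Hypothetical accessors playing the role of fusion::at_c<0> and
// fusion::at_c<1>. Unlike those, they carry the __device__ qualifier,
// so a __device__ function is allowed to call them.
template <typename A0, typename A1>
__host__ __device__ inline A0 const& arg0(args2<A0, A1> const& a) {
    return a.a0;
}

template <typename A0, typename A1>
__host__ __device__ inline A1 const& arg1(args2<A0, A1> const& a) {
    return a.a1;
}

// Hypothetical hand-written evaluator for the fixed expression
// _1 + _2 * 2. The parameters arrive as "state", and every function
// reached from operator() is itself decorated.
struct add_scaled {
    template <typename A0, typename A1>
    __host__ __device__ A0 operator()(args2<A0, A1> const& state) const {
        return arg0(state) + arg1(state) * 2;
    }
};
```

With the parameters packed as args2<int, int> p = {3, 4}, the call add_scaled()(p) computes 3 + 4 * 2. The sketch only shows that the whole call chain has to be decorated; it does not answer how to get the same effect from Proto's own transforms.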