On 21/12/2014 23:24, Asbjørn wrote:
> On 21.12.2014 20:39, Kyle Lutz wrote:
>> Strongly disagree, the floating-point operations on the device are well defined and their output should be identical to the host results (barring optimizations like "-cl-fast-relaxed-math").
> While I agree, I've found Intel's OpenCL CPU device to return results which make me think it uses some relaxed math regardless. With NVIDIA and AMD I can get (essentially) the same results as reference CPU calculations, but with Intel I sometimes get quite large discrepancies. Of course, it's possible I'm just doing it wrong...
Intel's OpenCL CPU implementation (SDK 2013 in my case) is exactly the one whose results I am used to seeing deviate, and quite considerably at that. I don't have it installed on this machine, but Kyle, could you run some kernel code on doubles yourself? My code used basic arithmetic operators, summed up values (up to tens of thousands), and had several exp / log / sqrt calls along the way. I hadn't set any special compiler flags, certainly not -cl-fast-relaxed-math. My suspicion was that they use their own math library, which provides highly optimized variants of these functions. FWIW, I am not even sure whether a plain C++ program compiled with the Intel C++ compiler and linked against their math library would produce the same results as, e.g., one built with MSVC.
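
To put a number on such deviations, one option is to compare the device result against a host reference in units in the last place (ULPs). The following is only a rough sketch under stated assumptions: the expression inside reference_sum is a made-up stand-in for a kernel of the kind described above (the original kernel is not shown), and ulp_distance and reference_sum are hypothetical helper names, not part of any library.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Map a double onto an integer line on which adjacent representable
// doubles differ by exactly 1 (and +0.0 and -0.0 both map to 0).
inline std::int64_t ordered_bits(double x) {
    std::int64_t i;
    std::memcpy(&i, &x, sizeof i);
    return i < 0 ? std::numeric_limits<std::int64_t>::min() - i : i;
}

// Number of representable doubles between a and b (0 means bit-identical,
// up to the sign of zero). Only meaningful for finite inputs.
inline std::uint64_t ulp_distance(double a, double b) {
    assert(std::isfinite(a) && std::isfinite(b));
    const std::int64_t ia = ordered_bits(a);
    const std::int64_t ib = ordered_bits(b);
    // Unsigned subtraction avoids signed overflow when the signs differ.
    return ia >= ib
        ? static_cast<std::uint64_t>(ia) - static_cast<std::uint64_t>(ib)
        : static_cast<std::uint64_t>(ib) - static_cast<std::uint64_t>(ia);
}

// Host-side reference for a HYPOTHETICAL kernel: basic arithmetic plus
// exp / log / sqrt, accumulated sequentially in double precision.
inline double reference_sum(const std::vector<double>& xs) {
    double sum = 0.0;
    for (double x : xs) {
        sum += std::exp(-x) * std::log(1.0 + x) + std::sqrt(x);
    }
    return sum;
}
```

Calling ulp_distance(reference_sum(input), device_result), where device_result is whatever the OpenCL kernel returned for the same input, gives a compiler-independent measure of the gap. Keep in mind that a device reduction usually adds in a different order than this sequential loop, so a small nonzero distance is expected even without relaxed math; only large distances would point at the math library.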