A GPU is an accelerator for large, regular computations, and it requires sending data over to it and getting the results back. It's also programmed with a very constrained programming model that cannot efficiently express all kinds of operations.
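To make that round trip concrete, here is a minimal sketch in CUDA C++ using the standard runtime API; the kernel, names, and array size are illustrative only, not taken from the discussion above.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

// Illustrative data-parallel kernel: each thread handles exactly one element.
__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    std::vector<float> host(n, 1.0f);

    // The round trip described above: copy in, compute on the device, copy out.
    float *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));
    cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    scale<<<(n + 255) / 256, 256>>>(dev, 2.0f, n);

    cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);

    printf("host[0] = %f\n", host[0]);  // 2.0 after the round trip
    return 0;
}
```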
A CPU, on the other hand, is a very flexible processor, and all the memory is already there. You can make it do a lot of complex computations, whether irregular, sparse, or iterative; it can do dynamic scheduling and work stealing; and you have fine-grained control over all the components and how they work together.
However, SIMD has been around for 25 years and is still on the roadmap of future processors. Across all this time it has mostly stayed the same.
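For contrast with the GPU sketch, CPU SIMD of the kind being discussed looks roughly like this. It is host-side code with SSE intrinsics (available on x86 since 1999) that the same CUDA toolchain will happily compile; the function name and the multiple-of-4 assumption are mine, purely for illustration.

```cuda
#include <xmmintrin.h>  // SSE intrinsics, part of x86 since 1999

// Add two float arrays four lanes at a time; n is assumed to be a multiple of 4.
void add4(const float *a, const float *b, float *out, int n) {
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);             // load 4 floats, unaligned
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));  // 4 additions in one instruction
    }
}
```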
On the other hand, GPU computing is relatively new and is evolving a lot. It's also quite trendy and buzzword-heavy, and in reality it is not as fast and versatile as the marketing makes it out to be. A lot of people seem to be intent on standardizing GPU technology rather than SIMD technology; that's quite a shame.

:) You're actually wrong on that, and it's one of the first big surprises anyone who sits on ISO committees experiences: the change in scope of definitions. When you come at things from the level of international engineering standards, a computer's CPU is not defined as anything approximating what any of us use on a regular basis. It includes large NUMA clusters; it includes Cray supercomputers, none of which do SIMD anything like how a PC does. It *also* includes tiny embedded 8-bit CPUs, the kind you find in watches, inlined in wiring, that sort of thing. Some of those tiny CPUs, believe it or not, do SIMD and have done for donkey's years, but in a very primitive way. Some of them, for example, work on SIMD integers of 3 x 8 bit = 24 bits, or even 3 x 9 bit = 27 bits, rather than 32 bits, that sort of thing. Yet international engineering standards must *always* target the conservative majority, and PCs, or even CPUs designed more recently than the 1990s, are always a minority in that frame of reference. Don't get me wrong: you could standardize desktop-class SIMD on its own. But generally you need to hear noise complaining about the costs of the lack of standardization, and I'm not aware of much regarding SIMD on CPUs (it's different on GPUs, where hedge funds and oil/gas frackers regularly battle lack of interop).
Thing is, had Intel decided Larrabee was worth pushing to the mass market - and it was a close thing - PC-based SIMD would look completely different now, and we wouldn't be using SSE/NEON/AVX, which is really an explicit prefetch opcode set for directly programming the ALUs and bypassing the out-of-order logic, not true SIMD (these are Intel's own words, not mine). As it is, convergence will simply take longer. Some of those new ARM dense-cluster servers look awfully like Larrabee, actually: 1000+ NUMA ARM Cortex-A9s in a single rack, and their density appears to be growing exponentially for now. Given all this change going on, I'd still wait and see.

Niall