Niall Douglas
You're actually wrong on that, and it's one of the first big surprises anyone who sits on ISO committees experiences: the change in scope of definitions. When you come at things from the level of international engineering standards, a computer's CPU is not defined as anything approximating what any of us use on a regular basis. It includes large NUMA clusters and Cray supercomputers, none of which do SIMD anything like how a PC does. It *also* includes tiny embedded 8-bit CPUs, the kind you find in watches, inlined in wiring, that sort of thing. Some of those tiny CPUs, believe it or not, do SIMD and have done so for donkey's years, but in a very primitive way. Some of them, for example, do SIMD on 3 x 8-bit = 24-bit or even 3 x 9-bit = 27-bit words rather than on 32-bit integers. Yet international engineering standards must *always* target the conservative majority, and PCs, or indeed any CPUs designed more recently than the 1990s, are always in a minority in that frame of reference.
Exactly. I urge anyone working on parallelism-related stuff to investigate the many vector and parallel architectures that have been developed over the decades. The proposed SIMD library covers a *very* small slice of what's been done, and it is a relatively inefficient model at that. That model was developed in the 1990s, when we had much less die area and couldn't afford to do "real" vector ISAs in microprocessors. The world has changed since then.
Thing is, had Intel decided Larrabee was worth pushing to the mass market - and it was a close thing - PC-based SIMD would look completely different now and we wouldn't be using SSE/NEON/AVX.
As it is, convergence will simply take longer.
See Intel MIC. This stuff is coming much faster than most people realize. From where I sit (developing compilers professionally for vector architectures), the path is clear and it is not the current SSE/AVX model. -David