On 10/25/2016 12:41 PM, Larry Evans wrote:
On 10/25/2016 12:22 PM, Larry Evans wrote: [snip]
From the above, the LibFlatArray and SSE methods are the fastest. I'd guess that a new "SoA block SSE" method, which uses the _mm_* methods, would narrow the difference. I'll try to figure out how to do that. I notice:
#include
doesn't produce a compile error; however, that #include doesn't have the _mm_add_ps used here:
https://github.com/cppljevans/soa/blob/master/soa_compare.benchmark.cpp#L621
Do you know of a package I could install on Ubuntu that makes those SSE functions, such as _mm_add_ps, available?
[snip] Never mind. Googling for:
__mm128
led to:
http://stackoverflow.com/questions/11679741/vector-of-mm128-wont-push-back
and a change of the #include to:
#include
which solved the problem.
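For readers following along, here is a minimal sketch of what an SoA-style update using the _mm_* functions might look like. The header name did not survive in the message above; this sketch assumes `<xmmintrin.h>`, which is the header that declares the SSE intrinsics (including _mm_add_ps) with GCC and Clang. The function name advance_component and its parameters are hypothetical and are not taken from soa_compare.benchmark.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#include <xmmintrin.h>  // assumed header: SSE intrinsics (__m128, _mm_add_ps, ...)

// Hypothetical helper, not from the benchmark: computes pos[i] += vel[i] * dt
// for one component array of a structure-of-arrays particle layout.
void advance_component(std::vector<float>& pos,
                       const std::vector<float>& vel,
                       float dt)
{
    assert(pos.size() == vel.size());
    const std::size_t n = pos.size();
    const __m128 vdt = _mm_set1_ps(dt);  // broadcast dt into all 4 lanes

    std::size_t i = 0;
    // Process 4 floats per iteration; unaligned loads/stores are used
    // because std::vector does not guarantee 16-byte alignment.
    for (; i + 4 <= n; i += 4) {
        __m128 p = _mm_loadu_ps(&pos[i]);
        __m128 v = _mm_loadu_ps(&vel[i]);
        p = _mm_add_ps(p, _mm_mul_ps(v, vdt));
        _mm_storeu_ps(&pos[i], p);
    }
    // Scalar tail for the remaining 0..3 elements.
    for (; i < n; ++i) {
        pos[i] += vel[i] * dt;
    }
}
```

On x86-64, GCC enables SSE by default, so no extra package or flag should be needed; a 32-bit build may need -msse.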
particle_count=1,024
frames=1,000
minimum duration=0.0371714
comparitive performance table:

method    rel_duration
________  ______________
SSE_opt   0.330574
SSE       0.440405
Flat      0.904265
SoA       0.911574
Block     0.97398
AoS       1
StdArray  1.15079
LFA       undefined
OOPS. Another copy&paste careless error. Output should be:

--{--cut here--
particle_count=1,000,000
frames=1,000
minimum duration=3.5909
comparitive performance table:

method    rel_duration
________  ______________
SSE_opt   1
SSE       1.01568
StdArray  1.44133
Flat      1.44861
Block     1.45053
SoA       1.52935
AoS       2.10294
LFA       undefined

Compilation finished at Tue Oct 25 16:12:45
--}--cut here--

which clearly shows SSE_opt as fastest.

-regards, Larry