On 10/18/2016 10:55 AM, Andreas Schäfer wrote:
On 10:29 Tue 18 Oct , Larry Evans wrote:
The purpose of item:
* sizeof...(Ts) allocations could be a single large block
is to just require 1 heap allocation instead of N, where N is the number of vectors in soa?
One benefit of this would be that transferring such a container to another address space (think MPI or CUDA) would become much simpler.
It also reduces the size of your handle structure (the structure that holds the pointer to the data). Otherwise every additional member adds ~24 bytes for a simple tuple< vector<Ts>... >, or sizeof(T*) bytes at a minimum for separately allocated blocks. Depending on the implementation, it can also reduce the size of an iterator to a view of the data, or remove an indirection from it. A nice benefit is that the size of the handle + body (alloc'd data block) for a soa_vector would be identical to that of a normal AoS vector containing the same data.

If a solution ticked all the other boxes and dropped this one, I'd be fine with that. There's quite a bit of complexity involved in calculating the offsets with a dynamic capacity and potentially arbitrary alignment requirements on the internal subarrays.

I'm also not sure how to reconcile it with another frequent SoA optimization: replacing bools or small enums/ints with an array of bit-packed data. For example, the alive bool in the particle_t elsewhere in this thread could be stored as a bit_vector, which can be a size and speed win, as seen in the soa_emitter_sse_opt_t addition here:

http://codepad.org/eol6auRN

AoS in 5.8485 seconds
SoA in 4.06838 seconds
SoA flat in 3.99157 seconds
SoA Static in 5.26953 seconds
SoA SSE in 3.53028 seconds
SoA SSE opt in 2.98845 seconds

P.S. this also shows an improved soa_emitter_t, which generates much better code (vs2015) when vector.data() is cached for each member before the loop. That improves the SoA update from 191 instructions to 52 instructions and gives a roughly 2x speedup for small (~25k) datasets, but it is still more than the 36 instructions required for the AoS update.
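To make the offset complexity mentioned above a bit more concrete, here is a minimal sketch of how the per-member offsets inside a single block might be computed for a given capacity. The names soa_layout and align_up are hypothetical and are not taken from any code in this thread. It assumes that each subarray only needs the natural alignof of its element type and that the block itself is allocated with at least the strictest of those alignments.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Round `offset` up to the next multiple of `alignment`.
// `alignment` must be a non-zero power of two.
inline std::size_t align_up(std::size_t offset, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: byte offset of each member subarray inside one
// block holding `capacity` elements per member, plus the total size.
template <class... Ts>
struct soa_layout {
    static_assert(sizeof...(Ts) > 0, "need at least one member type");

    std::array<std::size_t, sizeof...(Ts)> offsets{};
    std::size_t total_bytes = 0;

    explicit soa_layout(std::size_t capacity) {
        const std::size_t sizes[]  = { sizeof(Ts)... };
        const std::size_t aligns[] = { alignof(Ts)... };
        std::size_t cursor = 0;
        for (std::size_t i = 0; i < sizeof...(Ts); ++i) {
            cursor = align_up(cursor, aligns[i]);  // pad up to this member's alignment
            offsets[i] = cursor;                   // subarray i starts here
            cursor += sizes[i] * capacity;         // reserve room for `capacity` elements
        }
        total_bytes = cursor;
    }
};
```

Every offset depends on the capacity, so each reallocation has to recompute the layout and move every subarray to its new position. Stricter requirements, such as 16-byte alignment for SSE loads, would need the alignments passed in explicitly instead of taken from alignof.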
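The data() caching from the P.S. can be illustrated roughly as follows. This is only a sketch: particles_soa and update are made-up names, and the real soa_emitter_t in the codepad link has more members and may be structured differently. The idea is to read each vector's data() pointer once before the loop, so the loop body indexes through plain local pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical two-member SoA container; names are illustrative only.
struct particles_soa {
    std::vector<float> position;
    std::vector<float> velocity;
};

// Advance every particle by dt. Each vector's data() pointer and the
// element count are read once, before the loop, into locals.
inline void update(particles_soa& p, float dt) {
    assert(p.position.size() == p.velocity.size());
    const std::size_t n = p.position.size();
    float* pos = p.position.data();
    const float* vel = p.velocity.data();
    for (std::size_t i = 0; i < n; ++i) {
        pos[i] += vel[i] * dt;
    }
}
```

Whether this changes the generated code depends on the compiler and optimization settings. The instruction counts quoted above are for vs2015 and the code in the link, not for this sketch.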