On 2016-10-16 08:59, degski wrote:
On 16 October 2016 at 08:36, Michael Marcin wrote:

You state that the example is a toy example. To me, the example shows that iterating over a vector of smaller objects (PODs) is faster than iterating over a vector of larger objects, duh. The real use case might be more interesting; maybe you can describe it.
Hi,

I gave a real use case in a slightly more complicated setting. We had to face the problem of how to align data so that a set of slow matrix-vector multiplications can be unified into one fast matrix-matrix multiplication. We usually have to traverse the same dataset several hundred to a thousand times to do our computations, so some overhead in the data-setup phase (e.g. the cost of insertion) is acceptable, and we would sacrifice even more performance in that phase in return for more performance in the following three days of computation.

Having said that, I agree that just reordering does not gain much in many cases. The real gain comes when data can be aligned better or used more independently. For example, if the struct contains a std::vector or std::string, we can gain a lot by using a data structure that stores the contents of the vectors consecutively, thus removing one level of indirection. Note that this pattern is also common in the game industry, where objects are made up of data structures that are used in (nearly) independent parts of the game and where data has to be copied to and from the GPU.

Best,
Oswin
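To make the matrix-vector point concrete, here is a minimal sketch (all names and sizes are mine, not from the thread): the k input vectors are laid out as the columns of a single matrix X, so the whole batch becomes one matrix-matrix product Y = A * X over contiguous data instead of k separate matrix-vector calls.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// One matrix-vector product: y = A * x.
std::vector<double> matvec(const Matrix& A, const std::vector<double>& x) {
    std::vector<double> y(A.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            y[i] += A[i][j] * x[j];
    return y;
}

// Batched form: the k input vectors are the columns of X (X[j][k] is
// component j of vector k), so Y = A * X computes all k products in
// one pass. Column k of Y equals matvec(A, vector k).
Matrix matmat(const Matrix& A, const Matrix& X) {
    const std::size_t n = A.size(), inner = X.size(), m = X[0].size();
    Matrix Y(n, std::vector<double>(m, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < inner; ++j)
            for (std::size_t k = 0; k < m; ++k)
                Y[i][k] += A[i][j] * X[j][k];
    return Y;
}
```

In practice one would hand Y = A * X to an optimized GEMM rather than the naive loops above; the point is only that the data layout makes that single call possible.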
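The "store the contents of the vectors consecutively" idea can be sketched roughly as follows (the FlatStrings name and interface are my invention for illustration, not part of any proposed library): all characters live in one contiguous buffer, with an offsets array marking element boundaries, so traversal touches one flat allocation instead of chasing a separate heap block per std::string.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Flattened storage for a sequence of strings. Compared with
// std::vector<std::string>, element data is contiguous and one level
// of indirection (the per-string heap pointer) is removed.
struct FlatStrings {
    std::string data;                 // all characters, back to back
    std::vector<std::size_t> offsets; // offsets[i] = start of element i

    FlatStrings() : offsets{0} {}

    void push_back(const std::string& s) {
        data += s;
        offsets.push_back(data.size());
    }

    std::size_t size() const { return offsets.size() - 1; }

    // Element i as a copy; a real implementation would likely return a
    // non-owning view into `data` instead.
    std::string operator[](std::size_t i) const {
        return data.substr(offsets[i], offsets[i + 1] - offsets[i]);
    }
};
```

The trade-off matches the post: insertion in the middle is expensive (everything after the insertion point shifts), but that setup cost is paid once, while the contiguous layout pays off on every one of the many traversals.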