Thanks for the thorough review.

On 26-09-2017 at 23:43, Joaquin M López Muñoz via Boost wrote:
> 3. The question arises of whether segment access can gain us some speed.
> I've written a small test to measure the performance of a plain
> std::for_each loop over a batch_deque vs. an equivalent sequence of
> segment-level loops (attached, batch_deque_for_each.cpp). This is what
> I got for Visual C++ 2015 32-bit (x86) release mode on a Windows 7
> 64-bit box with an Intel Core i5-2520M @2.5GHz:
> [](int x){return x;}
> segment size: 32
> n    plain   segmented
> 10E3 25.5472 23.6305
> 10E4 24.5778 23.6907
> 10E5 24.5821 22.8076
> 10E6 25.5007 23.1037
> 10E7 27.1452 24.0339
> segment size: 512
> n    plain   segmented
> 10E3 23.8384 23.6638
> 10E4 23.0284 23.8705
> 10E5 22.8449 22.8187
> 10E6 23.8485 23.7454
> 10E7 24.1711 23.5404
> [](int x){return x%4?x:-x;}
> segment size: 32
> n    plain   segmented
> 10E3 33.9795 23.6662
> 10E4 32.4817 24.023
> 10E5 32.8731 23.3803
> 10E6 33.5396 22.9298
> 10E7 33.1034 23.0206
> segment size: 512
> n    plain   segmented
> 10E3 25.0623 23.3205
> 10E4 25.1048 23.5812
> 10E5 25.3343 21.7686
> 10E6 25.6961 22.4639
> 10E7 25.8664 22.9964
For 32-bit release mode on Windows 7 64-bit with an Intel i7-2700K:
[](int x){return x;}
segment size: 32
n plain segmented
10E3 21.4589 21.8351
10E4 19.9545 20.5133
10E5 19.4889 20.6197
10E6 19.2552 19.6976
10E7 19.2919 19.5425
segment size: 512
n plain segmented
10E3 20.2503 20.6372
10E4 19.0234 19.3367
10E5 18.5394 18.6171
10E6 18.555 18.5816
10E7 19.0918 19.1833
[](int x){return x%4?x:-x;}
segment size: 32
n plain segmented
10E3 28.743 19.7501
10E4 26.8371 19.0719
10E5 27.0304 18.7624
10E6 26.9561 18.2357
10E7 27.2985 18.6425
segment size: 512
n plain segmented
10E3 22.1073 20.0347
10E4 20.7825 19.5639
10E5 20.6122 18.0773
10E6 20.6039 18.4895
10E7 21.7964 19.1822
So basically the same as your results. The case for segment size 32 and
a non-trivial lambda does show some speedup, doesn't it?
For 64-bit release mode on Windows 7 64-bit with an Intel i7-2700K:
[](int x){return x;}
segment size: 32
n plain segmented
10E3 34.748 21.1357
10E4 32.8879 19.8592
10E5 32.6779 18.955
10E6 32.6255 19.3307
10E7 33.2282 19.3158
segment size: 512
n plain segmented
10E3 28.442 20.0265
10E4 26.5783 18.5851
10E5 26.4857 18.6023
10E6 26.4884 18.6571
10E7 27.0076 19.1338
[](int x){return x%4?x:-x;}
segment size: 32
n plain segmented
10E3 43.0149 18.8431
10E4 42.2736 18.5071
10E5 42.4035 18.7087
10E6 42.1964 18.3355
10E7 42.8113 18.7723
segment size: 512
n plain segmented
10E3 40.3695 19.0028
10E4 38.5371 18.2029
10E5 38.2163 17.85
10E6 38.2952 17.9199
10E7 38.7489 18.6342
I don't know why a 64-bit program would be slower, but there seems to be
a larger difference here.
I'm wondering how the results would be on 32/64 bit ARM.
Also, I expect a benchmark of serialization to show a much larger
benefit. I don't think one can do serialization optimally without access
to the segments.
Benedek, could you please write a test of serialization performance for
devector/batch_deque vs. boost::vector/boost::deque (release mode, full
speed optimization), perhaps using the same measuring technique as
employed by Joaquin? Then post the results and code so people can run it
on their favorite system. You should use the types char, int, and
something bigger, e.g. string or array.