[serialization] Runtime overhead of serialization archives
Hi all,
Currently I am trying to measure the runtime overhead of Boost.Serialization vs. plain
C-style serialization. The goal is to send data from one process to another
process on the same machine. As my state machines are written with
boost::msm, I want to deliver typed events.
For the IPC I used interprocess::message_queue. I wrote a template
function for the test and added a trait to plug in the test functions.
I tested on the same machine, once with Linux Debian stretch x64 and once with
Win 10 x64 / MSVC 2015.
What really astonishes me is that the measured times using
boost::serialization are so high compared to the "c-style: id + data" method. In
the c-style method I used an FNV hash of the type name as the ID.
All tests were done on an Intel Core i7 2670QM CPU. All results are in seconds.
I sent/received 100000 objects over a message_queue.
Boost 1.61.0            Linux x86_64 / gcc 6.1.1    Win 10 x64 / MSVC 2015
Boost XML Send          2.220753                    8.255834
Boost XML Receive       3.208353                    10.14462
Boost Text Send         2.024946                    8.578654
Boost Text Receive      3.207359                    10.704126
Boost Binary Send       2.018026                    8.363865
Boost Binary Receive    3.17984                     11.201501
Cstyle Send             0.13566                     0.056814
Cstyle Receive          0.087906                    0.058706
Char Send               0.071683                    0.013965
Char Receive            0.062119                    0.012631
To measure the real overhead of the message passing, I made a test that
just sends 100000 plain chars over the "wire".
There are two strange things:
a) Serialization with Boost seems to be about 16 times slower on send than the
plain C-style method, and about 30 times slower on receive. I think I am doing
something wrong... The tests were compiled in release mode.
b) On the same hardware, the Windows implementation is much slower than the
Linux one, by about a factor of 3. But with the C-style method it turns
around: Linux is slower than Windows.
Does anyone have an idea what the issue is here?
I added the code at the end of this text.
Best regards
Georg
// -----------------------------------------------------
// Code
// -----------------------------------------------------
#define BOOST_TEST_MODULE first_tests
#include
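The full test code is truncated in this archive, so here is only a minimal sketch of the "c-style: id + data" baseline described above. The ev_test_flat type, the queue parameters and hashing a literal type name are assumptions for illustration; the original presumably hashed the real event's type name and used its own event struct.

#include <cstdint>
#include <cstring>
#include <boost/interprocess/ipc/message_queue.hpp>

// 64-bit FNV-1a hash of the type name, used as the message ID.
inline std::uint64_t fnv1a(const char* s)
{
    std::uint64_t h = 14695981039346656037ULL;
    for (; *s; ++s)
    {
        h ^= static_cast<unsigned char>(*s);
        h *= 1099511628211ULL;
    }
    return h;
}

// Hypothetical flat event standing in for ev_test; it must be trivially
// copyable for the memcpy below to be valid.
struct ev_test_flat
{
    int a;
    double b;
};

// "c-style: id + data": copy the hash and the raw bytes into one buffer and send it.
// A queue could be opened e.g. with:
//   boost::interprocess::message_queue mq(boost::interprocess::open_or_create,
//       "ev_queue", 128, sizeof(std::uint64_t) + sizeof(ev_test_flat));
inline void cstyle_send(boost::interprocess::message_queue& mq, const ev_test_flat& ev)
{
    char buf[sizeof(std::uint64_t) + sizeof(ev_test_flat)];
    const std::uint64_t id = fnv1a("ev_test_flat");
    std::memcpy(buf, &id, sizeof(id));
    std::memcpy(buf + sizeof(id), &ev, sizeof(ev));
    mq.send(buf, sizeof(buf), 0);
}

inline ev_test_flat cstyle_receive(boost::interprocess::message_queue& mq)
{
    char buf[sizeof(std::uint64_t) + sizeof(ev_test_flat)];
    boost::interprocess::message_queue::size_type recvd = 0;
    unsigned int prio = 0;
    mq.receive(buf, sizeof(buf), recvd, prio);

    ev_test_flat ev;
    std::memcpy(&ev, buf + sizeof(std::uint64_t), sizeof(ev));
    return ev;
}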
On 21.09.2016 at 10:14, georg@schorsch-tech.de wrote:
struct boost_xml_trait
{
    static const char* name() { return "boost_xml_test: ev_test: "; }
    typedef boost::archive::xml_oarchive oarchive;
    typedef boost::archive::xml_iarchive iarchive;
};

struct boost_text_trait
{
    static const char* name() { return "boost_text_test: ev_test: "; }
    typedef boost::archive::xml_oarchive oarchive;
    typedef boost::archive::xml_iarchive iarchive;
};

struct boost_binary_trait
{
    static const char* name() { return "boost_binary_test: ev_test: "; }
    typedef boost::archive::xml_oarchive oarchive;
    typedef boost::archive::xml_iarchive iarchive;
};
I already figured out that I had accidentally used XML archives in all three tests. Now I can see bigger differences, but it is still much slower than the C-style version. Here on Linux x86_64, gcc 6.1.1:

Running 5 test cases...
Sending boost_xml_test: ev_test: 100000
  1.979675s wall, 1.950000s user + 0.030000s system = 1.980000s CPU (100.0%)
Receiving boost_xml_test: ev_test: 100000
  3.253286s wall, 3.250000s user + 0.010000s system = 3.260000s CPU (100.2%)
RT Counter: 100000
Sending boost_text_test: ev_test: 100000
  1.667762s wall, 1.600000s user + 0.060000s system = 1.660000s CPU (99.5%)
Receiving boost_text_test: ev_test: 100000
  1.477573s wall, 1.480000s user + 0.000000s system = 1.480000s CPU (100.2%)
RT Counter: 100000
Sending boost_binary_test: ev_test: 100000
  1.303905s wall, 1.240000s user + 0.070000s system = 1.310000s CPU (100.5%)
Receiving boost_binary_test: ev_test: 100000
  1.132586s wall, 1.130000s user + 0.000000s system = 1.130000s CPU (99.8%)
RT Counter: 100000
Sending cstyle_test: 100000
  0.119564s wall, 0.070000s user + 0.050000s system = 0.120000s CPU (100.4%)
Receiving cstyle_test: 100000
  0.081580s wall, 0.080000s user + 0.000000s system = 0.080000s CPU (98.1%)
RT Counter: 100000
Sending char_test 'c': 100000
  0.125667s wall, 0.090000s user + 0.040000s system = 0.130000s CPU (103.4%)
Receiving char_test 'c': 100000
  0.086719s wall, 0.080000s user + 0.010000s system = 0.090000s CPU (103.8%)
RT Counter: 9900000

*** No errors detected
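For reference, the corrected traits and one plausible shape of the trait-driven helper (a sketch; the real boost_test implementation is not preserved in the archived post, and serializing via a std::string here is my assumption):

#include <string>
#include <sstream>
#include <boost/archive/text_oarchive.hpp>
#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/serialization/nvp.hpp>

struct boost_text_trait    // fixed: text archives instead of xml
{
    static const char* name() { return "boost_text_test: ev_test: "; }
    typedef boost::archive::text_oarchive oarchive;
    typedef boost::archive::text_iarchive iarchive;
};

struct boost_binary_trait  // fixed: binary archives instead of xml
{
    static const char* name() { return "boost_binary_test: ev_test: "; }
    typedef boost::archive::binary_oarchive oarchive;
    typedef boost::archive::binary_iarchive iarchive;
};

// One plausible shape of the trait-driven test helper.
template <class Trait>
struct boost_test
{
    template <class Event>
    static std::string to_wire(const Event& ev)
    {
        std::ostringstream os;
        {
            typename Trait::oarchive oa(os);
            oa << BOOST_SERIALIZATION_NVP(ev);
        }
        return os.str();
    }

    template <class Event>
    static Event from_wire(const std::string& wire)
    {
        std::istringstream is(wire);
        Event ev;
        typename Trait::iarchive ia(is);
        ia >> BOOST_SERIALIZATION_NVP(ev);
        return ev;
    }
};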
Hi,
I already figured out that I had accidentally used XML archives in all three tests. Now I can see bigger differences, but it is still much slower than the C-style version.
Also try not to create a stream and an archive in every iteration, but instead reuse them. Especially the streams can be quite expensive to create.
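One way to follow that advice, as a sketch: keep a single stringstream alive for the whole run and only reset its contents per iteration instead of constructing a fresh stream every time. The archive itself is still constructed per message here, since each archive writes its own header and does its own object tracking.

#include <sstream>
#include <boost/archive/text_oarchive.hpp>
#include <boost/serialization/nvp.hpp>

template <class Event>
void to_wire_reusing_stream(const Event& ev, int iterations)
{
    std::ostringstream os;                 // created once, outside the loop
    for (int i = 0; i < iterations; ++i)
    {
        os.str(std::string());             // drop the previous contents
        os.clear();                        // reset any error/eof flags
        boost::archive::text_oarchive oa(os);
        oa << BOOST_SERIALIZATION_NVP(ev);
    }
}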
On 21.09.2016 at 19:36, Bjorn Reese wrote:
On 09/21/2016 06:35 PM, Georg Gast wrote:
Now I can see bigger differences, but it is still much slower than the C-style version.
The Boost archives use iostreams, whereas cstyle uses memcpy.
Yes, that's clear. :)
After watching CppCon 2015: Chandler Carruth, "Tuning C++: Benchmarks,
and CPUs, and Compilers! Oh My!" [1], I used the google/benchmark [2]
library on Linux to measure more precisely.
[1] https://www.youtube.com/watch?v=nXaxk27zwlk
[2] https://github.com/google/benchmark
This seems to be much better for judging the performance.
I let each test run for at least 10 seconds.
-----------------------------------------------------
Linux x64 gcc 6.1.1
Benchmark Time(ns) CPU(ns) Iterations
-------------------------------------------------
to_wire_xml 16073 16072 872818
to_wire_text 14413 14409 997151
to_wire_binary 10384 10520 1268116
to_wire_cstyle 218 218 63405797
from_wire_xml 32202 32209 434783
from_wire_text 13322 13320 1023392
from_wire_binary 9906 9906 1402806
from_wire_cstyle 210 210 66666667
-----------------------------------------------------
-----------------------------------------------------
Win 10 x64 MSVC 2015
Benchmark Time(ns) CPU(ns) Iterations
-------------------------------------------------
to_wire_xml 84145 84027 173308
to_wire_text 54691 54751 250279
to_wire_binary 44086 44028 315493
to_wire_cstyle 110 110 126197183
from_wire_xml 97023 96801 143820
from_wire_text 51315 51250 273171
from_wire_binary 43359 43408 320000
from_wire_cstyle 103 103 135757576
-----------------------------------------------------
My opinion: gcc is better at optimizing... This must be the reason why
Windows is slower with the archives.
-----------------------------------------------------
The code
-----------------------------------------------------
static void to_wire_xml(benchmark::State& state)
{
    while (state.KeepRunning())
    {
        boost_test<boost_xml_trait>::to_wire(ev_test());
    }
}
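The benchmark registration does not survive in the archive; it presumably looked roughly like the following (a sketch assuming the function above and the usual google/benchmark boilerplate; MinTime matches "at least 10 seconds"):

#include <benchmark/benchmark.h>

BENCHMARK(to_wire_xml)->MinTime(10.0);   // run this test for at least 10 seconds
// ...and likewise for to_wire_text, to_wire_binary, to_wire_cstyle
// and the corresponding from_wire_* benchmarks.

BENCHMARK_MAIN();                        // expands to main() and runs everything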
On 09/21/2016 08:35 PM, Georg Gast wrote:
On 21.09.2016 at 19:36, Bjorn Reese wrote:
The Boost archives use iostreams, whereas cstyle uses memcpy.
Yes, that's clear. :)
I am not sure how to interpret your response. My statement was not a casual observation about your tests, but the main explanation for the difference in performance. That is one of the reasons why my own archives, unlike the ones that are part of Boost.Serialization, are constructed to serialize directly to/from other container types such as arrays, std::string, and std::vector.
Hello Bjorn,
I just wanted to make clear that I know that this C-style thing uses memcpy. I would like to use Boost.Serialization to have its advantages compared to the C-style approach. In fact, I set up this test to see what the runtime costs are compared to plain memcpy. I use Boost.Serialization a lot, but so far not on a time-critical path.
In my source I use boost::iostreams array_source/sink for the streams, to serialize into/from a vector of chars (my packet typedef).
Could you please elaborate on what is different in your archives?
Thanks!
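A sketch of that setup, serializing into and out of a std::vector<char> through Boost.Iostreams devices (the packet typedef is taken from the description above; the binary archive is just an example, the other archive types plug in the same way):

#include <vector>
#include <boost/iostreams/device/back_inserter.hpp>
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/stream.hpp>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/serialization/nvp.hpp>

typedef std::vector<char> packet;

// Serialize into the vector through a back_insert_device.
template <class Event>
packet to_packet(const Event& ev)
{
    packet p;
    boost::iostreams::back_insert_device<packet> sink(p);
    boost::iostreams::stream<boost::iostreams::back_insert_device<packet> > os(sink);
    {
        boost::archive::binary_oarchive oa(os);
        oa << BOOST_SERIALIZATION_NVP(ev);
    }
    os.flush();
    return p;
}

// Deserialize from the vector through an array_source.
template <class Event>
Event from_packet(const packet& p)
{
    boost::iostreams::basic_array_source<char> source(p.data(), p.size());
    boost::iostreams::stream<boost::iostreams::basic_array_source<char> > is(source);
    Event ev;
    boost::archive::binary_iarchive ia(is);
    ia >> BOOST_SERIALIZATION_NVP(ev);
    return ev;
}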
I just found out one issue on Windows:
static void to_wire_xml(benchmark::State& state)
{
    //std::locale::global(std::locale("C"));
    while (state.KeepRunning())
    {
        boost_test<boost_xml_trait>::to_wire(ev_test());
    }
}
If I toggle the commented line, the cost goes down to half. With the Windows profiler I found out that the construction of the locale takes so much time.
Without a globally set locale:
Benchmark        Time(ns)    CPU(ns)    Iterations
----------------------------------------------
to_wire_xml         78066      77177          7479
from_wire_xml       95638      95949          7479

With a globally set locale (09/22/16 08:32:49):
Benchmark        Time(ns)    CPU(ns)    Iterations
----------------------------------------------
to_wire_xml         41399      41302         16619
from_wire_xml       52841      52844         11218
That's amazing! That's the level of the Linux implementation.
One riddle is solved :)
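A minimal sketch of that fix, assuming the benchmark main() is under our control: install the classic "C" locale once, globally, before any benchmark runs, so the archives do not pay for locale construction on every iteration (that construction is what the Windows profiler pointed at above).

#include <benchmark/benchmark.h>
#include <locale>

int main(int argc, char** argv)
{
    // Set the global locale once, up front, instead of paying for it in
    // every archive construction inside the benchmark loops.
    std::locale::global(std::locale("C"));

    benchmark::Initialize(&argc, argv);
    benchmark::RunSpecifiedBenchmarks();
    return 0;
}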
After fiddling around on Linux with clang 3.8 and the gcc optimizer options, I got down to this, with gcc and -O3:

Benchmark          Time(ns)    CPU(ns)    Iterations
-------------------------------------------------
to_wire_xml           11174      11178        381818
to_wire_text           5148       5149        820313
to_wire_binary         3327       3330       1141304
to_wire_cstyle           63         63      65217391
from_wire_xml         27170      27183        155096
from_wire_text         5371       5370        783582
from_wire_binary       3226       3228       1296296
from_wire_cstyle         45         45      93750000

These results look very nice. Less than 6 µs to serialize/deserialize a structure to a portable text archive seems very nice :)
Now the difference compared to Windows is again pretty big...
For what it's worth, in tests I've done in the past, binary serialization using boost.serialization and other similar systems did not show this big a difference compared to memcpy. I was seeing maybe a 5x to 10x difference compared to memcpy (yours is 50x).

Of course, this depends on a lot of factors, like how much data there is, because that determines whether you are memory bound or not, but I am wondering if your cstyle tests are actually being completely optimized away. Have you examined the disassembly? If you find the code is being optimized away, Google Benchmark has a handy "benchmark::DoNotOptimize" function to help keep the optimizer from throwing away the side effects of a particular address.

-- chris
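For reference, benchmark::DoNotOptimize is used roughly like this; ev_flat is a hypothetical trivially copyable stand-in for the real ev_test, not the original test code:

#include <benchmark/benchmark.h>
#include <cstdint>
#include <cstring>
#include <vector>

struct ev_flat { std::uint64_t id; double payload[4]; };

static void to_wire_cstyle(benchmark::State& state)
{
    ev_flat ev = { 42, { 1.0, 2.0, 3.0, 4.0 } };
    std::vector<char> wire(sizeof(ev));
    while (state.KeepRunning())
    {
        std::memcpy(wire.data(), &ev, sizeof(ev));
        // Make the compiler assume the buffer is observed, so the memcpy
        // cannot be removed as a dead store.
        benchmark::DoNotOptimize(wire.data());
        benchmark::ClobberMemory();
    }
}
BENCHMARK(to_wire_cstyle);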
Thanks for that hint. I changed the test and set up a range for the data
size. As the size exceeded about 2 MB, the C-style approach got half as fast
as the boost::binary_archive. That surprised me, and I decided not to go
down that path (C-style), because I really like to use the library.
As I set up the ranges for the sizes, I got a lot of results. One thing
in particular is notable: the processed bytes/sec.
As the data size grows beyond 512 bytes, the processing speed
settles at its maximum rate (for the textual archives). It seems each kind of
archive has its own limit. The XML archive, as the most verbose one, has
the lowest speed. The binary archive seems to "cheat" on this test... 7
GB/s... I guess it just stays entirely in the cache.
The tests were done with gcc 6.1.1 / Boost 1.61.0 and -O3 optimization.
For documentation purposes, I add here my results on Linux and the current
code.
Georg
Benchmark Time(ns) CPU(ns) Iterations Bandwidth
-------------------------------------------------------------------
to_wire_xml/8 15797 15840 31818 493.211kB/s
to_wire_xml/64 37456 37385 18617 1.6326MB/s
to_wire_xml/512 211649 211188 3182 2.31207MB/s
to_wire_xml/4k 1639611 1639344 427 2.38281MB/s
to_wire_xml/32k 13641742 13647059 51 2.28987MB/s
to_wire_xml/256k 106978476 107333333 6 2.32919MB/s
to_wire_xml/2M 870869606 872000000 1 2.29358MB/s
to_wire_xml/4M 1819503270 1816000000 1 2.20264MB/s
from_wire_xml/8 31584 31600 22152 247.232kB/s
from_wire_xml/64 56806 56640 12500 1103.46kB/s
from_wire_xml/512 240413 239021 2778 2.04284MB/s
from_wire_xml/4k 1742682 1739558 407 2.24554MB/s
from_wire_xml/32k 14104072 14122449 49 2.21279MB/s
from_wire_xml/256k 113079335 113142857 7 2.2096MB/s
from_wire_xml/2M 846656504 844000000 1 2.36967MB/s
from_wire_xml/8M 3387609285 3388000000 1 2.36128MB/s
to_wire_text/8 6204 6181 109375 1.23442MB/s
to_wire_text/64 9197 9200 76087 6.63426MB/s
to_wire_text/512 31154 31095 23026 15.7027MB/s
to_wire_text/4k 201879 200892 3365 19.4446MB/s
to_wire_text/32k 1624883 1620609 427 19.2829MB/s
to_wire_text/256k 12647559 12654545 55 19.7557MB/s
to_wire_text/2M 100406115 100000000 7 20MB/s
to_wire_text/4M 216889302 216000000 3 18.5185MB/s
from_wire_text/8 6283 6256 102941 1.21953MB/s
from_wire_text/64 9104 9095 76087 6.71096MB/s
from_wire_text/512 33779 33810 20349 14.4419MB/s
from_wire_text/4k 224963 225219 2966 17.3442MB/s
from_wire_text/32k 1759826 1757895 380 17.7769MB/s
from_wire_text/256k 14159723 14122449 49 17.7023MB/s
from_wire_text/2M 112441804 112666667 6 17.7515MB/s
from_wire_text/4M 224818542 225333333 3 17.7515MB/s
to_wire_binary/8 4257 4256 163551 1.79281MB/s
to_wire_binary/64 4405 4394 162037 13.8904MB/s
to_wire_binary/512 4324 4325 159091 112.909MB/s
to_wire_binary/4k 5180 5200 134615 751.2MB/s
to_wire_binary/32k 11714 11657 58333 2.61791GB/s
to_wire_binary/256k 74599 74693 9211 3.26857GB/s
to_wire_binary/2M 1160753 1159520 583 1.68443GB/s
to_wire_binary/4M 2583586 2578755 273 1.51478GB/s
from_wire_binary/8 3509 3500 201149 2.17989MB/s
from_wire_binary/64 3476 3480 201149 17.5388MB/s
from_wire_binary/512 3601 3598 192308 135.694MB/s
from_wire_binary/4k 3833 3840 182292 1017.25MB/s
from_wire_binary/32k 6697 6683 102941 4.56615GB/s
from_wire_binary/256k 33168 33201 21084 7.35352GB/s
from_wire_binary/2M 268648 268842 2574 7.26495GB/s
from_wire_binary/4M 820816 821128 833 4.75717GB/s
<code>
// STL Archive + Stuff
#include
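The code is truncated here as well, so only as a sketch: a size-ranged benchmark with a Bandwidth column can be set up like this in google/benchmark. The function name and payload handling are assumptions; Range(8, 4 << 20) produces the 8 B ... 4 MB steps seen above, and SetBytesProcessed is what makes the bandwidth column appear.

#include <benchmark/benchmark.h>
#include <cstdint>
#include <cstring>
#include <vector>

static void to_wire_cstyle_sized(benchmark::State& state)
{
    const std::size_t n = static_cast<std::size_t>(state.range(0));
    std::vector<char> payload(n, 'x');
    std::vector<char> wire(n);

    while (state.KeepRunning())
    {
        std::memcpy(wire.data(), payload.data(), n);
        benchmark::DoNotOptimize(wire.data());
        benchmark::ClobberMemory();
    }
    // Report throughput so benchmark prints the Bandwidth column.
    state.SetBytesProcessed(static_cast<int64_t>(state.iterations()) *
                            static_cast<int64_t>(n));
}
BENCHMARK(to_wire_cstyle_sized)->Range(8, 4 << 20);   // 8 bytes ... 4 MB

BENCHMARK_MAIN();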
Hi,
I have prepared a comparison of the same code between Linux and Windows (MSVC). There is such a huge gap between Linux and Windows. I am not sure if I should post the images to this mailing list; AFAIK it is seen as bad behaviour to post non-text to this list, so I added a link instead:
https://www.schorsch-tech.de/doku.php?id=c:boost_serializationwindows
Has anyone an idea why this is the case? Looking at the diagrams, I can't explain what is happening here. It is the same machine, on bare metal (no VM).
Georg
I guess you are still comparing a release to a debug version. I've run your code and this is what I've got.

Win7 x64, VS2015 Update3, Release, x64
Run on (8 X 3392 MHz CPU s)
09/25/16 07:36:53
Benchmark                Time          CPU    Iterations    Bandwidth
------------------------------------------------------------
to_wire_xml/8 39564 ns 39980 ns 17949 195.409kB/s
to_wire_xml/64 98915 ns 98035 ns 7479 637.527kB/s
to_wire_xml/512 583376 ns 583961 ns 1122 856.222kB/s
to_wire_xml/4k 4494721 ns 4428415 ns 155 903.258kB/s
to_wire_xml/32k 35621888 ns 35100225 ns 20 911.675kB/s
to_wire_xml/256k 285296564 ns 280801800 ns 2 911.675kB/s
to_wire_xml/2M 2294702350 ns 2293214700 ns 1 893.069kB/s
to_wire_xml/4M 4596100456 ns 4586429400 ns 1 893.069kB/s
from_wire_xml/8 44701 ns 44361 ns 15473 176.11kB/s
from_wire_xml/64 97779 ns 98035 ns 7479 637.527kB/s
from_wire_xml/512 523956 ns 530403 ns 1000 942.679kB/s
from_wire_xml/4k 3919513 ns 3877482 ns 173 1031.6kB/s
from_wire_xml/32k 30990532 ns 31200200 ns 22 1025.63kB/s
from_wire_xml/256k 248254367 ns 249601600 ns 3 1025.63kB/s
from_wire_xml/2M 1990579271 ns 1981212700 ns 1 1033.71kB/s
from_wire_xml/8M 7927240207 ns 7924850800 ns 1 1033.71kB/s
to_wire_text/8 13381 ns 13142 ns 49857 594.483kB/s
to_wire_text/64 31969 ns 31985 ns 22436 1.90827MB/s
to_wire_text/512 180335 ns 179751 ns 4079 2.71643MB/s
to_wire_text/4k 1363654 ns 1375560 ns 499 2.83975MB/s
to_wire_text/32k 10990438 ns 10968820 ns 64 2.84898MB/s
to_wire_text/256k 86883137 ns 88400567 ns 9 2.82804MB/s
to_wire_text/2M 696132001 ns 686404400 ns 1 2.91373MB/s
to_wire_text/4M 1398212634 ns 1388408900 ns 1 2.881MB/s
from_wire_text/8 11158 ns 11195 ns 64102 697.873kB/s
from_wire_text/64 25274 ns 25588 ns 28045 2.38534MB/s
from_wire_text/512 138245 ns 137666 ns 4986 3.54685MB/s
from_wire_text/4k 1047166 ns 1046497 ns 641 3.73269MB/s
from_wire_text/32k 8304279 ns 8320053 ns 90 3.75599MB/s
from_wire_text/256k 66510527 ns 66654973 ns 11 3.75066MB/s
from_wire_text/2M 533393808 ns 530403400 ns 1 3.77071MB/s
from_wire_text/4M 1055956857 ns 1060806800 ns 1 3.77071MB/s
to_wire_binary/8 5444 ns 5460 ns 100000 1.39732MB/s
to_wire_binary/64 5411 ns 5424 ns 112179 11.2538MB/s
to_wire_binary/512 5523 ns 5563 ns 112179 87.7797MB/s
to_wire_binary/4k 5966 ns 5980 ns 112179 653.244MB/s
to_wire_binary/32k 28940 ns 29412 ns 24929 1062.5MB/s
to_wire_binary/256k 251626 ns 250358 ns 2804 998.569MB/s
to_wire_binary/2M 2548630 ns 2540925 ns 264 787.115MB/s
to_wire_binary/4M 6361041 ns 6407184 ns 112 624.299MB/s
from_wire_binary/8 5363 ns 5284 ns 112179 1.44375MB/s
from_wire_binary/64 5371 ns 5460 ns 100000 11.1785MB/s
from_wire_binary/512 5386 ns 5460 ns 100000 89.4282MB/s
from_wire_binary/4k 5483 ns 5424 ns 112179 720.244MB/s
from_wire_binary/32k 7685 ns 7649 ns 89743 3.98998GB/s
from_wire_binary/256k 25332 ns 25588 ns 28045 9.54136GB/s
from_wire_binary/2M 620654 ns 625672 ns 1122 3.12164GB/s
from_wire_binary/4M 1306333 ns 1306960 ns 561 2.98881GB/s
On 25.09.2016 at 07:05, Ernest Zaslavsky wrote:
I guess you are still comparing a release to a debug version. I've run your code and this is what I've got.
Win7 x64, VS2015 Update3, Release, x64
Dear Ernest,
Thanks for running my code :)
Have you run my code from the post at 23.9.16 19:41 (make_array)?
In your result the XML settles at about 2.9 MB/sec (from wire) and the
text archive at about 3.8 MB/sec (from wire).
My guess from these values is that you tested the version with
make_array in the serialization functions. I realized this DEBUG thing
too and I fixed it in the meantime. On my desktop workstation
the make_binary_object change improved the performance a lot.
Here is the current version of the source from my homepage:
Georg
<code>
// STL Archive + Stuff
#include
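The source is truncated in the archive, so here is only a sketch of the make_binary_object change described above. The ev_blob type and its payload member are assumptions; since binary_object does not store a size, the size is serialized separately and the vector is resized before loading.

#include <cstddef>
#include <vector>
#include <boost/serialization/binary_object.hpp>
#include <boost/serialization/nvp.hpp>
#include <boost/serialization/split_member.hpp>

struct ev_blob
{
    std::vector<char> payload;

    template <class Archive>
    void save(Archive& ar, const unsigned int /*version*/) const
    {
        std::size_t n = payload.size();
        ar << boost::serialization::make_nvp("size", n);
        // Hand the whole contiguous block to the archive at once instead of
        // going element by element (or via make_array).
        boost::serialization::binary_object blob =
            boost::serialization::make_binary_object(
                const_cast<char*>(payload.data()), n);
        ar << boost::serialization::make_nvp("blob", blob);
    }

    template <class Archive>
    void load(Archive& ar, const unsigned int /*version*/)
    {
        std::size_t n = 0;
        ar >> boost::serialization::make_nvp("size", n);
        payload.resize(n);
        boost::serialization::binary_object blob =
            boost::serialization::make_binary_object(payload.data(), n);
        ar >> boost::serialization::make_nvp("blob", blob);
    }

    BOOST_SERIALIZATION_SPLIT_MEMBER()
};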
have you run my code from the post at 23.9.16 19:41 (make_array)? Yep
And here are the results for your latest code. Looks like it is doing quite well.

Run on (8 X 3392 MHz CPU s)
09/25/16 11:50:14
Benchmark                Time          CPU    Iterations    Bandwidth
------------------------------------------------------------
to_wire_xml/8 30685 ns 30594 ns 22436 255.361kB/s
to_wire_xml/64 33922 ns 34315 ns 21367 1.77868MB/s
to_wire_xml/512 59209 ns 59797 ns 11218 8.16563MB/s
to_wire_xml/4k 253341 ns 255922 ns 2804 15.2635MB/s
to_wire_xml/32k 1831131 ns 1793594 ns 374 17.4231MB/s
to_wire_xml/256k 14493601 ns 14352092 ns 50 17.4191MB/s
to_wire_xml/2M 116950902 ns 117000750 ns 6 17.0939MB/s
to_wire_xml/4M 237862928 ns 234001500 ns 3 17.0939MB/s
from_wire_xml/8 38244 ns 38242 ns 17949 204.291kB/s
from_wire_xml/64 42733 ns 42831 ns 16026 1.42503MB/s
from_wire_xml/512 80459 ns 81703 ns 8974 5.97628MB/s
from_wire_xml/4k 362877 ns 359818 ns 1951 10.8562MB/s
from_wire_xml/32k 2637410 ns 2600017 ns 264 12.0192MB/s
from_wire_xml/256k 20917089 ns 20962634 ns 32 11.926MB/s
from_wire_xml/2M 166986547 ns 163801050 ns 4 12.2099MB/s
from_wire_xml/4M 334816819 ns 335402150 ns 2 11.926MB/s
to_wire_text/8 12291 ns 12412 ns 64102 629.454kB/s
to_wire_text/64 15434 ns 15645 ns 44872 3.90136MB/s
to_wire_text/512 39425 ns 39773 ns 17258 12.2767MB/s
to_wire_text/4k 231714 ns 233668 ns 2804 16.7171MB/s
to_wire_text/32k 1831865 ns 1835306 ns 408 17.0271MB/s
to_wire_text/256k 14413837 ns 14213424 ns 45 17.589MB/s
to_wire_text/2M 115013254 ns 114400733 ns 6 17.4824MB/s
to_wire_text/4M 235687258 ns 234001500 ns 3 17.0939MB/s
from_wire_text/8 11461 ns 11403 ns 56089 685.104kB/s
from_wire_text/64 16117 ns 15992 ns 44872 3.81654MB/s
from_wire_text/512 51763 ns 53040 ns 10000 9.20585MB/s
from_wire_text/4k 333699 ns 336473 ns 2040 11.6094MB/s
from_wire_text/32k 2607880 ns 2600017 ns 264 12.0192MB/s
from_wire_text/256k 20869215 ns 20948706 ns 35 11.9339MB/s
from_wire_text/2M 167760690 ns 167701075 ns 4 11.926MB/s
from_wire_text/4M 334471631 ns 335402150 ns 2 11.926MB/s
to_wire_binary/8 5625 ns 5616 ns 100000 1.3585MB/s
to_wire_binary/64 5634 ns 5616 ns 100000 10.868MB/s
to_wire_binary/512 5747 ns 5702 ns 112179 85.6388MB/s
to_wire_binary/4k 6130 ns 6119 ns 112179 638.398MB/s
to_wire_binary/32k 12251 ns 12273 ns 57200 2.4866GB/s
to_wire_binary/256k 251085 ns 250358 ns 2804 998.569MB/s
to_wire_binary/2M 2504876 ns 2507159 ns 280 797.716MB/s
to_wire_binary/4M 6222666 ns 6267897 ns 112 638.173MB/s
from_wire_binary/8 5194 ns 5304 ns 100000 1.43841MB/s
from_wire_binary/64 5277 ns 5284 ns 112179 11.55MB/s
from_wire_binary/512 5235 ns 5145 ns 112179 94.897MB/s
from_wire_binary/4k 5354 ns 5424 ns 112179 720.244MB/s
from_wire_binary/32k 7479 ns 7475 ns 89743 4.08277GB/s
from_wire_binary/256k 24681 ns 24475 ns 28045 9.97506GB/s
from_wire_binary/2M 631601 ns 626091 ns 897 3.11955GB/s
from_wire_binary/4M 1309594 ns 1313034 ns 499 2.97498GB/s
On 25.09.2016 at 10:56, Ernest Zaslavsky wrote:
have you run my code from the post at 23.9.16 19:41 (make_array)? Yep
And here results for your latest code. Looks like it is doing quite well.
Dear Ernest,
That result is in the same range as my current Windows results. My main issue is: why is there such a big difference to the Linux one? My XML archive on Linux settles at 50 MB/s, and the text archive is nearly in the same range too. See the attached graphs. This is what I can't explain...
Georg
My XML archive on Linux settles at 50 MB/s, and the text archive is nearly in the same range too.
Oh, now I see: Windows is much slower in XML. Well, it looks like a streams issue; streams were never known for great performance (at least on Windows). That's why you can't use boost::lexical_cast if you are performance-oriented. See the attached screenshot; it all goes down to put/peek/ignore etc. I guess this issue is eligible for a Microsoft Connect issue to be opened :)
Now I installed msys2 on Win 10 and compiled Boost, google/benchmark and the test suite with gcc 5.3.0, and I got these results:

Run on (8 X 2195 MHz CPU s)
2016-09-23 00:00:29
***WARNING*** Library was built as DEBUG. Timings may be affected.
Benchmark          Time(ns)    CPU(ns)    Iterations
-------------------------------------------------
to_wire_xml           17304      17274         40698
to_wire_text          10769      10917         74468
to_wire_binary         7244       7310         89744
to_wire_cstyle          168        165       4069767
from_wire_xml         75807      75725          8861
from_wire_text        10217      10273         74468
from_wire_binary       7254       7308        111111
from_wire_cstyle        166        165       4069767

The compiler or its support libraries are definitely an issue.
Run on (8 X 2195 MHz CPU s)
2016-09-23 00:00:29
***WARNING*** Library was built as DEBUG. Timings may be affected.

Looks like you are comparing a release version with a debug one, hence the gap.
On 09/22/2016 06:25 AM, Georg Gast wrote:
Could you please elaborate what is different in your archive?
The Boost.Serialization archives use iostreams as a generic mechanism for inputting and outputting data. This comes with the added performance cost, as you have discovered.
My output archives use a buffer interface class along with buffer traits to determine how to write data to a given container. These buffer traits are described at:
http://breese.github.io/trial/protocol/trial_protocol/buffer.html
There are specializations for the most common standard container types, so you can pass a std::string (or std::vector or std::ostream) directly in the constructor. My input archives simply take a string view as input. I have not had the need for anything else.
participants (6)
- Bjorn Reese
- Chris Glover
- Ernest Zaslavsky
- Georg Gast
- georg@schorsch-tech.de
- Oswin Krause