Without global set locale:

Benchmark        Time(ns)    CPU(ns) Iterations
----------------------------------------------
to_wire_xml         78066      77177       7479
from_wire_xml       95638      95949       7479

With global set locale:

09/22/16 08:32:49
Benchmark        Time(ns)    CPU(ns) Iterations
----------------------------------------------
to_wire_xml         41399      41302      16619
from_wire_xml       52841      52844      11218

That's amazing! That's on the level of the Linux implementation.
One riddle is solved :)
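
For anyone reading along, here is a minimal sketch of what "global set locale" could look like. This is an assumption on my part: I take it to mean calling std::locale::global once at startup, before any archive streams are created. The helper names format_value and set_global_locale_once are made up for illustration, and the classic "C" locale is just a placeholder for whatever locale was actually installed in the runs above.

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: format a value through a freshly constructed stream.
// Every new stream is imbued with a copy of the current global locale.
std::string format_value(int value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// Hypothetical helper: install one locale as the process-wide global locale.
// Assumption: the classic "C" locale stands in for the locale used in the
// benchmark runs, which is not shown here.
void set_global_locale_once() {
    std::locale::global(std::locale::classic());
}
```

Calling set_global_locale_once() at the top of main(), before the benchmarks are registered and run, would be the obvious place for it. Whether this is the same call that produced the numbers above is not confirmed.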
After fiddling around on Linux with clang 3.8 and the gcc optimizer options, I got down to this with gcc and -O3:

Benchmark          Time(ns)    CPU(ns) Iterations
-------------------------------------------------
to_wire_xml           11174      11178     381818
to_wire_text           5148       5149     820313
to_wire_binary         3327       3330    1141304
to_wire_cstyle           63         63   65217391
from_wire_xml         27170      27183     155096
from_wire_text         5371       5370     783582
from_wire_binary       3226       3228    1296296
from_wire_cstyle         45         45   93750000

These results look very nice. Under 6 µs to serialize or deserialize a structure to or from a portable text archive seems very nice :)
Now the difference compared to Windows is pretty big again ...
Now I installed MSYS2 on Win10 and compiled boost/benchmark and the test suite with gcc 5.3.0, and I got these results:

Run on (8 X 2195 MHz CPU s)
2016-09-23 00:00:29
***WARNING*** Library was built as DEBUG. Timings may be affected.
Benchmark          Time(ns)    CPU(ns) Iterations
-------------------------------------------------
to_wire_xml           17304      17274      40698
to_wire_text          10769      10917      74468
to_wire_binary         7244       7310      89744
to_wire_cstyle          168        165    4069767
from_wire_xml         75807      75725       8861
from_wire_text        10217      10273      74468
from_wire_binary       7254       7308     111111
from_wire_cstyle        166        165    4069767

The compiler or its support libraries are definitely an issue.