On Mon, Jul 10, 2017 at 9:57 AM, Phil Endecott via Boost wrote:
> So you're reinterpret_casting a char* to a size_t* and then
> dereferencing it. Isn't that undefined behaviour?

Which platform do you think this doesn't work on?

I have to apologize again for the earlier misleading comments. The
benchmarks were measuring the wrong thing. The actual figure is more in
line with 20,000%.

The reinterpret_cast<> can be trivially changed to std::memcpy:

    std::size_t temp;
    std::memcpy(&temp, in, sizeof(temp));
    if((temp & mask) != 0)

This hurts MSVC a bit, gcc/clang/MIPS not at all, but ARM takes quite a
hit:

https://godbolt.org/g/HFxxub

(Thanks to Peter for working up the CE ARM example)

Updated benchmarks (run on MSVC with optimizations)

With reinterpret_cast

beast.benchmarks.utf8_checker
beast:  1,977,629,044 char/s
beast:  1,474,907,121 char/s
beast:  1,930,979,173 char/s
beast:  1,899,078,149 char/s
beast:  1,893,740,588 char/s
locale: 81,966,125 char/s
locale: 82,200,658 char/s
locale: 81,802,070 char/s
locale: 82,103,369 char/s
locale: 81,802,149 char/s
Longest suite times:
    2.7s beast.benchmarks.utf8_checker
2.7s, 1 suite, 1 case, 1 test total, 0 failures
The program '[79364] benchmarks.exe' has exited with code 0 (0x0).

With std::memcpy

beast.benchmarks.utf8_checker
beast:  1,124,515,969 char/s
beast:  1,336,074,093 char/s
beast:  1,494,183,562 char/s
beast:  1,506,365,044 char/s
beast:  1,533,419,187 char/s
locale: 75,457,683 char/s
locale: 81,358,140 char/s
locale: 80,413,657 char/s
locale: 81,635,114 char/s
locale: 67,234,619 char/s
Longest suite times:
    3.0s beast.benchmarks.utf8_checker
3.0s, 1 suite, 1 case, 1 test total, 0 failures
The program '[82220] benchmarks.exe' has exited with code 0 (0x0).
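
For context, here is a minimal sketch of how the std::memcpy variant of the word-at-a-time test might sit inside an ASCII fast path. It is not the code from Beast: the function name all_ascii, the way mask is computed, and the byte-wise tail loop are assumptions made for illustration. Only the three-line memcpy/test fragment comes from the message above.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not Beast's API): returns true when every byte in
// [in, in + n) has its high bit clear, i.e. the range is pure 7-bit ASCII.
inline bool all_ascii(char const* in, std::size_t n)
{
    // Assumed mask: the high bit of every byte in a word (0x8080...80).
    // ~0 / 0xFF yields 0x0101...01; multiplying by 0x80 sets each high bit.
    constexpr std::size_t mask =
        static_cast<std::size_t>(-1) / 0xFF * 0x80;

    // Word-at-a-time loop. std::memcpy avoids dereferencing a char* as a
    // size_t*, so alignment and aliasing are not a concern here.
    while (n >= sizeof(std::size_t))
    {
        std::size_t temp;
        std::memcpy(&temp, in, sizeof(temp));
        if ((temp & mask) != 0)
            return false; // at least one non-ASCII byte in this word
        in += sizeof(temp);
        n -= sizeof(temp);
    }

    // Byte-wise tail for the remaining n < sizeof(std::size_t) bytes.
    while (n--)
    {
        if ((static_cast<unsigned char>(*in++) & 0x80) != 0)
            return false;
    }
    return true;
}
```

A full UTF-8 checker would presumably fall back to per-code-point validation when this test fails rather than rejecting the input; the sketch only shows the fast-path test. Whether the compiler folds the memcpy into a single load depends on the target, which is what the godbolt link above compares.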