Re: [Boost-users] lexical_cast between double and string slow in Visual Studio 2013

28 Mar 2014


      how to convert a hex string to int?


2014-03-28 1:44 GMT+08:00 Paul A. Bristow <pbristow@hetp.u-net.com>:
...


...
-----Original Message-----
From: Boost-users [mailto:boost-users-bounces@lists.boost.org] On Behalf Of David Roberts
Sent: 27 March 2014 15:58
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] lexical_cast between double and string slow in Visual Studio 2013
...
That issue is unknown. I'd really appreciate the investigation.
I have done some more investigation, and there are two factors that only
cause the slowness when they both occur together.
...
Try excluding lexical_cast from the test; I have a feeling that this is an
MSVC-only issue:
#include <sstream>
#include <string>
#include <utility>  // std::move
int main (int, char **)
{
    for (double count = 0.0; count < 1000000.0; count += 1.41)
    {
        std::stringstream ss;
        ss << count;                               // double -> text
        std::string result = std::move(ss.str());
        ss.str(std::string());                     // reset the buffer
        ss << result;
        ss >> count;                               // text -> double
    }
    return 0;
}
Running your test program does not exhibit the problem.  It runs in around
3 seconds on my machine when built with either Visual Studio 2010 or
Visual Studio
...
However, changing it very slightly to match more closely what lexical_cast
does internally does recreate the problem:
#include <sstream>
#include <string>
#include <utility>  // std::move
int main (int, char **)
{
    for (double count = 0.0; count < 1000000.0; count += 1.41)
    {
        std::stringstream ss;
        ss.unsetf(std::ios::skipws);
        ss.precision(17);                          // as lexical_cast does for double
        ss << count;                               // double -> text
        std::string result = std::move(ss.str());
        ss.str(std::string());                     // reset the buffer
        ss << result;
        ss >> count;                               // text -> double
    }
    return 0;
}
The effect of setting the precision to 17 is that lots of 9s appear in the
string representations.  (The number 17 is what
boost::detail::lcast_get_precision(double*) chooses.)  Without the
precision call the contents of the string called result start off like
this:
0
1.41
2.82
4.23
5.64
7.05
8.46
9.87
11.28
12.69
With precision set to 17 they start off like this:
0
1.4099999999999999
2.8199999999999998
4.2299999999999995
5.6399999999999997
7.0499999999999998
8.4599999999999991
9.8699999999999992
11.279999999999999
12.69
This happens for both Visual Studio 2010 and Visual Studio 2013.
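The effect of the precision call can be seen in isolation with a small
helper.  This is only a sketch: format_with_precision is a hypothetical
name, not something from Boost or from the programs in this thread, and
the results assume a standard library whose output is correctly rounded.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not from the thread): format a double through a
// std::ostringstream in the default floating-point format, using the
// given number of significant digits.
std::string format_with_precision(double value, int precision)
{
    std::ostringstream out;
    out.precision(precision);
    out << value;
    return out.str();
}
```

With such a library, format_with_precision(1.41, 6) gives "1.41" and
format_with_precision(1.41, 17) gives "1.4099999999999999", matching the
second entry of each list above.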
Then the next difference is that Visual Studio 2013 spends a lot longer
handling all the extra 9s.  Changing the program so that the double is
converted to a string using std::stringstream without a precision call and
then back to double using lexical_cast takes about 3 seconds for both
Visual Studio 2010 and Visual Studio 2013.  It is the combination of
having all the extra 9s to parse and using Visual Studio 2013 that makes
the test using lexical_cast to go both ways slow.
Both Visual Studio 2010 and Visual Studio 2013 do the conversion by calling
std::num_get<char,std::istreambuf_iterator<char,std::char_traits<char> >
...
::do_get() which then calls a function called _Stodx() which is
implemented in xstod.c.  This function is very different for the two
versions.  In Visual Studio 2010 it's a relatively thin wrapper around the
C function strtod().  In Visual Studio 2013 _Stodx() has got a completely
new implementation that's generated by #including xxstod.h with some
macros defined.
The original C function strtod() is much faster than the new _Stodx() when
there are lots of 9s at the end of the strings being parsed.  This
modification to the program:
...
#include <sstream>
#include <string>
#include <utility>  // std::move
#include <stdlib.h>
int main (int, char **)
{
    for (double count = 0.0; count < 1000000.0; count += 1.41)
    {
        std::stringstream ss;
        ss.unsetf(std::ios::skipws);
        ss.precision(17);
        ss << count;                               // double -> text
        std::string result = std::move(ss.str());
        ss.str(std::string());                     // reset the buffer
        ss << result;
        char *endptr;
        // Parse with the C library instead of operator>>.
        count = strtod(ss.str().c_str(), &endptr);
    }
    return 0;
}
has a runtime of about 3 seconds even though it's got to cope with all the
9s.
...
I guess only someone from Microsoft or Dinkumware could comment on why
_Stodx() was reimplemented.
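To compare the two parsing paths on a single input rather than by timing
whole programs, something along the following lines might be used.  It is
a rough sketch only: parse_with_stream, parse_with_strtod and
time_parse_ms are hypothetical names introduced for illustration, and the
figures it reports will depend on the compiler, library and machine.

```cpp
#include <chrono>
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper: parse via operator>>, which goes through num_get.
double parse_with_stream(const std::string& text)
{
    std::istringstream in(text);
    double value = 0.0;
    in >> value;
    return value;
}

// Hypothetical helper: parse via the C library function strtod().
double parse_with_strtod(const std::string& text)
{
    char* end = nullptr;
    return std::strtod(text.c_str(), &end);
}

// Time `iterations` calls of `parse` on `text`; returns elapsed milliseconds.
template <typename Parse>
double time_parse_ms(Parse parse, const std::string& text, int iterations)
{
    volatile double sink = 0.0;  // discourages the optimizer from removing the loop
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        sink = parse(text);
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, time_parse_ms(parse_with_stream, "1.4099999999999999", 100000)
and the same call with parse_with_strtod give two numbers that can be
compared across compiler versions.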
But the other thing is that by setting precision to 17 lexical_cast is
bloating the string representations of the doubles with lots of 9s in both
Visual Studio 2010 and Visual Studio 2013.  Setting precision to 15 instead
prevents this, and makes the original test run faster even with Visual
Studio 2013 (about 4 seconds rather than 10).
In order to be sure of 'round-tripping' one needs to output
std::numeric_limits<FPT>::max_digits10 decimal digits.
max_digits10 is 17 for double, which is enough to ensure that all
*possibly* significant digits are used.
digits10 is 15 for double, and using this will work for *your* example,
but will fail to 'round-trip' exactly for some values of double.
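The difference between the two precisions can be checked with a small
helper.  This is a minimal sketch: round_trips, kDigits10 and kMaxDigits10
are hypothetical names, and the result for any particular value assumes a
standard library that converts in both directions with correct rounding.

```cpp
#include <limits>
#include <sstream>

// Hypothetical helper: write `value` with `digits` significant decimal
// digits, read the text back through the same stream, and report whether
// the double that comes back compares equal to the one written.
bool round_trips(double value, int digits)
{
    std::stringstream ss;
    ss.precision(digits);
    ss << value;
    double back = 0.0;
    ss >> back;
    return back == value;
}

const int kDigits10 = std::numeric_limits<double>::digits10;         // 15 for IEEE double
const int kMaxDigits10 = std::numeric_limits<double>::max_digits10;  // 17 for IEEE double
```

With such a library, round_trips(1.41, kDigits10) holds, but
round_trips(0.1 + 0.2, kDigits10) does not: 15 digits print the sum as
"0.3", which reads back as a different double.  With kMaxDigits10 the sum
survives the trip.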
The reason for a rewrite *might* be that for VS <=11, there was a slight
'feature' ('feature' according to Microsoft, 'bug' according to many,
though the C++ Standard does NOT require round-tripping to be exact.
Recent GCC and Clang achieve exact round-tripping.)
// The original value causing trouble using serialization was
// 0.00019075645054089487;
// wrote 0.0019075645054089487
// read  0.0019075645054089489
// an increase of just 1 bit.
// Although this test uses a std::stringstream, it is possible that
// the same behaviour will be found with ALL streams, including cout and
// cin?
// The wrong inputs are only found in a very narrow range of values:
// approximately 0.0001 to 0.004, with exponent values of 3f2 to 3f6
// and probably every third value of significand (tested using nextafter).
However, a re-test reveals that this 'feature' is still present using
VS2013 (version 12.0).
(This test uses random double values to find round-trip or loopback
failures.)
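A loopback test of this general kind could be sketched as below.  This is
not the program that produced the output that follows: LoopbackResult and
run_loopback are hypothetical names, and drawing random 64-bit patterns
with std::mt19937_64 is an assumption about how the test values might be
generated.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>
#include <random>
#include <sstream>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes a 64-bit double");

struct LoopbackResult
{
    int tested = 0;  // finite values written and read back
    int failed = 0;  // values that did not read back identically
};

// Hypothetical test driver: write random doubles with max_digits10
// significant digits, read them back, and count the mismatches.
LoopbackResult run_loopback(int count, std::uint64_t seed)
{
    std::mt19937_64 rng(seed);
    LoopbackResult result;
    while (result.tested < count)
    {
        // Reinterpret a random 64-bit pattern as a double.
        const std::uint64_t bits = rng();
        double value;
        std::memcpy(&value, &bits, sizeof value);
        if (!std::isfinite(value))
            continue;  // skip infinities and NaNs

        std::stringstream ss;
        ss.precision(std::numeric_limits<double>::max_digits10);
        ss << value;
        double back = 0.0;
        ss >> back;

        ++result.tested;
        if (back != value)
            ++result.failed;
    }
    return result;
}
```

Calling run_loopback(100000, 42) and dividing failed by tested gives a
failure fraction of the same kind as the one reported in the output.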
...
Description: Autorun "J:\Cpp\Misc\Debug\loopback.exe"
1>
1>  failed 78, out of 100000, fraction 0.00077999999999999999
1>
1>  wrong min 5.2173006024157652e-310 == 600ac32350ee
1>  wrong max 8.7621968418217147e-308 == 2f80e435eb2ef3
1>
1>  test min 1.2417072250589532e-311 == 24928faf2f7
1>  test max 1.7898906514522990e+308 == 7fefdc71c85a1145
1>  186a0 loopback tests done.
1>FinalizeBuildStatus:
1>  Deleting file "Debug\loopback.tlog\unsuccessfulbuild".
1>  Touching "Debug\loopback.tlog\loopback.lastbuildstate".
1>
1>Build succeeded.
But this time it only occurs for a *different* and much smaller range :-(
1>  Description: Autorun "J:\Cpp\Misc\Debug\loopback.exe"
1>
1>  Written  : 2.0367658404750995e-308 == ea55b0142dc71
1>  Readback : 2.0367658404751000e-308 == ea55b0142dc72
1>  Written  : 7.2650939912298312e-308 == 2a1eee018d6993
1>  Readback : 7.2650939912298322e-308 == 2a1eee018d6994
1>  Written  : 1.0124608169366832e-308 == 747c6af50194c
1>  Readback : 1.0124608169366827e-308 == 747c6af50194b
...
1>  failed 77, out of 100000, fraction 0.00076999999999999996
1>
1>  wrong min 5.4632820247365795e-310 == 6491f5f0ab91
1>  wrong max 8.7543773312713900e-308 == 2f79b1b891b2c1
1>
1>  test min 2.1782631694667282e-310 == 2819299bf337
1>  test max 1.7974889513081573e+308 == 7fefff11cdbbcb43
1>  186a0 loopback tests done.
1>
I've retested using VS 2013 and the failures are now in the narrow range
very near to numeric_limits<double>::min().
Much better, but still not quite right :-(
1>  Readback : 6.1131075857298205e-308 == 25fa9ea293ff26
1>  failed 3680, out of 10000000, fraction 0.00036800000000000000
1>
1>  wrong min 4.4505959275765217e-308 == 2000699c514815
1>  wrong max 8.8998755028746106e-308 == 2fff9d0d8336f1
1>
1>  test min 8.9025924527339071e-313 == 29f4307bd7
1>  test max 1.7976312864655923e+308 == 7fefffb7d9534507
1>  98bf7a loopback tests done.
To work around this 'feature' it was only necessary to use std::scientific
format (but of course this means more characters to digest).
(But with VS2013 the results are as 'wrong' as not using std::scientific,
so go figure ???).
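For reference, writing a double in std::scientific format might look like
the sketch below.  to_scientific and from_text are hypothetical names, and
the choice of max_digits10 - 1 as the precision is an assumption: in
scientific format the precision counts the digits after the decimal point,
so this gives max_digits10 significant digits in total.

```cpp
#include <ios>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical helper: format a double in scientific notation with
// max_digits10 significant digits (one before the point, the rest after).
std::string to_scientific(double value)
{
    std::ostringstream out;
    out << std::scientific;
    out.precision(std::numeric_limits<double>::max_digits10 - 1);
    out << value;
    return out.str();
}

// Hypothetical helper: read a double back from text through a stream.
double from_text(const std::string& text)
{
    std::istringstream in(text);
    double value = 0.0;
    in >> value;
    return value;
}
```

Whether from_text(to_scientific(x)) == x holds for every x depends on the
standard library, which is exactly what the tests above are probing.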
This whole process is a minefield and you can find more than you wanted to
know from Rick Regan's work, starting (but not ending) with
http://www.exploringbinary.com/incorrect-round-trip-conversions-in-visual-c-...
-plus/
For me, the bottom line is that, for C++, the whole IO needs to be
rewritten *in C++*, perhaps using Fusion.
This might be an exercise for a student ;-)
Boost must be portable, so I'm not sure about your 'improvement' to speed,
but if speed on MSVC matters to you, then use it. Equally, the tiny risk
of a small loss of accuracy may not matter to you either, so using just 15
decimal digits may be acceptable.
IMO, exact round-tripping is essential (especially for serialization);
speed is just nice.
HTH (though I fear not).
Paul
---
Paul A. Bristow
Prizet Farmhouse
Kendal UK LA8 8AB
+44 01539 561830 07714330204
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users

Re: [Boost-users] lexical_cast between double and string slow in Visual Studio 2013

shada