Hi John,
No, my code is several hundred lines long. I extracted what I thought would be fairly self-contained, but yes, you are right about the return type mismatch.
I am not sure that this helps. What I really want is to do a lot of computation in float128 precision, and for some reason when I cast up a double vector to a float128 vector, all the operations on the float128 vector happen at double precision, and the cast up only happens at the final stage.
I've changed the initialisation of the epoint and gpoint vectors to match your code snippet and I still see the same problem.
So for example, if I accumulate the original double vector into a float128 variable without first casting the values to float128, the cast up only happens to the result. This is the same number that I see as the result of accumulating the float128 vector values.
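One way to check whether an accumulation really runs at the wider precision is to feed it input where the two precisions must disagree. The sketch below is only an illustration: it uses float and double as stand-ins for double and float128 so that it needs nothing beyond the standard library, and the names sum_as and make_probe are hypothetical, not from the code in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum `values` with an accumulator of type Acc. Each element is converted
// to Acc before the addition, so the additions run at Acc precision.
template <typename Acc, typename In>
Acc sum_as(const std::vector<In>& values) {
    Acc total = Acc(0);
    for (const In& v : values) {
        total += static_cast<Acc>(v);
    }
    return total;
}

// Hypothetical probe input: one term equal to 1 followed by many terms that
// are each too small to change a float accumulator that already holds 1.0f.
std::vector<float> make_probe(std::size_t tiny_terms) {
    assert(tiny_terms > 0);
    std::vector<float> probe(tiny_terms + 1, 1.0e-8f);
    probe[0] = 1.0f;
    return probe;
}
```

With make_probe(1000000), sum_as<float> stays at 1 because every small term is rounded away, while sum_as<double> comes out near 1.01. If both calls returned the same value, the additions would not be happening at the wider precision. The same kind of probe, scaled to terms below double's resolution, could be applied to a double input and a float128 accumulator.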
Hope this clarifies things a bit
________________________________________
From: Boost-users
Hi John,
Thanks for the response.
Here are some more snippets from my code.
I don't see any obvious errors, but in any case you must have extracted that from something else, because it doesn't compile! Not least, you can't implicitly convert from a float128 to a double in return(bcoef.at(0)). So if that's compiling for you, then I suspect your declaration of bcoef.

If I rewrite and simplify in a slightly more C++ idiom, how does it compare to your Fortran code now:

    double stieltjes_gamma(const std::vector<double> & e8, const std::vector<double> & g8, int printlevel)
    {
        assert(e8.size() == g8.size()); // precondition?
        std::vector<float128> epoint(e8.begin(), e8.end()), gpoint(g8.begin(), g8.end()), bcoef(e8.size());
        bcoef[0] = 0.0Q;
        for(int i = 0; i < e8.size(); i++)
        {
            bcoef[0] += gpoint[i];
        }
        std::cout << std::setprecision(std::numeric_limits<float128>::max_digits10) << bcoef.at(0) << std::endl;
        return static_cast<double>(bcoef[0]);
    }

And finally... note that summing at higher precision can not actually protect you from cancellation error (in any language) since the error is inherent in the input values.

HTH, John.
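The last point, that the error is already in the input values, can be illustrated with a small sketch. It again uses float and double as stand-ins for double and float128 so that it needs only the standard library. The function names and the two sample values are hypothetical and chosen for illustration; they are not taken from the data discussed in this thread.

```cpp
#include <cassert>
#include <vector>

// Widen every element with the iterator-range constructor, then sum at the
// wider precision.
template <typename Wide, typename Narrow>
Wide widened_sum(const std::vector<Narrow>& values) {
    assert(!values.empty());
    std::vector<Wide> wide(values.begin(), values.end());
    Wide total = Wide(0);
    for (const Wide& w : wide) {
        total += w;
    }
    return total;
}

// Illustrative inputs only: the intended quantities are 1.0000001 and -1.0,
// whose exact sum is 1e-7. Storing them as float rounds the first value to
// 1 + 2^-23 before any widening takes place.
std::vector<float> stored_inputs() {
    return {1.0000001f, -1.0f};
}
```

Here widened_sum<double>(stored_inputs()) is computed without any further rounding, yet the result is 2^-23 (about 1.19e-7) rather than 1e-7. The wider arithmetic is exact; the discrepancy was fixed at the moment the inputs were stored in the narrower type.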
#include
#include
#include
#include
#include
#include
#include
extern "C"{
#include
}
#include <iostream>
#include <sstream>
#include <vector>
#include <iterator>

using namespace boost::multiprecision;

double stieltjes_gamma(int num, const std::vector<double> & e8, const std::vector<double> & g8, int printlevel )
{
    std::vector<float128> epoint(num), gpoint(num), bcoef(num);

    for (int i=0; i < num; i++)
    {
        epoint.at(i)=(float128(e8.at(i)));
        gpoint.at(i)=(float128(g8.at(i)));
    }

    bcoef.at(0)=0.0Q;
    for (int i = 0; i < num; i++)
    {
        bcoef.at(0)+=gpoint.at(i)
    }
    std::cout << std::setprecision(std::numeric_limits<float128>::max_digits10) << bcoef.at(0)<< std::endl;
    return(bcoef.at(0))
}
When I compare this to Fortran code that does the same thing, I don't see the same value for bcoef.at(0).

0.0345217724606016853085032291117978629 <- from above C++ code
3.452175962187407162907246287483402E-0002 <- Fortran correctly utilising quad precision on same input vector

The input vectors in this case have 7976 elements, so errors in precision become apparent very quickly. If I print the gpoint vector, the digits beyond double precision are filled with the same garbage as in the Fortran version.
Thanks,
________________________________________
From: Boost-users on behalf of John Maddock
Sent: Tuesday, May 3, 2016 5:06:47 PM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] converting to float128 from double

On 03/05/2016 13:44, Cooper, Bridgette R D wrote:
Hi,
I have a function that takes as an argument a vector of doubles and tries to convert it to a vector of float128.
When I accumulate the values in the float128 vector it looks like it does double additions and only casts up at the last step.
The initial vector is defined as const std::vector<double>
The float128 vector is defined as std::vector<float128> epoint(num)
And I try to do the casting (inside a loop) via: epoint.at(i)=(float128(e8.at(i)));
If I do the accumulation on the original vector with no casting into a float128, I get the same as accumulating the vector that should be of float128 type.
How should I be doing the casting?
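As a rough sketch of the conversion step being asked about, a vector of a narrower floating-point type can be widened element by element with the iterator-range constructor. The example uses float and double as stand-ins for double and float128 so that it depends only on the standard library. The helper names widen and round_trips are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Build a vector of Wide from a vector of Narrow. The iterator-range
// constructor converts each element individually.
template <typename Wide, typename Narrow>
std::vector<Wide> widen(const std::vector<Narrow>& src) {
    std::vector<Wide> dst(src.begin(), src.end());
    assert(dst.size() == src.size());
    return dst;
}

// True when narrowing each widened element gives back the original value,
// i.e. the widening itself did not alter any element.
template <typename Wide, typename Narrow>
bool round_trips(const std::vector<Narrow>& src, const std::vector<Wide>& dst) {
    if (src.size() != dst.size()) {
        return false;
    }
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (static_cast<Narrow>(dst[i]) != src[i]) {
            return false;
        }
    }
    return true;
}
```

Widening is exact, so round_trips returns true for any finite input. The widened element still holds the narrower type's approximation, though: widening 0.1f gives a double that differs from the double literal 0.1.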
Sorry but you'll have to post way more code than that to see where the mistake is.
John.
Thanks,
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users