On 03/05/2016 17:35, Cooper, Bridgette R D wrote:
Hi John,
Thanks for the response.
Here are some more snippets from my code.
I don't see any obvious errors, but in any case you must have extracted that from something else, because it doesn't compile! Not least, you can't implicitly convert from a float128 to a double in return(bcoef.at(0)). So if that's compiling for you, then I suspect your declaration of bcoef.

If I rewrite and simplify in a slightly more C++ idiom, how does it compare to your Fortran code now?

    double stieltjes_gamma(const std::vector<double> & e8, const std::vector<double> & g8, int printlevel)
    {
       assert(e8.size() == g8.size());  // precondition?
       std::vector<float128> epoint(e8.begin(), e8.end()), gpoint(g8.begin(), g8.end()), bcoef(e8.size());

       bcoef[0] = 0.0Q;
       for(int i = 0; i < e8.size(); i++)
       {
          bcoef[0] += gpoint[i];
       }
       std::cout << std::setprecision(std::numeric_limits<float128>::max_digits10) << bcoef.at(0) << std::endl;
       return static_cast<double>(bcoef[0]);
    }

And finally... note that summing at higher precision cannot actually protect you from cancellation error (in any language), since the error is inherent in the input values.

HTH, John.
#include
#include
#include
#include
#include
#include
#include
extern "C"{
#include
}
#include <iostream>
#include <sstream>
#include <vector>
#include <iterator>

using namespace boost::multiprecision;

double stieltjes_gamma(int num, const std::vector<double> & e8, const std::vector<double> & g8, int printlevel )
{
    std::vector<float128> epoint(num), gpoint(num), bcoef(num);

    for (int i=0; i < num; i++)
    {
        epoint.at(i)=(float128(e8.at(i)));
        gpoint.at(i)=(float128(g8.at(i)));
    }

    bcoef.at(0)=0.0Q;
    for (int i = 0; i < num; i++)
    {
        bcoef.at(0)+=gpoint.at(i)
    }
    std::cout << std::setprecision(std::numeric_limits<float128>::max_digits10) << bcoef.at(0)<< std::endl;
    return(bcoef.at(0))
}
When I compare this to Fortran code that does the same, I don't see the same value for bcoef.at(0).

0.0345217724606016853085032291117978629    <- from the above C++ code
3.452175962187407162907246287483402E-0002  <- Fortran correctly utilising quad precision on the same input vector

The input vectors in this case have 7976 elements, so errors in precision become apparent really quickly. If I print the gpoint vector, its extra digits beyond double precision are filled with the same garbage as in the Fortran.
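The "garbage" digits are what an exact widening conversion would be expected to produce: a double holds a specific binary value, and printing that value at higher precision shows its exact decimal expansion. A small sketch of this, using long double as a stand-in for float128 so that it builds without Boost (the helper names widened_digits and round_trips are hypothetical, and on platforms where long double is the same as double the widening is a no-op):

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Widen a double to long double (standing in for float128 here) and format
// it with `digits` significant digits. The conversion itself is exact.
std::string widened_digits(double x, int digits) {
    const long double wide = static_cast<long double>(x);
    std::ostringstream os;
    os << std::setprecision(digits) << wide;
    return os.str();
}

// True if widening and then narrowing again gives back the original double,
// i.e. the widening step neither lost nor invented any information.
bool round_trips(double x) {
    return static_cast<double>(static_cast<long double>(x)) == x;
}
```

Calling widened_digits(0.1, 30) shows digits beyond the 17th that are not zero. They belong to the double nearest to 0.1, not to 0.1 itself, which is consistent with C++ and Fortran printing the same extra digits for the same input.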
Thanks,
________________________________________
From: Boost-users on behalf of John Maddock
Sent: Tuesday, May 3, 2016 5:06:47 PM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] converting to float128 from double

On 03/05/2016 13:44, Cooper, Bridgette R D wrote:
Hi,
I have a function that takes as an argument a vector of doubles and tries to convert it to a vector of float128.
When I accumulate the values in the float128 vector it looks like it does double additions and only casts up at the last step.
The initial vector is defined as const std::vector<double>
The float128 vector is defined as std::vector<float128> epoint(num)
And I try to do the casting (inside a loop) via: epoint.at(i)=(float128(e8.at(i)));
If I do the accumulation on the original vector with no casting into a float128, I get the same as accumulating the vector that should be of float128 type.
How should I be doing the casting?
Sorry but you'll have to post way more code than that to see where the mistake is.
John.
Thanks,
_______________________________________________ Boost-users mailing list Boost-users@lists.boost.org http://lists.boost.org/mailman/listinfo.cgi/boost-users