Re: [Boost-users] converting to float128 from double

3 May 2016

Hi John,

No, my code is several hundred lines long. I extracted what I thought would be fairly self-contained, but yes, you are right about the return type mismatch.

I am not sure that this helps. What I really want is to do a lot of computation in float128 precision, but for some reason, when I cast a double vector up to a float128 vector, all the operations on the float128 vector seem to happen at double precision, and the cast up only happens at the final stage.

I've changed the initialization of the epoint and gpoint vectors to match your code snippet, and I still see the same problem.

So, for example, if I accumulate the original double vector into a float128 variable without casting the values to float128, the cast up should only happen to the result, and I get exactly the same number when I accumulate the float128 vector values.
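Whether the additions really happen in quad precision can be checked by comparing three ways of summing the same data. This is a minimal sketch, assuming GCC or Clang on x86-64 with the built-in `__float128` type standing in for `boost::multiprecision::float128` (which, as far as I know, wraps that type on GCC); the function names and test data are hypothetical. Paths (2) and (3) should give identical results, because accumulating a double into a float128 variable converts that term exactly before each addition. Seeing the same number from both is therefore expected and is not, by itself, evidence of double arithmetic. The informative comparison is against path (1).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Assumption: GCC/Clang's built-in __float128 stands in for
// boost::multiprecision::float128 so this builds without Boost.
using quad = __float128;

// (1) Accumulate in double: every partial sum is rounded to 53 bits.
double sum_in_double(const std::vector<double>& v)
{
    double acc = 0.0;
    for (double x : v) acc += x;
    return acc;
}

// (2) Accumulate the original doubles into a quad accumulator:
// each x is converted (exactly) to quad before the quad addition.
quad sum_doubles_into_quad(const std::vector<double>& v)
{
    quad acc = 0;
    for (double x : v) acc += x;
    return acc;
}

// (3) Convert the whole vector to quad first, then accumulate in quad.
quad sum_converted_vector(const std::vector<double>& v)
{
    std::vector<quad> q(v.begin(), v.end());  // element-wise double -> quad
    quad acc = 0;
    for (quad x : q) acc += x;
    return acc;
}

// Hypothetical test data: 1.0 followed by 1024 copies of 2^-60. Each 2^-60 is
// below half an ulp of 1.0 in double, so it is lost in (1) but kept in (2)/(3).
std::vector<double> make_test_data()
{
    std::vector<double> v{1.0};
    v.insert(v.end(), 1024, std::ldexp(1.0, -60));
    return v;
}
```

With this data, path (1) returns exactly 1.0, while (2) and (3) return 1 + 2^-50, which still fits in a double after converting back.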

Hope this clarifies things a bit.
________________________________________
From: Boost-users <boost-users-bounces@lists.boost.org> on behalf of John Maddock <jz.maddock@googlemail.com>
Sent: Tuesday, May 3, 2016 6:27:39 PM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] converting to float128 from double

On 03/05/2016 17:35, Cooper, Bridgette R D wrote:
...
Hi John,
Thanks for the response.
Here are some more snippets from my code.
I don't see any obvious errors, but in any case you must have extracted
that from something else, because it doesn't compile!

Not least, you can't implicitly convert a float128 to a double in

     return(bcoef.at(0))

So if that's compiling for you, then I suspect your declaration of bcoef.

If I rewrite and simplify it in a slightly more idiomatic C++ style, how
does it compare to your Fortran code now:

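#include <boost/multiprecision/float128.hpp>
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <iostream>
#include <limits>
#include <vector>

using boost::multiprecision::float128;

double stieltjes_gamma(const std::vector<double> & e8, const std::vector<double> & g8, int printlevel)
{
    assert(e8.size() == g8.size());  // precondition?
    // The range constructors convert each double to float128 (exactly).
    std::vector<float128> epoint(e8.begin(), e8.end()),
                          gpoint(g8.begin(), g8.end()),
                          bcoef(e8.size());

    bcoef[0] = 0.0Q;
    for(std::size_t i = 0; i < e8.size(); i++)
    {
       bcoef[0] += gpoint[i];  // float128 addition
    }
    std::cout << std::setprecision(std::numeric_limits<float128>::max_digits10)
              << bcoef.at(0) << std::endl;
    return static_cast<double>(bcoef[0]);  // float128 -> double must be explicit
}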
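One common way to end up with double arithmetic despite float128 data, offered only as a guess since the rest of the code isn't shown, is `std::accumulate`: its running total has the type of the `init` argument, so passing `0.0` makes every partial sum a double. The sketch assumes `__float128` as a stand-in for float128 and uses hypothetical function names. With the built-in type the narrowing back to double happens silently at each step; as far as I know, boost's float128 makes the conversion to double explicit, so the same mistake on a float128 vector may fail to compile instead, whereas a double vector with a `0.0` init compiles and sums in double either way.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Assumption: GCC/Clang's built-in __float128 stands in for boost's float128.
using quad = __float128;

// Pitfall: the running total has the type of init (double here). Each step
// computes double + quad in quad, then narrows the result back to double.
quad sum_with_double_init(const std::vector<quad>& v)
{
    return std::accumulate(v.begin(), v.end(), 0.0);
}

// Passing a quad zero makes the running total, and every addition, quad.
quad sum_with_quad_init(const std::vector<quad>& v)
{
    return std::accumulate(v.begin(), v.end(), quad(0));
}
```

For example, a vector holding 1.0 followed by 1024 copies of 2^-60 sums to exactly 1.0 with the double init, but to 1 + 2^-50 with the quad init.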

And finally, note that summing at higher precision cannot actually
protect you from cancellation error (in any language), since the error is
inherent in the input values.
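The cancellation point can be seen with a tiny example, again assuming `__float128` as a stand-in for float128 and with hypothetical function names. In decimal, 0.1 + 0.2 - 0.3 is exactly 0, but none of the three literals is exactly representable as a double. Doing the arithmetic in quad changes the answer (from 2^-54 to 2^-55) without reaching 0, because the representation error is already in the converted inputs.

```cpp
#include <cassert>
#include <cmath>

// Assumption: GCC/Clang's built-in __float128 stands in for boost's float128.
using quad = __float128;

// Double arithmetic: 0.1 + 0.2 rounds up, leaving 2^-54 after subtracting 0.3.
double cancel_in_double()
{
    double a = 0.1, b = 0.2, c = 0.3;
    return (a + b) - c;
}

// Quad arithmetic on the same double inputs: both the sum and the difference
// are exact in quad's 113-bit significand, giving 2^-55, which is still not 0.
double cancel_in_quad()
{
    double a = 0.1, b = 0.2, c = 0.3;
    quad sum = quad(a) + quad(b);
    return static_cast<double>(sum - quad(c));
}
```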

HTH, John.
...
#include <boost/multiprecision/float128.hpp>
#include <boost/math/cstdfloat/cstdfloat_types.hpp>
#include <boost/math/cstdfloat/cstdfloat_limits.hpp>
#include <boost/math/cstdfloat/cstdfloat_cmath.hpp>
#include <boost/math/cstdfloat/cstdfloat_iostream.hpp>
#include <boost/multiprecision/detail/float_string_cvt.hpp>
#include <boost/multiprecision/detail/generic_interconvert.hpp>
extern "C"{
#include <quadmath.h>
}
#include <iostream>
#include <sstream>
#include <vector>
#include <iterator>
#include <iomanip>
#include <limits>
using namespace boost::multiprecision;
double stieltjes_gamma(int num, const std::vector<double> & e8, const std::vector<double> & g8, int printlevel )
{
    std::vector<float128> epoint(num), gpoint(num), bcoef(num);
    for (int i = 0; i < num; i++)
    {
        epoint.at(i) = float128(e8.at(i));
        gpoint.at(i) = float128(g8.at(i));
    }
    bcoef.at(0) = 0.0Q;
    for (int i = 0; i < num; i++)
    {
        bcoef.at(0) += gpoint.at(i);
    }
    std::cout << std::setprecision(std::numeric_limits<float128>::max_digits10) << bcoef.at(0) << std::endl;
    return static_cast<double>(bcoef.at(0));  // float128 -> double must be explicit
}
When I compare this to Fortran code that does the same thing, I don't get the same value for bcoef.at(0).
0.0345217724606016853085032291117978629  <- from the C++ code above
3.452175962187407162907246287483402E-0002   <- Fortran, correctly using quad precision on the same input vector
The input vectors in this case have 7976 elements, so precision errors become apparent very quickly. If I print the gpoint vector, it gets filled with the same garbage in the extra digits beyond double precision as in the Fortran version.
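The extra digits printed for gpoint are not really garbage: converting a double to quad is exact, so they are the exact decimal expansion of the binary double value. This minimal sketch (assuming `__float128` as a stand-in for float128; helper names are hypothetical) checks the round trip and compares quad(0.1) with a quad computed directly as 1/10. If the Fortran program reads its input text straight into quad variables (the post does not say), its inputs would be the second kind of value, which alone could make the two sums differ; that is an assumption, not something the thread establishes.

```cpp
#include <cassert>
#include <cmath>

// Assumption: GCC/Clang's built-in __float128 stands in for boost's float128.
using quad = __float128;

// Widening double -> quad is exact, so converting back recovers the double.
bool round_trips(double x)
{
    return static_cast<double>(quad(x)) == x;
}

// quad(0.1) holds the exact value of the double 0.1,
// 0.1000000000000000055511151231257827..., while 1/10 computed in quad is the
// nearest quad to one tenth. Their difference is about 5.55e-18.
quad representation_gap_of_tenth()
{
    quad from_double = quad(0.1);
    quad direct = quad(1) / quad(10);
    return from_double - direct;
}
```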
Thanks,
________________________________________
From: Boost-users <boost-users-bounces@lists.boost.org> on behalf of John Maddock <jz.maddock@googlemail.com>
Sent: Tuesday, May 3, 2016 5:06:47 PM
To: boost-users@lists.boost.org
Subject: Re: [Boost-users] converting to float128 from double
On 03/05/2016 13:44, Cooper, Bridgette R D wrote:
...
Hi,
I have a function that takes as an argument a vector of doubles and
tries to convert it to a vector of float128.
When I accumulate the values in the float128 vector it looks like it
does double additions and only casts up at the last step.
The initial vector is defined as const std::vector<double>
The float128 vector is defined as std::vector<float128> epoint(num)
And I try to do the casting (inside a loop) via:
epoint.at(i)=(float128(e8.at(i)));
If I do the accumulation on the original vector with no casting into a
float128, I get the same as accumulating the vector that should be of
float128 type.
How should I be doing the casting?
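The cast itself should not be the problem: a widening double-to-float128 conversion is exact, and no special cast is required. For comparison, here are three equivalent ways to build the float128 vector, sketched with `__float128` standing in for float128 and with hypothetical function names; all three should produce identical vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Assumption: GCC/Clang's built-in __float128 stands in for boost's float128.
using quad = __float128;

// Explicit loop, equivalent to the original epoint.at(i) = float128(e8.at(i)).
std::vector<quad> widen_loop(const std::vector<double>& e8)
{
    std::vector<quad> epoint(e8.size());
    for (std::size_t i = 0; i < e8.size(); ++i)
        epoint[i] = quad(e8[i]);
    return epoint;
}

// Range constructor: converts each element on construction.
std::vector<quad> widen_range(const std::vector<double>& e8)
{
    return std::vector<quad>(e8.begin(), e8.end());
}

// std::transform with an explicit per-element conversion.
std::vector<quad> widen_transform(const std::vector<double>& e8)
{
    std::vector<quad> epoint;
    epoint.reserve(e8.size());
    std::transform(e8.begin(), e8.end(), std::back_inserter(epoint),
                   [](double x) { return quad(x); });
    return epoint;
}
```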
Sorry, but you'll have to post much more code than that for us to see
where the mistake is.
John.
...
Thanks,
_______________________________________________
Boost-users mailing list
Boost-users@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/boost-users
