[multiprecission] some thoughts on add_unsigned.hpp
Hi,
In my program I see a lot of time taken up by add/subtract_unsigned. I have a couple of thoughts on the code:
Carry propagation outside the range of the smallest number can be improved:
for (; i < x && carry; ++i)
carry = ::boost::multiprecision::detail::addcarry_limb(carry, pa[i], 0, pr + i);
Maybe this:
for (; i < x && carry; ++i)
carry = ::boost::multiprecision::detail::addcarry_limb(0, pa[i], 1, pr + i);
This generates just an inc on my machine but unfortunately it doesn't eliminate the loads and stores. I think on average the carry chain must be small though.
You have the same situation on subtract.
add_unsigned and subtract_unsigned are inconsistent at the tail end after carry propagation:
add_unsigned:
if (i == x && carry)
{
// We overflowed, need to add one more limb:
result.resize(x + 1, x + 1);
if (result.size() > x)
result.limbs()[x] = static_cast
On 19/08/2021 02:43, Neill Clift via Boost-users wrote:
Hi,
In my program I see a lot of time taken up by add/subtract_unsigned. I have a couple of thoughts on the code:
Carry propagation outside the range of the smallest number can be improved:
for (; i < x && carry; ++i)
carry = ::boost::multiprecision::detail::addcarry_limb(carry, pa[i], 0, pr + i);
Maybe this:
for (; i < x && carry; ++i)
carry = ::boost::multiprecision::detail::addcarry_limb(0, pa[i], 1, pr + i);
This generates just an inc on my machine but unfortunately it doesn’t eliminate the loads and stores. I think on average the carry chain must be small though.
You have the same situation on subtract.
add_unsigned and subtract_unsigned are inconsistent at the tail end after carry propagation:
add_unsigned:
if (i == x && carry)
{
// We overflowed, need to add one more limb:
result.resize(x + 1, x + 1);
if (result.size() > x)
result.limbs()[x] = static_cast
(1u); }
else
std::copy(pa + i, pa + x, pr + i); ß- if I == x we do the copy. We also do the copy even if the output and input overlap
subtract_unsigned:
// Any remaining digits are the same as those in pa:
if ((x != i) && (pa != pr)) ß------ we do these checks here and avoid the copy
std_constexpr::copy(pa + i, pa + x, pr + i);
I am seeing a lot of time being used by that general copy above in add_unsigned when it can be avoided.
This stuff is totally awesome and works well in my program. So thanks for working on this stuff.
Thanks Neill, it might be a few days at best before I can look at this in detail, I wonder could you either file a bug report or a PR at https://github.com/boostorg/multiprecision/ and then this won't get lost. Thanks!! John. -- This email has been checked for viruses by Avast antivirus software. https://www.avast.com/antivirus
participants (2)
-
John Maddock
-
Neill Clift