[math] Efficient polynomial multiplication
Hello all

This is my first post to the mailing list. While looking at the source code for polynomial multiplication (boost/math/tools/polynomial.hpp) I discovered that the library uses an O(N^2) algorithm for multiplication. I would like to work on implementing a more efficient algorithm (complexity: O(N log(N))).

I am writing this mail to get views and feedback from the community and have a healthy discussion before starting the work.

Lakshay
Hello.

First, please give us benchmark results showing where the O(N log N) algorithm is faster and where it is slower. After that, please write an email to the Boost.Math maintainer or create an issue on the Boost.Math GitHub repo.

13.07.2017 08:00, Lakshay Garg via Boost wrote:
Hello all
This is my first post to the mailing list.
While looking at the source code for polynomial multiplication (boost/math/tools/polynomial.hpp) I discovered that the library uses an O(N^2) algorithm for multiplication. I would like to work on implementing a more efficient algorithm (complexity: O(N log(N))).
I am writing this mail to get views and feedback from the community and have a healthy discussion before starting the work.
Lakshay
_______________________________________________ Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
On 13/07/2017 06:00, Lakshay Garg via Boost wrote:
Hello all
This is my first post to the mailing list.
While looking at the source code for polynomial multiplication (boost/math/tools/polynomial.hpp) I discovered that the library uses an O(N^2) algorithm for multiplication. I would like to work on implementing a more efficient algorithm (complexity: O(N log(N))).
I am writing this mail to get views and feedback from the community and have a healthy discussion before starting the work.
There has been some work towards this here: https://github.com/boostorg/math/pull/32, but it's a long way from ready to go.

The Karatsuba algorithm is probably of more interest than FFT-based ones, as FFTs only really kick in when the digit count is exceptionally large. So I would say if you're looking for something to do, then a PR for the Karatsuba algorithm would be the best choice: most of the work would be in up-rating the tests and doing performance testing to see what order of polynomial needs the more complex algorithm.

Best, John Maddock.

---
This email has been checked for viruses by AVG. http://www.avg.com
[...] then a PR for the Karatsuba algorithm would be the best choice: most of the work would be in up-rating the tests and doing performance testing to see what order of polynomial needs the more complex algorithm.
I have been doing some experiments with the Karatsuba multiplication algorithm. Karatsuba multiplication outperforms the regular multiplication method for high-degree polynomials (> 500). I believe it can still be improved since I haven't spent much time trying to optimize it. But before that I would like to share some benchmarking results and have a discussion.

Here are some benchmark results (using google-benchmark):

| Polynomial Size | Boost Polynomial | Naive array impl | Karatsuba array impl |
|-----------------+------------------+------------------+----------------------|
|               1 | 167 ns           | 47 ns            | 60 ns                |
|               2 | 163 ns           | 59 ns            | 93 ns                |
|               4 | 200 ns           | 87 ns            | 208 ns               |
|               8 | 231 ns           | 134 ns           | 264 ns               |
|              16 | 324 ns           | 243 ns           | 495 ns               |
|              32 | 601 ns           | 534 ns           | 1005 ns              |
|              64 | 1479 ns          | 1873 ns          | 2626 ns              |
|             128 | 5192 ns          | 5280 ns          | 7031 ns              |
|             256 | 18512 ns         | 18215 ns         | 18221 ns             |
|             512 | 79322 ns         | 66010 ns         | 54160 ns             |
|            1024 | 267440 ns        | 247470 ns        | 152651 ns            |
|            2048 | 1158600 ns       | 968172 ns        | 433985 ns            |
|-----------------+------------------+------------------+----------------------|
|      Complexity | 0.34 N^2         | 0.23 N^2         | 2.47 N^1.585         |
|        BigO RMS | 15 %             | 2 %              | 5 %                  |

Boost Polynomial -> polynomial class in the Boost.Math library
Naive array impl -> O(N^2) polynomial multiplication on plain C arrays
Karatsuba array impl -> Karatsuba polynomial multiplication using plain C arrays

Looking at the data, I'm wondering why boost's implementation is so slow for polynomials of lower degree. I could not see any obvious reason by looking at the source.

--
Lakshay
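For readers unfamiliar with the technique: the Karatsuba scheme benchmarked above can be sketched roughly as follows. This is a minimal illustration, not the benchmarked code; the cutoff of 32 and the names `mul_naive`/`mul_karatsuba` are assumptions for the sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// O(N^2) schoolbook multiplication, used as the base case.
std::vector<double> mul_naive(const std::vector<double>& p,
                              const std::vector<double>& q) {
    if (p.empty() || q.empty()) return {};
    std::vector<double> out(p.size() + q.size() - 1, 0.0);
    for (std::size_t i = 0; i < p.size(); ++i)
        for (std::size_t j = 0; j < q.size(); ++j)
            out[i + j] += p[i] * q[j];
    return out;
}

// Karatsuba: three half-size multiplies instead of four -> O(N^1.585).
std::vector<double> mul_karatsuba(std::vector<double> p, std::vector<double> q) {
    if (p.empty() || q.empty()) return {};
    const std::size_t n = std::max(p.size(), q.size());
    if (n <= 32)                      // cutoff is a tunable guess
        return mul_naive(p, q);
    p.resize(n, 0.0);
    q.resize(n, 0.0);
    const std::size_t half = n / 2;
    // Split p = p0 + x^half * p1, and q likewise.
    std::vector<double> p0(p.begin(), p.begin() + half), p1(p.begin() + half, p.end());
    std::vector<double> q0(q.begin(), q.begin() + half), q1(q.begin() + half, q.end());
    std::vector<double> low  = mul_karatsuba(p0, q0);   // p0*q0
    std::vector<double> high = mul_karatsuba(p1, q1);   // p1*q1
    // (p0+p1)*(q0+q1) - low - high = p0*q1 + p1*q0, the middle term,
    // obtained with a single extra recursive multiply.
    std::vector<double> ps(p1), qs(q1);
    for (std::size_t i = 0; i < p0.size(); ++i) ps[i] += p0[i];
    for (std::size_t i = 0; i < q0.size(); ++i) qs[i] += q0[i];
    std::vector<double> mid = mul_karatsuba(ps, qs);
    for (std::size_t i = 0; i < low.size();  ++i) mid[i] -= low[i];
    for (std::size_t i = 0; i < high.size(); ++i) mid[i] -= high[i];
    // Recombine: low + x^half * mid + x^(2*half) * high.
    std::vector<double> out(2 * n - 1, 0.0);
    for (std::size_t i = 0; i < low.size();  ++i) out[i]            += low[i];
    for (std::size_t i = 0; i < mid.size();  ++i) out[i + half]     += mid[i];
    for (std::size_t i = 0; i < high.size(); ++i) out[i + 2 * half] += high[i];
    return out;
}
```

The shape of the numbers above matches expectations: below the cutoff the extra splitting/recombination work makes Karatsuba slower, and it only pays off once N is large enough for the saved multiply to dominate.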
Looking at the data, I'm wondering why boost's implementation is so slow for polynomials of lower degree. I could not see any obvious reason by looking at the source.
I would guess memory allocation: if you're multiplying into already-allocated arrays and don't have to dynamically allocate memory for the result of the multiplication, then that should be a lot quicker. I can't remember if the polynomial class operators are move-optimised either (they should be).

John.
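The allocation point can be sketched as follows: if the caller supplies a zero-initialised output buffer of 2*len doubles, the multiply itself performs no allocation. This interface (`mul_into`) is hypothetical, not Boost.Math's.

```cpp
#include <cstddef>

// Multiply two degree-(len-1) polynomials into a caller-provided buffer.
// out must hold 2*len doubles and arrive zero-initialised; no allocation
// happens inside the hot loop.
void mul_into(const double* p, const double* q, std::size_t len, double* out) {
    for (std::size_t i = 0; i < len; ++i)
        for (std::size_t j = 0; j < len; ++j)
            out[i + j] += p[i] * q[j];
}
```

A benchmark loop can then reuse the same buffer across iterations (clearing it between runs), so `operator new` never shows up on the timed path.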
I would guess memory allocation: if you're multiplying into already allocated arrays and don't have to dynamically allocate memory for the result of the multiplication, then that should be a lot quicker.
No, I do not pre-allocate arrays. Here is the code for the O(N^2) algorithm I'm using:

double * g(double * p, double * q, size_t len) {
    double * out = new double[2*len];
    clear(out, 2*len); // calls memset
    for (int i=0; i<len; i++)
        for (int j=0; j<len; j++)
            out[i+j] += p[i]*q[j];
    return out;
}

I can't remember if the polynomial class operators are move-optimised either (they should be).

I don't think they are move optimized. All the functions take lvalue references. Neither does this class have a move constructor. I guess this class needs a lot of work.

--
Lakshay
BTW: A polynomial class working with std::array / boost::array would be great! No allocation at all...
BTW: A polynomial class working with std::array / boost::array would be great! No allocation at all...
Maybe I did not understand this properly, but I disagree that using std::array/boost::array will significantly reduce memory allocations. In my opinion, it is the implementation which determines the number of memory allocations; you can have the same number of allocations with vector. Also, you would need to know the sizes of the vectors at compile time if arrays are to be used. Wouldn't this be too restrictive?

--
Lakshay
Hi Lakshay,
BTW: A polynomial class working with std::array / boost::array would be great! No allocation at all...
[…] Also, you would need to know the sizes of vectors at the compile time if arrays are to be used. Wouldn't this be too restrictive?
in my experience (scientist), low-order polynomials are much more frequently used than polynomials of order > 500, so it might be worthwhile to improve the low-order performance too.

Concerning std::array/boost::array: ideally you want to provide a library that can handle both dynamic and static arrays. Let's say I am a user of your library, and I happen to know the order of my polynomial at compile time. Then I would like to rest assured that the library can handle and optimise this case. The Eigen library and my proposed histogram library (https://github.com/HDembinski/histogram) follow this approach; they transparently work with both static and dynamic arrays/histograms.

Best regards,
Hans
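One way to support both cases, in the Eigen spirit, is to parameterise the coefficient storage: the same template then works with std::array (size known at compile time, no allocation) or std::vector (runtime size). This is a hedged sketch; the name `basic_polynomial` and its interface are hypothetical, not Boost.Math's.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Polynomial over a generic coefficient container. Storage only needs
// size() and operator[]; std::array gives a static, allocation-free
// polynomial, std::vector a dynamically sized one.
template <class Storage>
struct basic_polynomial {
    Storage coeffs;  // coeffs[i] is the coefficient of x^i

    // Evaluate with Horner's rule.
    double eval(double x) const {
        double r = 0.0;
        for (std::size_t i = coeffs.size(); i-- > 0; )
            r = r * x + coeffs[i];
        return r;
    }
};

using static_poly3  = basic_polynomial<std::array<double, 3>>;  // no allocation
using dynamic_poly  = basic_polynomial<std::vector<double>>;    // runtime size
```

Algorithms written against the template (multiplication included) would then compile down to fixed-size loops in the static case while still accepting runtime-sized input in the dynamic one.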
participants (5)

- Alexander Zaitsev
- Hans Dembinski
- John Maddock
- Lakshay Garg
- Mike Gresens