Dear all,

We are finishing the cleanup of a tiny kmeans library. For those who do not know, kmeans is a widely used data clustering algorithm.

This particular implementation has a lower runtime cost: it uses the triangle inequality between cluster centers and data points to avoid many distance computations at each iteration. It is based on Charles Elkan's paper https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf

We also have Python and MATLAB bindings. The library is fully generic on the data type and comes with additional initialization heuristics.

I would be happy if we could release this library into Boost. Do you think there is any interest in the community?

Best regards,
Jean-Claude Passy and Raffi Enficiaud
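For readers new to the triangle-inequality trick, here is a minimal C++ sketch of the center-to-center pruning test from Elkan's paper. The test rests on one fact: if d(c, c') >= 2 d(x, c), then d(x, c') >= d(x, c), so c' cannot be closer to x than c is. The sketch makes a few assumptions. Points are stored as `std::vector<double>` and distances are Euclidean. The names `Point`, `euclidean_distance` and `assign_with_pruning` are hypothetical and are not the API of the library announced here. The sketch also covers a single assignment step only. Elkan's full algorithm additionally carries per-point upper and lower bounds across iterations, and those are omitted here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical point type for this sketch: a dense vector of coordinates.
using Point = std::vector<double>;

// Euclidean distance between two points of equal dimension.
double euclidean_distance(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        s += d * d;
    }
    return std::sqrt(s);
}

// Assigns each point to its nearest center. It skips every point-center
// distance ruled out by the triangle inequality: if d(c, c') >= 2 d(x, c),
// then d(x, c') >= d(x, c), so c' cannot be strictly closer than c.
// Returns how many point-center distances were actually evaluated.
std::size_t assign_with_pruning(const std::vector<Point>& points,
                                const std::vector<Point>& centers,
                                std::vector<std::size_t>& labels) {
    const std::size_t k = centers.size();
    labels.assign(points.size(), 0);
    if (k == 0) return 0;

    // Pairwise center-center distances, computed once per iteration.
    std::vector<std::vector<double>> cc(k, std::vector<double>(k, 0.0));
    for (std::size_t i = 0; i < k; ++i)
        for (std::size_t j = i + 1; j < k; ++j)
            cc[i][j] = cc[j][i] = euclidean_distance(centers[i], centers[j]);

    std::size_t evaluated = 0;
    for (std::size_t p = 0; p < points.size(); ++p) {
        std::size_t best = 0;
        double best_d = euclidean_distance(points[p], centers[0]);
        ++evaluated;
        for (std::size_t j = 1; j < k; ++j) {
            // Pruning test: center j cannot beat the current best center.
            if (cc[best][j] >= 2.0 * best_d) continue;
            const double d = euclidean_distance(points[p], centers[j]);
            ++evaluated;
            if (d < best_d) { best_d = d; best = j; }
        }
        labels[p] = best;
    }
    return evaluated;
}
```

The labels match a brute-force nearest-center search, with ties going to the lower index in both cases, because a skipped center can never be strictly closer. Computing the k(k-1)/2 center distances adds work to each iteration. The savings therefore tend to be largest when there are many more points than centers.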
Hi,

I have proposed to provide an implementation of KMeans under uBLAS as part of my GSoC project this summer. I am currently designing the API and have not implemented anything yet.

*My thoughts*: I have proposed to implement a very basic form of kmeans with three types of initialization: random, kmeans++ and Bradley-Fayyad. It would be great if we could work together to integrate your implementation as well.

It would be helpful to get input from David and Sharique (my mentors) on how to proceed with this.

Regards,
Dattatreya Mohapatra
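Since kmeans++ seeding comes up here, a minimal C++ sketch of the standard scheme (Arthur and Vassilvitskii, 2007) may help. The first center is drawn uniformly at random. Each further center is drawn with probability proportional to the squared distance from a point to its nearest already-chosen center. The sketch assumes `std::vector<double>` points and Euclidean distance. The name `kmeanspp_seed` and the other names are hypothetical and are not the interface of either the GSoC project or the announced library. Bradley-Fayyad initialization is not covered.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical point type for this sketch: a dense vector of coordinates.
using Point = std::vector<double>;

// Squared Euclidean distance; kmeans++ weights use squared distances.
double squared_distance(const Point& a, const Point& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        s += d * d;
    }
    return s;
}

// kmeans++ seeding: the first center is drawn uniformly. Each subsequent
// center is drawn with probability proportional to D(x)^2, the squared
// distance from x to its nearest already-chosen center.
std::vector<Point> kmeanspp_seed(const std::vector<Point>& points,
                                 std::size_t k, std::mt19937& rng) {
    std::vector<Point> centers;
    if (points.empty() || k == 0) return centers;

    std::uniform_int_distribution<std::size_t> uniform(0, points.size() - 1);
    centers.push_back(points[uniform(rng)]);

    // d2[i]: squared distance from points[i] to its nearest chosen center.
    std::vector<double> d2(points.size());
    for (std::size_t i = 0; i < points.size(); ++i)
        d2[i] = squared_distance(points[i], centers[0]);

    while (centers.size() < k) {
        double total = 0.0;
        for (double w : d2) total += w;

        std::size_t next;
        if (total > 0.0) {
            // Weighted draw: P(i) = d2[i] / total.
            std::discrete_distribution<std::size_t> pick(d2.begin(), d2.end());
            next = pick(rng);
        } else {
            // Every point coincides with a chosen center: fall back to uniform.
            next = uniform(rng);
        }
        centers.push_back(points[next]);

        // Update nearest-center distances with the newly added center.
        for (std::size_t i = 0; i < points.size(); ++i) {
            const double d = squared_distance(points[i], centers.back());
            if (d < d2[i]) d2[i] = d;
        }
    }
    return centers;
}
```

Points that coincide with an already-chosen center get weight zero, so they are not drawn again by the weighted step. The uniform fallback avoids handing `std::discrete_distribution` an all-zero weight vector.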
On 12/05/2018 at 06:38, Dattatreya Mohapatra wrote:
Hi,
I have proposed to provide an implementation of KMeans under uBLAS, as part of my GSoC project this summer. Currently, I am working on designing the API, and have not implemented anything.
*My thoughts*: I have proposed to implement a very basic form of kmeans with three types of initialization: random, kmeans++ and Bradley-Fayyad. It would be great if we could work together to integrate your implementation as well.
It would be helpful to get input from David and Sharique (my mentors) on how to proceed with this.
Regards, Dattatreya Mohapatra
Hi,

I hadn't seen the GSoC proposal; thanks for bringing it to my attention.

We already have a working design and code, and the implementation uses an efficient algorithm. We also have kmeans++ initialization, but not Bradley-Fayyad (I would be interested in any pointers). I also think this would be better placed outside of uBLAS, because kmeans has very general use cases. Our implementation has no dependency other than the STL.

We are ready to release/integrate (tests, docs and benchmarks are in place), which is why I am asking whether there is interest and how to proceed. So let's work together on this if you want.

Raffi
-----Original Message----- From: Boost [mailto:boost-bounces@lists.boost.org] On Behalf Of Raffi Enficiaud via Boost Sent: 11 May 2018 20:24 To: boost@lists.boost.org Cc: Raffi Enficiaud Subject: [boost] Interest in a tiny kmeans library
Dear all,
We are finishing the cleanup of a tiny kmeans library. For those who do not know, kmeans is a widely used data clustering algorithm.
This particular implementation has a lower runtime cost: it uses the triangle inequality between cluster centers and data points to avoid many distance computations at each iteration. It is based on Charles Elkan's paper https://www.aaai.org/Papers/ICML/2003/ICML03-022.pdf
We also have Python and MATLAB bindings. The library is fully generic on the data type and comes with additional initialization heuristics.
I would be happy if we could release this library into Boost. Do you think there is any interest in the community?
This is niche stuff, but I suspect useful nonetheless.
Do not be discouraged by immediate lack of interest.
But you may need to find some users to press your case.
(And don't forget the need for good Boost-style docs.)

Paul

Paul A. Bristow
Prizet Farmhouse
Kendal UK LA8 8AB
On 22 May 2018 at 15:02, Paul A. Bristow via Boost
This is niche stuff, but I suspect useful nonetheless.
Do not be discouraged by immediate lack of interest.
But you may need to find some users to press your case.
(And don't forget the need for good Boost-style docs).
I second that. I could potentially use it myself, so I'd be interested in seeing it proposed (with good docs, of course :)).

Best regards,
--
Mateusz Loskot, http://mateusz.loskot.net
Hi,

On Fri, May 11, 2018 at 09:23:35PM +0200, Raffi Enficiaud via Boost wrote:
We also have Python and MATLAB bindings. The library is fully generic on the data type and comes with additional initialization heuristics.
Bindings to MATLAB are an often-requested feature, at least in my experience.
I would be happy if we could release this library into Boost. Do you think there is any interest in the community?
I am interested in such a library and consider it very useful. Is it available somewhere, so that I could have a look at it and experiment with it a bit?

Thanks in advance,
Philipp
Participants (5):
- Dattatreya Mohapatra
- Mateusz Loskot
- Paul A. Bristow
- Philipp Schwaha
- Raffi Enficiaud