Efficient Algorithms for K-Means Clustering.
- License :GPL
- OS:Windows All
- Publisher:David Mount
The KMeans package provides a collection of C++ procedures for performing k-means clustering based on a combination of local search and Lloyd’s algorithm (also known as the k-means algorithm).
Given any set of k centers Z, for each center z in Z, let V(z) denote its neighborhood, that is, the set of data points for which z is the nearest neighbor. Each stage of Lloyd’s algorithm moves every center point z to the centroid of V(z) and then updates V(z) by recomputing the distance from each point to its nearest center. These steps are repeated until convergence.
However, Lloyd’s algorithm can get stuck in locally minimal solutions that are far from the optimal. For this reason it is common to consider heuristics based on local search, in which centers are swapped in and out of an existing solution (typically at random). Such a swap is accepted if it decreases the average distortion, and otherwise it is ignored. It is also possible to combine these two approaches (Lloyd’s algorithm and local search), producing a type of hybrid solution.
This program provides a number of different algorithms for doing k-means clustering based on these ideas and combinations.
Repeatedly applies Lloyd’s algorithm with randomly sampled starting points.
A local search heuristic, which works by performing swaps between existing centers and a set of candidate centers.
A simple hybrid algorithm, which does one swap followed by some number of iterations of Lloyd’s.
A more complex hybrid of Lloyd’s and Swap, which performs some number of swaps followed by some number of iterations of Lloyd’s algorithm. To avoid getting trapped in local minima, an approach similar to simulated annealing is included as well.