Image segmentation is an important step in image processing, and it seems everywhere if we want to analyze what's inside the image. Many kinds of research have been done in the area of image segmentation using clustering. There are different methods and one of the most popular methods is K-Means clustering algorithm. Image segmentation is an important step in image processing, and it seems everywhere if we want to analyze what's inside the image. For example, if we seek to find if there is a chair or person inside an indoor image, we may need image segmentation to separate objects and analyze each object individually to check what it is.
The k-means is a simple algorithm that divides the data set into k partitions for n objects where k n. In this method, the data set is partitioned into homogeneous groups with similar characteristics. The similarity or dissimilarity is defined by calculating the distance between the centroid and data points. The clusters are formed when the optimization function for the algorithm achieves its objective -- less intracluster distances and more intercluster distances. Repeat steps 3 and 4 until the centroids in clusters change.
In many applications we want to find the number of clusters in a dataset. A common approach is to use the penalized k-means algorithm with an additive penalty term linear in the number of clusters. An open problem is estimating the value of the coefficient of the penalty term. Since estimating the value of the coefficient in a principled manner appears to be intractable for general clusters, we investigate "ideal clusters", i.e. identical spherical clusters with no overlaps and no outlier background noise. In this paper: (a) We derive, for the case of ideal clusters, rigorous bounds for the coefficient of the additive penalty. Unsurprisingly, the bounds depend on the correct number of clusters, which we want to find in the first place. We further show that additive penalty, even for this simplest case of ideal clusters, typically produces a weak and often ambiguous signature for the correct number of clusters. (b) As an alternative, we examine the k-means with multiplicative penalty, and show that this parameter-free formulation has a stronger, and less often ambiguous, signature for the correct number of clusters. We also empirically investigate certain types of deviations from ideal cluster assumption and show that combination of k-means with additive and multiplicative penalties can resolve ambiguous solutions.
This paper presents a novel accelerated exact k-means algorithm called the Ball k-means algorithm, which uses a ball to describe a cluster, focusing on reducing the point-centroid distance computation. The Ball k-means can accurately find the neighbor clusters for each cluster resulting distance computations only between a point and its neighbor clusters' centroids instead of all centroids. Moreover, each cluster can be divided into a stable area and an active area, and the later one can be further divided into annulus areas. The assigned cluster of the points in the stable area is not changed in the current iteration while the points in the annulus area will be adjusted within a few neighbor clusters in the current iteration. Also, there are no upper or lower bounds in the proposed Ball k-means. Furthermore, reducing centroid-centroid distance computation between iterations makes it efficient for large k clustering. The fast speed, no extra parameters and simple design of the Ball k-means make it an all-around replacement of the naive k-means algorithm.
This paper proposes a centroid-based clustering algorithm which is capable of clustering data-points with n-features in real-time, without having to specify the number of clusters to be formed. We present the core logic behind the algorithm, a similarity measure, which collectively decides whether to assign an incoming data-point to a preexisting cluster, or create a new cluster & assign the data-point to it. The implementation of the proposed algorithm clearly demonstrates how efficiently a data-point with high dimensionality of features is assigned to an appropriate cluster with minimal operations. The proposed algorithm is very application specific and is applicable when the need is perform clustering analysis of real-time data-points, where the similarity measure between an incoming datapoint and the cluster to which the data-point is to be associated with, is greater than predefined Level-of-Similarity. Keywords: Unsupervised learning, Clustering, Centroid, Real-time, Level-of-Similarity