Clustering is the partitioning of data into groups of similar objects. Each group, called a cluster, consists of objects that are similar to one another and dissimilar to objects in other groups. In this paper, the K-Means algorithm is implemented with three distance functions in order to identify the optimal distance function for clustering. The proposed K-Means algorithm is compared with the K-Means, Static Weighted K-Means (SWK-Means), and Dynamic Weighted K-Means (DWK-Means) algorithms using the Davies-Bouldin index, execution time, and iteration count. Experimental results show that the proposed K-Means algorithm performed better on the Iris and Wine datasets than the other three clustering methods.
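As a minimal sketch of this kind of comparison, the following runs a plain Lloyd-style K-Means with a pluggable distance function and reports the iteration count and the Davies-Bouldin index for each choice. The three distances (Euclidean, Manhattan, Chebyshev), the toy data, and the first-k-points initialization are assumptions made for illustration; the abstract does not name the distance functions used, and this is not the proposed algorithm itself.

```python
import math

# Candidate distance functions (assumed; the abstract does not name them).
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def kmeans(points, k, dist, max_iter=100):
    """Lloyd-style K-Means. Returns (labels, centroids, iterations)."""
    centroids = [tuple(p) for p in points[:k]]  # deterministic init: first k points
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid under `dist`.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            return labels, centroids, iteration
        labels = new_labels
        # Update step: the centroid is the coordinate-wise mean of its members.
        # (The mean is kept for every distance here for simplicity.)
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean(members)
    return labels, centroids, max_iter

def davies_bouldin(points, labels, centroids, dist):
    """Davies-Bouldin index; lower values mean compact, well-separated clusters."""
    k = len(centroids)
    scatter = []
    for j in range(k):
        members = [p for p, l in zip(points, labels) if l == j]
        s = sum(dist(p, centroids[j]) for p in members) / len(members) if members else 0.0
        scatter.append(s)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                     for j in range(k) if j != i)
    return total / k

# Toy data: two well-separated groups (hypothetical, not Iris or Wine).
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (0.8, 1.2), (5.2, 4.8), (4.8, 5.2)]

results = {}
for name, dist in [("euclidean", euclidean), ("manhattan", manhattan), ("chebyshev", chebyshev)]:
    labels, centroids, iterations = kmeans(points, 2, dist)
    results[name] = (labels, iterations, davies_bouldin(points, labels, centroids, dist))
    print(name, "iterations:", iterations, "DB index:", round(results[name][2], 4))
```

On real data the three distances generally yield different partitions and index values, which is what makes such a comparison informative; on this toy set the groups are separated clearly enough that all three agree.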
We investigate the use of clustering methods for mining time-series data from smart meters. The problem is to identify patterns and trends in the energy usage profiles of commercial and industrial customers over 24-hour periods, and to group similar profiles. We tested our method on energy usage data provided by several U.S. power utilities. The results show accurate grouping of accounts with similar energy usage patterns, and suggest that the method could be used in energy efficiency programs.
Machine learning techniques are usually classified into supervised and unsupervised techniques. Supervised machine learning starts from prior knowledge of the desired result in the form of labeled data sets, which makes it possible to guide the training process, whereas unsupervised machine learning works directly on unlabeled data. In the absence of labels to orient the learning process, these labels must be "discovered" by the learning algorithm.
Alternatively, a similarity function might also be used.
1) Scale Invariance: The first of Kleinberg's axioms states that f(d) = f(α · d) for any distance function d and any scaling factor α > 0. This simple axiom indicates that a clustering algorithm should not be sensitive to the unit in which distances are measured.
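The scale-invariance axiom, f(d) = f(α · d) for every α > 0, can be checked numerically on a small example. The sketch below applies single-linkage clustering with two different stopping rules, both chosen here purely for illustration: stopping at a fixed number of clusters k, and stopping once the next merge distance exceeds a fixed threshold r. The point positions, k, r, and α are hypothetical values.

```python
from itertools import combinations

def single_linkage(d, n, k=None, r=None):
    """Merge the closest pairs first.

    Stops when k clusters remain (if k is given) or when the next
    pairwise distance exceeds r (if r is given).
    Returns the partition as a set of frozensets of point indices.
    """
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    clusters = n
    for i, j in sorted(combinations(range(n), 2), key=lambda e: d(*e)):
        if k is not None and clusters <= k:
            break
        if r is not None and d(i, j) > r:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # merge two distinct clusters
            parent[ri] = rj
            clusters -= 1

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), set()).add(i)
    return {frozenset(g) for g in groups.values()}

# Five points on a line (hypothetical data).
positions = [0.0, 1.0, 2.0, 10.0, 11.0]
n = len(positions)

def d(i, j):
    return abs(positions[i] - positions[j])

alpha = 10.0

def d_scaled(i, j):
    return alpha * d(i, j)  # the same distances in a different unit

# Stopping at k clusters: the partition is unchanged by scaling.
by_k = single_linkage(d, n, k=2)
by_k_scaled = single_linkage(d_scaled, n, k=2)
print("k-stopping scale-invariant:", by_k == by_k_scaled)

# Stopping at a distance threshold: scaling changes the partition.
by_r = single_linkage(d, n, r=1.5)
by_r_scaled = single_linkage(d_scaled, n, r=1.5)
print("threshold-stopping scale-invariant:", by_r == by_r_scaled)
```

The fixed-k rule depends only on the ordering of the distances, which scaling preserves, whereas the threshold rule compares distances against an absolute value and therefore depends on the unit of measurement.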
In this article, we study the notion of similarity within the context of cluster analysis. We begin by studying different distances commonly used for this task and highlight certain important properties that they might have, such as the use of data distribution or reduced sensitivity to the curse of dimensionality. Then we study inter- and intra-cluster similarities. We identify how the choices made can influence the nature of the clusters.
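To make the distinction between intra- and inter-cluster similarity concrete, here is a small sketch that measures compactness inside a cluster and separation between two clusters under three common linkage definitions. The Euclidean distance, the linkage names, and the toy clusters are assumptions chosen for illustration, not the specific measures studied in the article.

```python
import math
from itertools import combinations, product

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def intra_cluster(cluster):
    """Mean pairwise distance inside one cluster (lower means more compact)."""
    pairs = list(combinations(cluster, 2))
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)

def inter_cluster(c1, c2, how="average"):
    """Distance between two clusters under a chosen linkage."""
    dists = [euclidean(a, b) for a, b in product(c1, c2)]
    if how == "single":      # closest pair of points
        return min(dists)
    if how == "complete":    # farthest pair of points
        return max(dists)
    return sum(dists) / len(dists)  # average over all cross pairs

# A compact cluster and an elongated one (hypothetical data).
compact = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
elongated = [(4.0, 0.0), (5.0, 0.0), (9.0, 0.0)]

intra_compact = intra_cluster(compact)
intra_elongated = intra_cluster(elongated)
single = inter_cluster(compact, elongated, "single")
complete = inter_cluster(compact, elongated, "complete")
average = inter_cluster(compact, elongated, "average")

print("intra (compact, elongated):", round(intra_compact, 3), round(intra_elongated, 3))
print("inter (single, average, complete):",
      round(single, 3), round(average, 3), round(complete, 3))
```

The same pair of clusters looks close under single linkage and far apart under complete linkage, which illustrates how the choice of inter-cluster measure can favor elongated or compact clusters.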
Conclusions
In this paper we presented HRC, a new clustering algorithm that is applicable to very large datasets. Because it uses cluster representatives to reduce computational complexity, it is scalable and robust to outliers and noise. HRC is a two-phase algorithm that takes advantage of a hybrid approach combining SOM and hierarchical clustering. HRC adopts the strengths of both methods: SOM's efficiency in processing large datasets and hierarchical clustering's cluster quality.
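The general two-phase idea, reducing the data to a few representatives with a SOM and then running hierarchical clustering on the representatives only, can be sketched as follows. This is a simplified illustration under stated assumptions, not the HRC algorithm itself: the 1-D chain of units, the decay schedules, single-linkage merging, and the toy data are all choices made here.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(units, p):
    """Index of the unit closest to p (the best-matching unit)."""
    return min(range(len(units)), key=lambda i: euclidean(units[i], p))

def train_som(data, n_units, epochs=20):
    """Phase 1: fit a 1-D chain of SOM units that act as representatives."""
    units = [list(p) for p in data[:n_units]]  # deterministic init (assumption)
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = 0.5 * decay                          # learning rate shrinks each epoch
        sigma = max(0.5, (n_units / 2) * decay)   # neighbourhood width shrinks
        for p in data:
            bmu = nearest(units, p)
            for i, u in enumerate(units):
                # Units close to the winner on the chain move more.
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                for dim in range(len(u)):
                    u[dim] += lr * h * (p[dim] - u[dim])
    return units

def agglomerate(units, k):
    """Phase 2: single-linkage agglomerative clustering of the representatives."""
    clusters = [[i] for i in range(len(units))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                gap = min(euclidean(units[i], units[j])
                          for i in clusters[a] for j in clusters[b])
                if best is None or gap < best[0]:
                    best = (gap, a, b)
        _, a, b = best
        merged = clusters.pop(b)   # b > a, so index a stays valid
        clusters[a].extend(merged)
    return {i: c for c, members in enumerate(clusters) for i in members}

# Two well-separated blobs on small grids (hypothetical data).
blob_a = [(0.2 * x, 0.2 * y) for x in range(4) for y in range(4)]
blob_b = [(10 + 0.2 * x, 10 + 0.2 * y) for x in range(4) for y in range(4)]
data = [p for pair in zip(blob_a, blob_b) for p in pair]  # interleave the blobs

representatives = train_som(data, n_units=4)
unit_label = agglomerate(representatives, k=2)

# Each point inherits the cluster of its nearest representative.
labels_a = [unit_label[nearest(representatives, p)] for p in blob_a]
labels_b = [unit_label[nearest(representatives, p)] for p in blob_b]
print("blob A labels:", set(labels_a), "blob B labels:", set(labels_b))
```

The pairwise comparisons of the hierarchical phase run over the handful of representatives rather than over every data point, which is where the reduction in computational cost comes from.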