We investigate the use of clustering methods for mining time-series data from smart meters. The problem is to identify patterns and trends in the 24-hour energy usage profiles of commercial and industrial customers and to group similar profiles. We tested our method on energy usage data provided by several U.S. power utilities. The results show that accounts with similar energy usage patterns are grouped accurately, and suggest that the method could be used in energy efficiency programs.
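To make the idea of grouping 24-hour profiles concrete, here is a minimal sketch. It is not the method used in the study, which the abstract does not specify. It assumes synthetic data, a normalization of each profile to unit daily total (so that the shape of the curve matters rather than its magnitude), and scikit-learn's `KMeans` as a stand-in clustering method. All variable names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

hours = np.arange(24)

# Two hypothetical usage shapes: a midday peak and a midday dip.
day_shape = np.exp(-0.5 * ((hours - 13) / 3.0) ** 2)
night_shape = 1.0 - 0.8 * day_shape

rng = np.random.default_rng(0)
true_shape = np.repeat([0, 1], 20)  # 20 synthetic accounts per shape
shapes = np.where(true_shape[:, None] == 0, day_shape, night_shape)

# Accounts differ widely in size, so each profile gets its own scale factor.
scale = rng.uniform(10.0, 1000.0, size=(40, 1))
noisy = shapes + rng.normal(0.0, 0.02, size=shapes.shape)
profiles = np.clip(scale * noisy, 0.0, None)

# Normalize each profile to sum to 1 so clustering compares shapes only.
normalized = profiles / profiles.sum(axis=1, keepdims=True)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(normalized)
print(labels)
```

Without the normalization step, the largest accounts would dominate the distance computation and the clusters would mostly reflect account size.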
Clustering is the separation of data into groups of similar objects. Each group, called a cluster, consists of objects that are similar to one another and dissimilar to objects in other groups. In this paper, the K-Means algorithm is implemented with three distance functions in order to identify the optimal distance function for clustering. The proposed K-Means algorithm is compared with K-Means, Static Weighted K-Means (SWK-Means), and Dynamic Weighted K-Means (DWK-Means) using the Davies-Bouldin index, execution time, and iteration count. Experimental results show that the proposed K-Means algorithm performed better on the Iris and Wine datasets than the other three clustering methods.
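The comparison described above can be sketched roughly as follows. This is not the proposed algorithm: the abstract does not name the three distance functions, so Euclidean, Manhattan (`cityblock`), and Chebyshev are assumed here purely for illustration, and `kmeans` is a hypothetical helper that swaps the metric only in the assignment step while still updating centroids with the mean. Note also that scikit-learn's `davies_bouldin_score` always measures scatter with Euclidean distance.

```python
from time import perf_counter

import numpy as np
from scipy.spatial.distance import cdist
from sklearn.datasets import load_iris
from sklearn.metrics import davies_bouldin_score


def kmeans(X, k, metric="euclidean", max_iter=100, seed=0):
    """Lloyd-style k-means with a pluggable assignment metric.

    Returns (labels, centroids, iterations_used).
    """
    rng = np.random.default_rng(seed)
    # Fancy indexing copies the rows, so X itself is never modified.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: nearest centroid under the chosen metric.
        new_labels = cdist(X, centroids, metric=metric).argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable, so the run has converged
        labels = new_labels
        # Update step: mean of the members (empty clusters keep their centroid).
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels, centroids, iteration


X = load_iris().data
results = {}
for metric in ("euclidean", "cityblock", "chebyshev"):
    start = perf_counter()
    labels, _, n_iter = kmeans(X, k=3, metric=metric)
    elapsed = perf_counter() - start
    results[metric] = (davies_bouldin_score(X, labels), n_iter, elapsed)
    print(f"{metric:10s} DB index={results[metric][0]:.3f} "
          f"iterations={n_iter} time={elapsed * 1e3:.2f} ms")
```

A lower Davies-Bouldin index indicates more compact, better separated clusters. Timings from a single run like this are noisy and should only be read as a rough guide.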
Alternatively, a similarity function might also be used. Machine learning techniques are usually classified into supervised and unsupervised techniques. Supervised machine learning starts from prior knowledge of the desired result in the form of labeled data sets, which makes it possible to guide the training process, whereas unsupervised machine learning works directly on unlabeled data. In the absence of labels to orient the learning process, these labels must be "discovered" by the learning algorithm.
1) Scale Invariance: The first of Kleinberg's axioms states that f(d) = f(α · d) for any distance function d and any scaling factor α > 0. This simple axiom indicates that a clustering algorithm ...
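As a rough illustration of the scale invariance axiom, the sketch below applies two clustering functions to a set of pairwise distances d and to the rescaled distances α · d. It assumes SciPy's hierarchical clustering routines. The helper names `single_linkage_k`, `single_linkage_threshold`, and `same_partition` are made up for this example, as are the toy points.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def single_linkage_k(d, k=2):
    """Single linkage, cut so that at most k clusters remain."""
    return fcluster(linkage(d, method="single"), t=k, criterion="maxclust")


def single_linkage_threshold(d, r=2.0):
    """Single linkage, cut at a fixed distance threshold r."""
    return fcluster(linkage(d, method="single"), t=r, criterion="distance")


def same_partition(a, b):
    """True if two label vectors describe the same grouping of points."""
    a, b = np.asarray(a), np.asarray(b)
    return np.array_equal(a[:, None] == a[None, :], b[:, None] == b[None, :])


points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0], [20.0, 0.0]])
d = pdist(points)  # condensed vector of pairwise Euclidean distances
alpha = 10.0       # any scaling factor alpha > 0

k_invariant = same_partition(single_linkage_k(d), single_linkage_k(alpha * d))
r_invariant = same_partition(
    single_linkage_threshold(d), single_linkage_threshold(alpha * d)
)
print("fixed k is scale invariant on this input:", k_invariant)
print("fixed threshold is scale invariant on this input:", r_invariant)
```

Cutting the dendrogram at a fixed number of clusters gives the same partition for d and α · d, whereas cutting at a fixed distance threshold does not, because the threshold does not scale with the distances.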
It's a common task for a data scientist: you need to generate segments (or clusters; I'll use the terms interchangeably) of the customer base. Let's start with definitions, of course! Clustering is the subfield of unsupervised learning that aims to partition unlabelled datasets into consistent groups based on some shared unknown characteristics. All the tools you'll need are in Scikit-Learn, so I'll keep the code to a minimum. Instead, through the medium of GIFs, this tutorial will describe the most common techniques. If GIFs aren't your thing (what are you doing on the internet?), you can download this Jupyter notebook here, and the GIFs can be downloaded from this folder (or you can just right-click on the GIFs and select 'Save image as…'). Clustering algorithms can be broadly split into two types, depending on whether the number of segments is explicitly specified by the user.
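To show what that split looks like in practice, here's a tiny sketch with one algorithm of each type. I'm assuming `KMeans` (you tell it the number of clusters) and `DBSCAN` (it works the number out from density) as the representatives. The toy blobs and the `eps` and `min_samples` values are arbitrary choices for this example, not recommendations.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Three well-separated toy blobs standing in for customer segments.
X, _ = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.5,
    random_state=0,
)

# Type 1: the number of segments is specified up front.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Type 2: the number of segments is inferred; you tune density parameters instead.
dbscan_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)
n_dbscan = len(set(dbscan_labels) - {-1})  # -1 marks noise points, not a cluster

print("KMeans clusters:", len(set(kmeans_labels)))
print("DBSCAN clusters:", n_dbscan)
```

The trade-off is that the second type swaps one knob (the number of clusters) for others (here, the neighbourhood radius and the minimum neighbourhood size).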
In this article, we study the notion of similarity within the context of cluster analysis. We begin by studying different distances commonly used for this task and highlight certain important properties that they might have, such as the use of data distribution or reduced sensitivity to the curse of dimensionality. Then we study inter- and intra-cluster similarities. We identify how the choices made can influence the nature of the clusters.
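As a small illustration of intra- and inter-cluster measures, the sketch below computes one simple variant of each. The definitions are assumptions chosen for brevity (mean distance to the cluster centroid, and minimum distance between centroids, both Euclidean) and are not necessarily the ones studied in the article. The function names and toy data are illustrative.

```python
import numpy as np


def intra_cluster_distance(X, labels):
    """Average, over clusters, of the mean distance from points to their centroid."""
    per_cluster = []
    for c in np.unique(labels):
        members = X[labels == c]
        centroid = members.mean(axis=0)
        per_cluster.append(np.linalg.norm(members - centroid, axis=1).mean())
    return float(np.mean(per_cluster))


def inter_cluster_distance(X, labels):
    """Smallest Euclidean distance between any two cluster centroids."""
    centroids = np.array([X[labels == c].mean(axis=0) for c in np.unique(labels)])
    diffs = centroids[:, None, :] - centroids[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    return float(dists[np.triu_indices(len(centroids), k=1)].min())


X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
good = np.array([0, 0, 1, 1])  # groups the nearby points together
bad = np.array([0, 1, 0, 1])   # groups the distant points together

for name, labels in (("good", good), ("bad", bad)):
    print(name, intra_cluster_distance(X, labels), inter_cluster_distance(X, labels))
```

On this toy input, the first labelling has a small intra-cluster distance and a large inter-cluster distance, and the second labelling reverses that, which is the pattern such measures are meant to expose.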