In order to improve the efficiency and sustainability of electricity systems, most countries worldwide are deploying advanced metering infrastructures, and in particular household smart meters, in the residential sector. This technology can record electricity load time series at very high frequency, information that can be exploited to develop new clustering models that group individual households by similar consumption patterns. To this end, we propose three hierarchical clustering methodologies that capture different characteristics of the time series. These are based on a set of "dissimilarity" measures computed over different features: quantile autocovariances, and simple and partial autocorrelations. The main advantage is that they summarize each time series in a few representative features, so that they are computationally efficient, robust against outliers, easy to automate, and scalable to hundreds of thousands of smart meter series. We evaluate the performance of each clustering model on a real-world smart meter dataset with thousands of half-hourly time series. The results show how the obtained clusters identify relevant consumption behaviors of households and capture part of their geo-demographic segmentation. Moreover, we apply a supervised classification procedure to explore which features are most relevant in defining each cluster.
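The feature-based idea can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only simple autocorrelations (no quantile autocovariances or partial autocorrelations), and the Euclidean distance, the average linkage, the synthetic series, and all function names are assumptions made for illustration.

```python
import math
import random

def acf_features(series, max_lag):
    """Summarize a series by its sample autocorrelations at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]

def euclidean(a, b):
    """Dissimilarity between two feature vectors (an assumed choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(features, n_clusters):
    """Naive average-linkage agglomerative clustering; returns one label per series."""
    clusters = [[i] for i in range(len(features))]

    def linkage(c1, c2):
        total = sum(euclidean(features[i], features[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average dissimilarity.
        _, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(features)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Synthetic stand-ins for half-hourly load curves (48 readings per day).
rng = random.Random(0)

def periodic(n):
    return [math.sin(2 * math.pi * t / 48) + rng.gauss(0, 0.3) for t in range(n)]

def noisy(n):
    return [rng.gauss(0, 1) for _ in range(n)]

series = [periodic(480) for _ in range(4)] + [noisy(480) for _ in range(4)]
features = [acf_features(s, 48) for s in series]
labels = agglomerative(features, 2)
print(labels)
```

Because each series is reduced to 48 numbers before any pairwise comparison, the cost of the clustering step depends on the number of features rather than on the length of the raw series, which is the property the abstract emphasizes.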
Predictive analytics models rely heavily on regression, classification and clustering methods. When analysing the effectiveness of a predictive model, the closer the predictions are to the actual data, the better the model. This article aims to be a one-stop reference to the major problems and their most popular and effective solutions, without diving into the details of execution. Data selection and pruning happen primarily during the Data Preparation phase, where you take care to get rid of bad data in the first place. Beyond that, there are issues with the data and its relevance to the ML model's objectives during training, trouble with the usage of algorithms, and errors in the data that occur throughout.
In this paper we consider portmanteau tests for testing the adequacy of multiplicative seasonal autoregressive moving-average (SARMA) models under the assumption that the errors are uncorrelated but not necessarily independent. We relax the standard independence assumption on the error term in order to extend the range of application of the SARMA models. We study the asymptotic distributions of residual and normalized residual empirical autocovariances and autocorrelations under weak assumptions on the noise. We establish the asymptotic behaviour of the proposed statistics. A set of Monte Carlo experiments and an application to monthly mean total sunspot number are presented.
The Low Autocorrelation Binary Sequence problem has applications in telecommunications, is of theoretical interest to physicists, and has inspired many optimisation researchers. Metaheuristics for the problem have progressed greatly in recent years but complete search has not progressed since a branch-and-bound method of 1996. In this paper we find four ways of improving branch-and-bound, leading to a tighter relaxation, faster convergence to optimality, and better empirical scalability.
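For readers unfamiliar with the problem, the objective can be sketched as follows. In the usual formulation, a sequence of +1/-1 values of length N is scored by the sum of its squared aperiodic autocorrelations at lags 1 to N-1, and the goal is to minimise that sum. The code below is a plain exhaustive enumeration for very small N, not the branch-and-bound method of the paper; the function names are hypothetical.

```python
from itertools import product

def autocorrelation(seq, k):
    """Aperiodic autocorrelation of a +1/-1 sequence at lag k."""
    return sum(seq[i] * seq[i + k] for i in range(len(seq) - k))

def energy(seq):
    """Sum of squared autocorrelations over lags 1..N-1 (lower is better)."""
    return sum(autocorrelation(seq, k) ** 2 for k in range(1, len(seq)))

def brute_force_labs(n):
    """Enumerate all 2**n sequences; only feasible for small n."""
    best = min(product((1, -1), repeat=n), key=energy)
    return best, energy(best)

# The length-13 Barker sequence has energy 6.
barker13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]
print(energy(barker13))

# Optimal energy for length 5 is 2.
print(brute_force_labs(5))
```

The enumeration doubles in cost with every added element, which is why complete methods rely on bounding and relaxation to prune the search.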
I used this example in my talk at useR!2019 in Toulouse; it is also the basis of a vignette in the package and of a recent blog post by Mitchell O'Hara-Wild. The data set contains domestic tourist visitor nights in Australia, disaggregated by State, Region and Purpose. An example of a feature would be the autocorrelation function at lag 1: it is a numerical summary capturing some aspect of the time series. Autocorrelations at other lags are also features, as are the autocorrelations of the first-differenced series, or the seasonally differenced series, etc. Values close to 1 indicate a highly seasonal time series, while values close to 0 indicate a time series with little seasonality.
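As a rough illustration of autocorrelation-based features, here is a small sketch in plain Python. It is not the package's implementation: the function names, the seasonal period of 4, and the synthetic series are all assumptions made for the example.

```python
import math
import random

def acf1(x):
    """Lag-1 sample autocorrelation; returns NaN for a constant series."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return math.nan
    return sum(dev[t] * dev[t + 1] for t in range(n - 1)) / denom

def difference(x, lag=1):
    """Difference a series at the given lag (lag=1 gives first differences)."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]

def acf_feature_set(x, period):
    """A few numerical summaries of one series, keyed by hypothetical names."""
    return {
        "acf1": acf1(x),
        "diff1_acf1": acf1(difference(x, 1)),
        "season_diff_acf1": acf1(difference(x, period)),
    }

# Synthetic series with a trend, a period-4 seasonal pattern, and noise.
rng = random.Random(1)
seasonal = [0.0, 10.0, 0.0, -10.0]
series = [0.5 * t + seasonal[t % 4] + rng.gauss(0, 1) for t in range(80)]

for name, value in acf_feature_set(series, period=4).items():
    print(name, round(value, 3))
```

Each series is reduced to a handful of numbers, so many series can be compared on the same feature axes regardless of their scale or length.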