Statistical Learning
Efficient State-Space Inference of Periodic Latent Force Models
Reece, Steven, Roberts, Stephen, Ghosh, Siddhartha, Rogers, Alex, Jennings, Nicholas
Latent force models (LFM) are principled approaches to incorporating solutions to differential equations within non-parametric inference methods. Unfortunately, the development and application of LFMs can be inhibited by their computational cost, especially when closed-form solutions for the LFM are unavailable, as is the case in many real world problems where these latent forces exhibit periodic behaviour. Given this, we develop a new sparse representation of LFMs which considerably improves their computational efficiency, as well as broadening their applicability, in a principled way, to domains with periodic or near periodic latent forces. Our approach uses a linear basis model to approximate one generative model for each periodic force. We assume that the latent forces are generated from Gaussian process priors and develop a linear basis model which fully expresses these priors. We apply our approach to model the thermal dynamics of domestic buildings and show that it is effective at predicting day-ahead temperatures within the homes. We also apply our approach within queueing theory in which quasi-periodic arrival rates are modelled as latent forces. In both cases, we demonstrate that our approach can be implemented efficiently using state-space methods which encode the linear dynamic systems via LFMs. Further, we show that state estimates obtained using periodic latent force models can reduce the root mean squared error to 17% of that from non-periodic models and 27% of the nearest rival approach which is the resonator model.
Functional Gaussian processes for regression with linear PDE models
Nguyen, Ngoc-Cuong, Peraire, Jaime
In this paper, we present a new statistical approach to the problem of incorporating experimental observations into a mathematical model described by linear partial differential equations (PDEs) to improve the prediction of the state of a physical system. We augment the linear PDE with a functional that accounts for the uncertainty in the mathematical model and is modeled as a {\em Gaussian process}. This gives rise to a stochastic PDE which is characterized by the Gaussian functional. We develop a {\em functional Gaussian process regression} method to determine the posterior mean and covariance of the Gaussian functional, thereby solving the stochastic PDE to obtain the posterior distribution for our prediction of the physical state. Our method has the following features which distinguish itself from other regression methods. First, it incorporates both the mathematical model and the observations into the regression procedure. Second, it can handle the observations given in the form of linear functionals of the field variable. Third, the method is non-parametric in the sense that it provides a systematic way to optimally determine the prior covariance operator of the Gaussian functional based on the observations. Fourth, it provides the posterior distribution quantifying the magnitude of uncertainty in our prediction of the physical state. We present numerical results to illustrate these features of the method and compare its performance to that of the standard Gaussian process regression.
Linearized Alternating Direction Method with Parallel Splitting and Adaptive Penalty for Separable Convex Programs in Machine Learning
Lin, Zhouchen, Liu, Risheng, Li, Huan
Many problems in machine learning and other fields can be (re)for-mulated as linearly constrained separable convex programs. In most of the cases, there are multiple blocks of variables. However, the traditional alternating direction method (ADM) and its linearized version (LADM, obtained by linearizing the quadratic penalty term) are for the two-block case and cannot be naively generalized to solve the multi-block case. So there is great demand on extending the ADM based methods for the multi-block case. In this paper, we propose LADM with parallel splitting and adaptive penalty (LADMPSAP) to solve multi-block separable convex programs efficiently. When all the component objective functions have bounded subgradients, we obtain convergence results that are stronger than those of ADM and LADM, e.g., allowing the penalty parameter to be unbounded and proving the sufficient and necessary conditions} for global convergence. We further propose a simple optimality measure and reveal the convergence rate of LADMPSAP in an ergodic sense. For programs with extra convex set constraints, with refined parameter estimation we devise a practical version of LADMPSAP for faster convergence. Finally, we generalize LADMPSAP to handle programs with more difficult objective functions by linearizing part of the objective function as well. LADMPSAP is particularly suitable for sparse representation and low-rank recovery problems because its subproblems have closed form solutions and the sparsity and low-rankness of the iterates can be preserved during the iteration. It is also highly parallelizable and hence fits for parallel or distributed computing. Numerical experiments testify to the advantages of LADMPSAP in speed and numerical accuracy.
Futility Analysis in the Cross-Validation of Machine Learning Models
Many machine learning models have important structural tuning parameters that cannot be directly estimated from the data. The common tactic for setting these parameters is to use resampling methods, such as cross--validation or the bootstrap, to evaluate a candidate set of values and choose the best based on some pre--defined criterion. Unfortunately, this process can be time consuming. However, the model tuning process can be streamlined by adaptively resampling candidate values so that settings that are clearly sub-optimal can be discarded. The notion of futility analysis is introduced in this context. An example is shown that illustrates how adaptive resampling can be used to reduce training time. Simulation studies are used to understand how the potential speed--up is affected by parallel processing techniques.
Large Scale, Large Margin Classification using Indefinite Similarity Measures
Aghazadeh, Omid, Carlsson, Stefan
Despite the success of the popular kernelized support vector machines, they have two major limitations: they are restricted to Positive Semi-Definite (PSD) kernels, and their training complexity scales at least quadratically with the size of the data. Many natural measures of similarity between pairs of samples are not PSD e.g. invariant kernels, and those that are implicitly or explicitly defined by latent variable models. In this paper, we investigate scalable approaches for using indefinite similarity measures in large margin frameworks. In particular we show that a normalization of similarity to a subset of the data points constitutes a representation suitable for linear classifiers. The result is a classifier which is competitive to kernelized SVM in terms of accuracy, despite having better training and test time complexities. Experimental results demonstrate that on CIFAR-10 dataset, the model equipped with similarity measures invariant to rigid and non-rigid deformations, can be made more than 5 times sparser while being more accurate than kernelized SVM using RBF kernels.
Supervised Dictionary Learning by a Variational Bayesian Group Sparse Nonnegative Matrix Factorization
INCE the appearance of the seminal paper [1], NMF has become a popular data decomposition technique due to succesful applications in a still growing number of fields where data are nonnegative, such as pixel intensities in computer vision, amplitude spectra in audio signal analysis and EEG signal analysis, term counts in document clustering problems, and item ratings in collaborative filtering. NMF aims at decompositions, where, and are all nonnegative matrices. Throughout this paper will be regarded as a collection of data samples organized columnwise, as a dictionary of features organized columnwise, and as matrix of coefficients when is projected onto the dictionary. Under assumptions of linearity and nonnegativity, when underlying dimensionality is lower than dimensionality of the original space of the data, dimensionality reduction of the data can effectively be achieved this way. Although the decomposition is nonunique in general, NMF is able to produce strictly additive decompositions perceived as part-based by adding additional bias in the model [1], [2]. To this end, different sparsity promoting regularizers have been proposed for divergence-based NMF [3]. Also, to include higher order data descriptions, many other variants have been developed, e.g.
Layered Logic Classifiers: Exploring the `And' and `Or' Relations
Tu, Zhuowen, Dollar, Piotr, Wu, Yingnian
Designing effective and efficient classifier for pattern analysis is a key problem in machine learning and computer vision. Many the solutions to the problem require to perform logic operations such as `and', `or', and `not'. Classification and regression tree (CART) include these operations explicitly. Other methods such as neural networks, SVM, and boosting learn/compute a weighted sum on features (weak classifiers), which weakly perform the 'and' and 'or' operations. However, it is hard for these classifiers to deal with the 'xor' pattern directly. In this paper, we propose layered logic classifiers for patterns of complicated distributions by combining the `and', `or', and `not' operations. The proposed algorithm is very general and easy to implement. We test the classifiers on several typical datasets from the Irvine repository and two challenging vision applications, object segmentation and pedestrian detection. We observe significant improvements on all the datasets over the widely used decision stump based AdaBoost algorithm. The resulting classifiers have much less training complexity than decision tree based AdaBoost, and can be applied in a wide range of domains.
Fast and Robust Archetypal Analysis for Representation Learning
Chen, Yuansi, Mairal, Julien, Harchaoui, Zaid
We revisit a pioneer unsupervised learning technique called archetypal analysis, which is related to successful data analysis methods such as sparse coding and non-negative matrix factorization. Since it was proposed, archetypal analysis did not gain a lot of popularity even though it produces more interpretable models than other alternatives. Because no efficient implementation has ever been made publicly available, its application to important scientific problems may have been severely limited. Our goal is to bring back into favour archetypal analysis. We propose a fast optimization scheme using an active-set strategy, and provide an efficient open-source implementation interfaced with Matlab, R, and Python. Then, we demonstrate the usefulness of archetypal analysis for computer vision tasks, such as codebook learning, signal classification, and large image collection visualization.
Bayesian Inference for Gaussian Process Classifiers with Annealing and Pseudo-Marginal MCMC
Kernel methods have revolutionized the fields of pattern recognition and machine learning. Their success, however, critically depends on the choice of kernel parameters. Using Gaussian process (GP) classification as a working example, this paper focuses on Bayesian inference of covariance (kernel) parameters using Markov chain Monte Carlo (MCMC) methods. The motivation is that, compared to standard optimization of kernel parameters, they have been systematically demonstrated to be superior in quantifying uncertainty in predictions. Recently, the Pseudo-Marginal MCMC approach has been proposed as a practical inference tool for GP models. In particular, it amounts in replacing the analytically intractable marginal likelihood by an unbiased estimate obtainable by approximate methods and importance sampling. After discussing the potential drawbacks in employing importance sampling, this paper proposes the application of annealed importance sampling. The results empirically demonstrate that compared to importance sampling, annealed importance sampling can reduce the variance of the estimate of the marginal likelihood exponentially in the number of data at a computational cost that scales only polynomially. The results on real data demonstrate that employing annealed importance sampling in the Pseudo-Marginal MCMC approach represents a step forward in the development of fully automated exact inference engines for GP models.
Understanding Machine-learned Density Functionals
Li, Li, Snyder, John C., Pelaschier, Isabelle M., Huang, Jessica, Niranjan, Uma-Naresh, Duncan, Paul, Rupp, Matthias, Müller, Klaus-Robert, Burke, Kieron
Kernel ridge regression is used to approximate the kinetic energy of non-interacting fermions in a one-dimensional box as a functional of their density. The properties of different kernels and methods of cross-validation are explored, and highly accurate energies are achieved. Accurate {\em constrained optimal densities} are found via a modified Euler-Lagrange constrained minimization of the total energy. A projected gradient descent algorithm is derived using local principal component analysis. Additionally, a sparse grid representation of the density can be used without degrading the performance of the methods. The implications for machine-learned density functional approximations are discussed.