Collaborating Authors

Kronecker Sum Decompositions of Space-Time Data Machine Learning

In this paper we consider the use of the space vs. time Kronecker product decomposition in the estimation of covariance matrices for spatio-temporal data. This decomposition imposes lower dimensional structure on the estimated covariance matrix, thus reducing the number of samples required for estimation. To allow a smooth tradeoff between the reduction in the number of parameters (to reduce estimation variance) and the accuracy of the covariance approximation (affecting estimation bias), we introduce a diagonally loaded modification of the sum of kronecker products representation [1]. We derive a Cramer-Rao bound (CRB) on the minimum attainable mean squared predictor coefficient estimation error for unbiased estimators of Kronecker structured covariance matrices. We illustrate the accuracy of the diagonally loaded Kronecker sum decomposition by applying it to video data of human activity.

It is all in the noise: Efficient multi-task Gaussian process inference with structured residuals

Neural Information Processing Systems

Multi-task prediction models are widely being used to couple regressors or classification models by sharing information across related tasks. A common pitfall of these models is that they assume that the output tasks are independent conditioned on the inputs. Here, we propose a multi-task Gaussian process approach to model both the relatedness between regressors as well as the task correlations in the residuals, in order to more accurately identify true sharing between regressors. The resulting Gaussian model has a covariance term that is the sum of Kronecker products, for which efficient parameter inference and out of sample prediction are feasible. On both synthetic examples and applications to phenotype prediction in genetics, we find substantial benefits of modeling structured noise compared to established alternatives.

Finite sample approximations of exact and entropic Wasserstein distances between covariance operators and Gaussian processes Machine Learning

This work studies finite sample approximations of the exact and entropic regularized Wasserstein distances between centered Gaussian processes and, more generally, covariance operators of functional random processes. We first show that these distances/divergences are fully represented by reproducing kernel Hilbert space (RKHS) covariance and cross-covariance operators associated with the corresponding covariance functions. Using this representation, we show that the Sinkhorn divergence between two centered Gaussian processes can be consistently and efficiently estimated from the divergence between their corresponding normalized finite-dimensional covariance matrices, or alternatively, their sample covariance operators. Consequently, this leads to a consistent and efficient algorithm for estimating the Sinkhorn divergence from finite samples generated by the two processes. For a fixed regularization parameter, the convergence rates are {\it dimension-independent} and of the same order as those for the Hilbert-Schmidt distance. If at least one of the RKHS is finite-dimensional, we obtain a {\it dimension-dependent} sample complexity for the exact Wasserstein distance between the Gaussian processes.

Deep CNNs Meet Global Covariance Pooling: Better Representation and Generalization Artificial Intelligence

Compared with global average pooling in existing deep convolutional neural networks (CNNs), global covariance pooling can capture richer statistics of deep features, having potential for improving representation and generalization abilities of deep CNNs. However, integration of global covariance pooling into deep CNNs brings two challenges: (1) robust covariance estimation given deep features of high dimension and small sample; (2) appropriate use of geometry of covariances. To address these challenges, we propose a global Matrix Power Normalized COVariance (MPN-COV) Pooling. Our MPN-COV conforms to a robust covariance estimator, very suitable for scenario of high dimension and small sample. It can also be regarded as power-Euclidean metric between covariances, effectively exploiting their geometry. Furthermore, a global Gaussian embedding method is proposed to incorporate first-order statistics into MPN-COV. For fast training of MPN-COV networks, we propose an iterative matrix square root normalization, avoiding GPU unfriendly eigen-decomposition inherent in MPN-COV. Additionally, progressive 1x1 and group convolutions are introduced to compact covariance representations. The MPN-COV and its variants are highly modular, readily plugged into existing deep CNNs. Extensive experiments are conducted on large-scale object classification, scene categorization, fine-grained visual recognition and texture classification, showing our methods are superior to the counterparts and achieve state-of-the-art performance.

Evolution of Covariance Functions for Gaussian Process Regression using Genetic Programming Machine Learning

In this contribution we describe an approach to evolve composite covariance functions for Gaussian processes using genetic programming. A critical aspect of Gaussian processes and similar kernel-based models such as SVM is, that the covariance function should be adapted to the modeled data. Frequently, the squared exponential covariance function is used as a default. However, this can lead to a misspecified model, which does not fit the data well. In the proposed approach we use a grammar for the composition of covariance functions and genetic programming to search over the space of sentences that can be derived from the grammar. We tested the proposed approach on synthetic data from two-dimensional test functions, and on the Mauna Loa CO2 time series. The results show, that our approach is feasible, finding covariance functions that perform much better than a default covariance function. For the CO2 data set a composite covariance function is found, that matches the performance of a hand-tuned covariance function.