Covariance Estimation for Matrix-valued Data Machine Learning

Covariance estimation for matrix-valued data has attracted increasing interest in applications including neuroscience and environmental studies. Unlike previous works that rely heavily on the matrix normal distribution assumption and require a fixed matrix size, we propose a class of distribution-free regularized covariance estimation methods for high-dimensional matrix data under a separability condition and a bandable covariance structure. Under these conditions, the original covariance matrix is decomposed into a Kronecker product of two small bandable covariance matrices representing the variability over the row and column directions. We formulate a unified framework for estimating the banded and tapering covariance, and introduce an efficient algorithm based on rank-one unconstrained Kronecker product approximation. The convergence rates of the proposed estimators are studied and compared to those for the usual vector-valued data. We further introduce a class of robust covariance estimators and provide theoretical guarantees for handling potentially heavy-tailed data. We demonstrate the superior finite-sample performance of our methods using simulations and real applications from an electroencephalography study and a gridded temperature anomalies dataset.
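The rank-one Kronecker product approximation mentioned in this abstract can be illustrated with the classical rearrangement-plus-SVD construction (often attributed to Van Loan and Pitsianis). The NumPy sketch below is an illustration under assumptions, not the authors' algorithm: the function names `nearest_kronecker` and `band`, the choice to band the recovered factors, and the bandwidth are all hypothetical.

```python
import numpy as np


def nearest_kronecker(C, p, q):
    """Best Frobenius-norm fit C ~ kron(A, B) with A (p x p) and B (q x q)."""
    # Rearrange C so that kron(A, B) becomes the rank-one matrix vec(A) vec(B)^T.
    R = C.reshape(p, q, p, q).transpose(0, 2, 1, 3).reshape(p * p, q * q)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    # The leading singular triple gives the best rank-one approximation of R.
    A = np.sqrt(s[0]) * U[:, 0].reshape(p, p)
    B = np.sqrt(s[0]) * Vt[0].reshape(q, q)
    # The SVD determines signs only up to a joint flip; pick trace(A) >= 0.
    if np.trace(A) < 0:
        A, B = -A, -B
    return A, B


def band(M, k):
    """Zero every entry more than k positions away from the diagonal."""
    idx = np.arange(M.shape[0])
    return np.where(np.abs(idx[:, None] - idx[None, :]) <= k, M, 0.0)


# Synthetic check: build an exactly separable covariance and recover it.
rng = np.random.default_rng(0)
p, q = 3, 4
A0 = rng.standard_normal((p, p))
A0 = A0 @ A0.T + p * np.eye(p)  # symmetric positive definite row factor
B0 = rng.standard_normal((q, q))
B0 = B0 @ B0.T + q * np.eye(q)  # symmetric positive definite column factor
C = np.kron(A0, B0)

A, B = nearest_kronecker(C, p, q)
print(np.allclose(np.kron(A, B), C))

# Hypothetical regularization step: band each factor with bandwidth 1.
A_banded, B_banded = band(A, 1), band(B, 1)
```

The two factors are identifiable only up to a scalar (kron(cA, B/c) equals kron(A, B)), so the check compares the Kronecker products rather than the factors themselves.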

Deep Normalization for Speaker Vectors Machine Learning

Deep speaker embedding has demonstrated state-of-the-art performance in audio speaker recognition (SRE). However, one potential issue with this approach is that the speaker vectors derived from deep embedding models tend to be non-Gaussian for each individual speaker, and non-homogeneous for distributions of different speakers. These irregular distributions can seriously impact SRE performance, especially with the popular PLDA scoring method, which assumes homogeneous Gaussian distribution. In this paper, we argue that deep speaker vectors require deep normalization, and propose a deep normalization approach based on a novel discriminative normalization flow (DNF) model. We demonstrate the effectiveness of the proposed approach with experiments using the widely used SITW and CNCeleb corpora. In these experiments, the DNF-based normalization delivered substantial performance gains and also showed strong generalization capability in out-of-domain tests.

On Two Distinct Sources of Nonidentifiability in Latent Position Random Graph Models Machine Learning

The statistical analysis of network data is important for fields such as neuroscience (Vogelstein et al., 2012), sociology (Hoff et al., 2002), and physics (Newman and Girvan, 2004; Bickel and Chen, 2009). Recently, network data have become ubiquitous in the modern data-science landscape, and a large literature on statistical methods for analyzing these data has developed. Popular statistical models for conditionally independent random graphs include, but are not limited to, the stochastic block model (Holland et al., 1983), the random dot product graph (Young and Scheinerman, 2007; Athreya et al., 2017), and graphons (Lovász, 2012; Diaconis and Janson, 2007). Both the stochastic block model and the random dot product graph are examples of latent position random graphs (Hoff et al., 2002), a graph model that is motivated by the idea that individual nodes have latent positions whose values determine their propensity to form connections. The purpose of this manuscript is to explain a curious phenomenon that arises in latent position random graph settings.

Scalable Variational Gaussian Process Regression Networks Machine Learning

Gaussian process regression networks (GPRN) are powerful Bayesian models for multi-output regression, but their inference is intractable. To address this issue, existing methods use a fully factorized structure (or a mixture of such structures) over all the outputs and latent functions for posterior approximation, which, however, can miss the strong posterior dependencies among the latent variables and hurt the inference quality. In addition, the updates of the variational parameters are inefficient and can be prohibitively expensive for a large number of outputs. To overcome these limitations, we propose a scalable variational inference algorithm for GPRN, which not only captures the abundant posterior dependencies but also is much more efficient for massive outputs. We tensorize the output space and introduce tensor/matrix-normal variational posteriors to capture the posterior correlations and to reduce the parameters. We jointly optimize all the parameters and exploit the inherent Kronecker product structure in the variational model evidence lower bound to accelerate the computation. We demonstrate the advantages of our method in several real-world applications.

Variational Inference with Parameter Learning Applied to Vehicle Trajectory Estimation Machine Learning

We present parameter learning in a Gaussian variational inference setting using only noisy measurements (i.e., no groundtruth). This is demonstrated in the context of vehicle trajectory estimation, although the method we propose is general. The paper extends the Exactly Sparse Gaussian Variational Inference (ESGVI) framework, which has previously been used for large-scale nonlinear batch state estimation. Our contribution is to additionally learn parameters of our system models (which may be difficult to choose in practice) within the ESGVI framework. In this paper, we learn the covariances for the motion and sensor models used within vehicle trajectory estimation. Specifically, we learn the parameters of a white-noise-on-acceleration motion model and the parameters of an Inverse-Wishart prior over measurement covariances for our sensor model. We demonstrate our technique using a 36 km dataset consisting of a car using lidar to localize against a high-definition map; we learn the parameters on a training section of the data and then show that we achieve high-quality state estimates on a test section, even in the presence of outliers.

Ellipsoidal Subspace Support Vector Data Description Artificial Intelligence

In this paper, we propose a novel method for transforming data into a low-dimensional space optimized for one-class classification. The proposed method iteratively transforms data into a new subspace optimized for ellipsoidal encapsulation of target class data. We provide both linear and non-linear formulations for the proposed method. The method takes into account the covariance of the data in the subspace; hence, it yields a more generalized solution than Subspace Support Vector Data Description, which uses a hypersphere. We propose different regularization terms expressing the class variance in the projected space. We compare the results with classic and recently proposed one-class classification methods and achieve better results in the majority of cases. The proposed method is also observed to converge much faster than the recently proposed Subspace Support Vector Data Description.

Kalman Filter, Sensor Fusion, and Constrained Regression: Equivalences and Insights

Neural Information Processing Systems

The Kalman filter (KF) is one of the most widely used tools for data assimilation and sequential estimation. In this work, we show that the state estimates from the KF in a standard linear dynamical system setting are equivalent to those given by the KF in a transformed system, with infinite process noise (i.e., a "flat prior") and an augmented measurement space. This reformulation, which we refer to as augmented measurement sensor fusion (SF), is conceptually interesting because the transformed system is seemingly static (there is effectively no process model), yet it still captures the state dynamics inherent to the KF by folding the process model into the measurement space. Further, this reformulation of the KF turns out to be useful in settings in which past states are observed eventually (at some lag). Here, when the measurement noise covariance is estimated by the empirical covariance, we show that the state predictions from SF are equivalent to those from a regression of past states on past measurements, subject to particular linear constraints (reflecting the relationships encoded in the measurement map).
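A single-step version of this idea can be checked numerically. The NumPy sketch below compares the standard KF measurement update with a weighted least-squares fusion in which the predicted state is treated as an extra measurement and no prior is used. It is a one-step illustration under assumptions, not the paper's full construction; the function names `kf_update` and `fused_update` and all numerical values are hypothetical.

```python
import numpy as np


def kf_update(x_pred, P_pred, z, H, R):
    """Standard Kalman filter measurement update."""
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x_pred + K @ (z - H @ x_pred)
    P = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x, P


def fused_update(x_pred, P_pred, z, H, R):
    """Flat-prior fusion: the prediction enters as an additional measurement."""
    n, m = len(x_pred), len(z)
    H_aug = np.vstack([np.eye(n), H])        # augmented measurement map
    z_aug = np.concatenate([x_pred, z])      # augmented measurement vector
    R_aug = np.block([[P_pred, np.zeros((n, m))],
                      [np.zeros((m, n)), R]])
    W = np.linalg.inv(R_aug)
    # Generalized least squares with no prior on the state.
    P = np.linalg.inv(H_aug.T @ W @ H_aug)
    x = P @ H_aug.T @ W @ z_aug
    return x, P


# Arbitrary test problem: 3 state dimensions, 2 measurement dimensions.
rng = np.random.default_rng(1)
n, m = 3, 2
G = rng.standard_normal((n, n))
P_pred = G @ G.T + np.eye(n)                 # positive definite prediction covariance
F = rng.standard_normal((m, m))
R = F @ F.T + np.eye(m)                      # positive definite measurement noise
H = rng.standard_normal((m, n))
x_pred = rng.standard_normal(n)
z = rng.standard_normal(m)

x_kf, P_kf = kf_update(x_pred, P_pred, z, H, R)
x_sf, P_sf = fused_update(x_pred, P_pred, z, H, R)
print(np.allclose(x_kf, x_sf), np.allclose(P_kf, P_sf))
```

The agreement follows from the matrix inversion lemma, which converts the gain form of the update into the information form used by the fusion step.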

Estimating Basis Functions in Massive Fields under the Spatial Mixed Effects Model Machine Learning

Spatial prediction is commonly achieved under the assumption of a Gaussian random field (GRF) by obtaining maximum likelihood estimates of parameters, and then using the kriging equations to arrive at predicted values. For massive datasets, fixed rank kriging using the Expectation-Maximization (EM) algorithm for estimation has been proposed as an alternative to the usual but computationally prohibitive kriging method. The method reduces the computational cost of estimation by redefining the spatial process as a linear combination of basis functions and spatial random effects. A disadvantage of this method is that it imposes constraints on the relationship between the observed locations and the knots. We develop an alternative method that utilizes the Spatial Mixed Effects (SME) model, but allows for additional flexibility by estimating the range of the spatial dependence between the observations and the knots via an Alternating Expectation Conditional Maximization (AECM) algorithm. Experiments show that our methodology improves estimation without sacrificing prediction accuracy while also minimizing the additional computational burden of extra parameter estimation. The methodology is applied to a temperature data set archived by the United States National Climate Data Center, with improved results over previous methodology.

Bayesian optimization of variable-size design space problems Machine Learning

Within the framework of complex system design, it is often necessary to solve mixed-variable optimization problems, in which the objective and constraint functions can depend simultaneously on continuous and discrete variables. Additionally, complex system design problems occasionally present a variable-size design space. This results in an optimization problem for which the search space varies dynamically (with respect to both the number and type of variables) during the optimization process as a function of the values of specific discrete decision variables. Similarly, the number and type of constraints can vary as well. In this paper, two alternative Bayesian Optimization-based approaches are proposed to solve this type of optimization problem. The first consists of a budget allocation strategy that focuses the computational budget on the most promising design sub-spaces. The second approach is instead based on the definition of a kernel function that makes it possible to compute the covariance between samples characterized by partially different sets of variables. The results obtained on analytical and engineering-related test cases show faster and more consistent convergence of both proposed methods compared with the standard approaches.

A Framework for Interdomain and Multioutput Gaussian Processes Machine Learning

One obstacle to the use of Gaussian processes (GPs) in large-scale problems, and as a component in deep learning systems, is the need for bespoke derivations and implementations for small variations in the model or inference. In order to improve the utility of GPs, we need a modular system that allows rapid implementation and testing, as seen in the neural network community. We present a mathematical and software framework for scalable approximate inference in GPs, which combines interdomain approximations and multiple outputs. Our framework, implemented in GPflow, provides a unified interface for many existing multioutput models, as well as more recent convolutional structures. This simplifies the creation of deep models with GPs, and we hope that this work will encourage more interest in this approach.