Goto

Collaborating Authors

 Statistical Learning


Robust LogitBoost and Adaptive Base Class (ABC) LogitBoost

arXiv.org Machine Learning

Logitboost is an influential boosting algorithm for classification. In this paper, we develop robust logitboost to provide an explicit formulation of tree-split criterion for building weak learners (regression trees) for logitboost. This formulation leads to a numerically stable implementation of logitboost. We then propose abc-logitboost for multi-class classification, by combining robust logitboost with the prior work of abc-boost. Previously, abc-boost was implemented as abc-mart using the mart algorithm. Our extensive experiments on multi-class classification compare four algorithms: mart, abcmart, (robust) logitboost, and abc-logitboost, and demonstrate the superiority of abc-logitboost. Comparisons with other learning methods including SVM and deep learning are also available through prior publications.


Bayesian exponential family projections for coupled data sources

arXiv.org Machine Learning

Exponential family extensions of principal component analysis (EPCA) have received a considerable amount of attention in recent years, demonstrating the growing need for basic modeling tools that do not assume the squared loss or Gaussian distribution. We extend the EPCA model toolbox by presenting the first exponential family multi-view learning methods of the partial least squares and canonical correlation analysis, based on a unified representation of EPCA as matrix factorization of the natural parameters of exponential family. The models are based on a new family of priors that are generally usable for all such factorizations. We also introduce new inference strategies, and demonstrate how the methods outperform earlier ones when the Gaussianity assumption does not hold.


Combining Spatial and Telemetric Features for Learning Animal Movement Models

arXiv.org Machine Learning

We introduce a new graphical model for tracking radio-tagged animals and learning their movement patterns. The model provides a principled way to combine radio telemetry data with an arbitrary set of userdefined, spatial features. We describe an efficient stochastic gradient algorithm for fitting model parameters to data and demonstrate its effectiveness via asymptotic analysis and synthetic experiments. We also apply our model to real datasets, and show that it outperforms the most popular radio telemetry software package used in ecology. We conclude that integration of different data sources under a single statistical framework, coupled with appropriate parameter and state estimation procedures, produces both accurate location estimates and an interpretable statistical model of animal movement.


Bayesian Rose Trees

arXiv.org Machine Learning

Hierarchical structure is ubiquitous in data across many domains. There are many hierarchical clustering methods, frequently used by domain experts, which strive to discover this structure. However, most of these methods limit discoverable hierarchies to those with binary branching structure. This limitation, while computationally convenient, is often undesirable. In this paper we explore a Bayesian hierarchical clustering algorithm that can produce trees with arbitrary branching structure at each node, known as rose trees. We interpret these trees as mixtures over partitions of a data set, and use a computationally efficient, greedy agglomerative algorithm to find the rose trees which have high marginal likelihood given the data. Lastly, we perform experiments which demonstrate that rose trees are better models of data than the typical binary trees returned by other hierarchical clustering algorithms.


Gaussian Process Topic Models

arXiv.org Machine Learning

We introduce Gaussian Process Topic Models (GPTMs), a new family of topic models which can leverage a kernel among documents while extracting correlated topics. GPTMs can be considered a systematic generalization of the Correlated Topic Models (CTMs) using ideas from Gaussian Process (GP) based embedding. Since GPTMs work with both a topic covariance matrix and a document kernel matrix, learning GPTMs involves a novel component-solving a suitable Sylvester equation capturing both topic and document dependencies. The efficacy of GPTMs is demonstrated with experiments evaluating the quality of both topic modeling and embedding.


Robust Metric Learning by Smooth Optimization

arXiv.org Machine Learning

Most existing distance metric learning methods assume perfect side information that is usually given in pairwise or triplet constraints. Instead, in many real-world applications, the constraints are derived from side information, such as users' implicit feedbacks and citations among articles. As a result, these constraints are usually noisy and contain many mistakes. In this work, we aim to learn a distance metric from noisy constraints by robust optimization in a worst-case scenario, to which we refer as robust metric learning. We formulate the learning task initially as a combinatorial optimization problem, and show that it can be elegantly transformed to a convex programming problem. We present an efficient learning algorithm based on smooth optimization [7]. It has a worst-case convergence rate of O(1/{\surd}{\varepsilon}) for smooth optimization problems, where {\varepsilon} is the desired error of the approximate solution. Finally, our empirical study with UCI data sets demonstrate the effectiveness of the proposed method in comparison to state-of-the-art methods.


Generalized Beta Mixtures of Gaussians

arXiv.org Machine Learning

In recent years, a rich variety of shrinkage priors have been proposed that have great promise in addressing massive regression problems. In general, these new priors can be expressed as scale mixtures of normals, but have more complex forms and better properties than traditional Cauchy and double exponential priors. We first propose a new class of normal scale mixtures through a novel generalized beta distribution that encompasses many interesting priors as special cases. This encompassing framework should prove useful in comparing competing priors, considering properties and revealing close connections. We then develop a class of variational Bayes approximations through the new hierarchy presented that will scale more efficiently to the types of truly massive data sets that are now encountered routinely.


A Generalized Least Squares Matrix Decomposition

arXiv.org Machine Learning

Variables in many massive high-dimensional data sets are structured, arising for example from measurements on a regular grid as in imaging and time series or from spatial-temporal measurements as in climate studies. Classical multivariate techniques ignore these structural relationships often resulting in poor performance. We propose a generalization of the singular value decomposition (SVD) and principal components analysis (PCA) that is appropriate for massive data sets with structured variables or known two-way dependencies. By finding the best low rank approximation of the data with respect to a transposable quadratic norm, our decomposition, entitled the Generalized least squares Matrix Decomposition (GMD), directly accounts for structural relationships. As many variables in high-dimensional settings are often irrelevant or noisy, we also regularize our matrix decomposition by adding two-way penalties to encourage sparsity or smoothness. We develop fast computational algorithms using our methods to perform generalized PCA (GPCA), sparse GPCA, and functional GPCA on massive data sets. Through simulations and a whole brain functional MRI example we demonstrate the utility of our methodology for dimension reduction, signal recovery, and feature selection with high-dimensional structured data.


Differential Privacy for Functions and Functional Data

arXiv.org Machine Learning

Differential privacy is a framework for privately releasing summaries of a database. Previous work has focused mainly on methods for which the output is a finite dimensional vector, or an element of some discrete set. We develop methods for releasing functions while preserving differential privacy. Specifically, we show that adding an appropriate Gaussian process to the function of interest yields differential privacy. When the functions lie in the same RKHS as the Gaussian process, then the correct noise level is established by measuring the "sensitivity" of the function in the RKHS norm. As examples we consider kernel density estimation, kernel support vector machines, and functions in reproducing kernel Hilbert spaces.


Decentralized, Adaptive, Look-Ahead Particle Filtering

arXiv.org Machine Learning

The decentralized particle filter (DPF) was proposed recently to increase the level of parallelism of particle filtering. Given a decomposition of the state space into two nested sets of variables, the DPF uses a particle filter to sample the first set and then conditions on this sample to generate a set of samples for the second set of variables. The DPF can be understood as a variant of the popular Rao-Blackwellized particle filter (RBPF), where the second step is carried out using Monte Carlo approximations instead of analytical inference. As a result, the range of applications of the DPF is broader than the one for the RBPF. In this paper, we improve the DPF in two ways. First, we derive a Monte Carlo approximation of the optimal proposal distribution and, consequently, design and implement a more efficient look-ahead DPF. Although the decentralized filters were initially designed to capitalize on parallel implementation, we show that the look-ahead DPF can outperform the standard particle filter even on a single machine. Second, we propose the use of bandit algorithms to automatically configure the state space decomposition of the DPF.