Goto

Collaborating Authors

 Uncertainty


Semistochastic Quadratic Bound Methods

arXiv.org Machine Learning

Partition functions arise in a variety of settings, including conditional random fields, logistic regression, and latent gaussian models. In this paper, we consider semistochastic quadratic bound (SQB) methods for maximum likelihood estimation based on partition function optimization. Batch methods based on the quadratic bound were recently proposed for this class of problems, and performed favorably in comparison to state-of-the-art techniques. Semistochastic methods fall in between batch algorithms, which use all the data, and stochastic gradient type methods, which use small random selections at each iteration. We build semistochastic quadratic bound-based methods, and prove both global convergence (to a stationary point) under very weak assumptions, and linear convergence rate under stronger assumptions on the objective. To make the proposed methods faster and more stable, we consider inexact subproblem minimization and batch-size selection schemes. The efficacy of SQB methods is demonstrated via comparison with several state-of-the-art techniques on commonly used datasets.


Modeling Human Decision-making in Generalized Gaussian Multi-armed Bandits

arXiv.org Machine Learning

We present a formal model of human decision-making in explore-exploit tasks using the context of multi-armed bandit problems, where the decision-maker must choose among multiple options with uncertain rewards. We address the standard multi-armed bandit problem, the multi-armed bandit problem with transition costs, and the multi-armed bandit problem on graphs. We focus on the case of Gaussian rewards in a setting where the decision-maker uses Bayesian inference to estimate the reward values. We model the decision-maker's prior knowledge with the Bayesian prior on the mean reward. We develop the upper credible limit (UCL) algorithm for the standard multi-armed bandit problem and show that this deterministic algorithm achieves logarithmic cumulative expected regret, which is optimal performance for uninformative priors. We show how good priors and good assumptions on the correlation structure among arms can greatly enhance decision-making performance, even over short time horizons. We extend to the stochastic UCL algorithm and draw several connections to human decision-making behavior. We present empirical data from human experiments and show that human performance is efficiently captured by the stochastic UCL algorithm with appropriate parameters. For the multi-armed bandit problem with transition costs and the multi-armed bandit problem on graphs, we generalize the UCL algorithm to the block UCL algorithm and the graphical block UCL algorithm, respectively. We show that these algorithms also achieve logarithmic cumulative expected regret and require a sub-logarithmic expected number of transitions among arms. We further illustrate the performance of these algorithms with numerical examples.


Generative Modelling for Unsupervised Score Calibration

arXiv.org Machine Learning

ABSTRACT Score calibration enables automatic speaker recognizers to make cost-effective accept / reject decisions. Traditional calibration requires supervised data, which is an expensive resource. We propose a 2-component GMM for unsupervised calibration and demonstrate good performance relative to a supervised baseline on NIST SRE'10 and SRE'12. A Bayesian analysis demonstrates that the uncertainty associated with the unsupervised calibration parameter estimates is surprisingly small. Index Terms-- calibration, unsupervised learning, Laplace approximation, automatic speaker recognition 1. INTRODUCTION Automatic speaker recognizers map trials to scores.


The Law of Total Odds

arXiv.org Machine Learning

The law of total probability may be deployed in binary classification exercises to estimate the unconditional class probabilities if the class proportions in the training set are not representative of the population class proportions. We argue that this is not a conceptually sound approach and suggest an alternative based on the new law of total odds. We quantify the bias of the total probability estimator of the unconditional class probabilities and show that the total odds estimator is unbiased. The sample version of the total odds estimator is shown to coincide with a maximum-likelihood estimator known from the literature. The law of total odds can also be used for transforming the conditional class probabilities if independent estimates of the unconditional class probabilities of the population are available. Keywords: Total probability, likelihood ratio, Bayes' formula, binary classification, relative odds, unbiased estimator, supervised learning, dataset shift.


Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget

arXiv.org Machine Learning

Markov chain Monte Carlo (MCMC) sampling has been the main workhorse of Bayesian computation since the 1990s. A canonical MCMC algorithm proposes samples from a distribution q and then accepts or rejects these proposals with a certain probability given by the Metropolis-Hastings (MH) formula [Metropolis et al., 1953, Hastings, 1970]. For each proposed sample, the MH rule needs to examine the likelihood of all dataitems. When the number of data-cases is large this is an awful lot of computation for one bit of information, namely whether to accept or reject a proposal. In today's Big Data world, we need to rethink our Bayesian inference algorithms. Standard MCMC methods do not meet the Big Data challenge for the reason described above. Researchers have made some progress in terms of making MCMC more efficient, mostly by focusing on parallelization. Very few question the algorithm itself: is the standard MCMC paradigm really optimally efficient in achieving its goals?


Gaussian Process Volatility Model

arXiv.org Machine Learning

The accurate prediction of time-changing variances is an important task in the modeling of financial data. Standard econometric models are often limited as they assume rigid functional relationships for the variances. Moreover, function parameters are usually learned using maximum likelihood, which can lead to overfitting. To address these problems we introduce a novel model for time-changing variances using Gaussian Processes. A Gaussian Process (GP) defines a distribution over functions, which allows us to capture highly flexible functional relationships for the variances. In addition, we develop an online algorithm to perform inference. The algorithm has two main advantages. First, it takes a Bayesian approach, thereby avoiding overfitting. Second, it is much quicker than current offline inference procedures. Finally, our new model was evaluated on financial data and showed significant improvement in predictive performance over current standard models.


A Survey on Metric Learning for Feature Vectors and Structured Data

arXiv.org Machine Learning

The need for appropriate ways to measure the distance or similarity between data is ubiquitous in machine learning, pattern recognition and data mining, but handcrafting such good metrics for specific problems is generally difficult. This has led to the emergence of metric learning, which aims at automatically learning a metric from data and has attracted a lot of interest in machine learning and related fields for the past ten years. This survey paper proposes a systematic review of the metric learning literature, highlighting the pros and cons of each approach. We pay particular attention to Mahalanobis distance metric learning, a well-studied and successful framework, but additionally present a wide range of methods that have recently emerged as powerful alternatives, including nonlinear metric learning, similarity learning and local metric learning. Recent trends and extensions, such as semi-supervised metric learning, metric learning for histogram data and the derivation of generalization guarantees, are also covered. Finally, this survey addresses metric learning for structured data, in particular edit distance learning, and attempts to give an overview of the remaining challenges in metric learning for the years to come.


Prediction with Missing Data via Bayesian Additive Regression Trees

arXiv.org Machine Learning

This article addresses prediction problems where covariate information is missing during model construction and is also missing in future observations for which we are obligated to generate a forecast. Our aim is to innovate a nonparametric statistical learning extension which incorporates missingness into both the training and the forecasting phases. In the spirit of nonparametric learning, we wish to incorporate the missingness in both phases automatically, without the need for pre-specified modeling. We limit our focus to tree-based statistical learning, which has demonstrated strong predictive performance and has consequently received considerable attention in recent years. State-of-the-art algorithms include Random Forests (RF, Breiman, 2001b), stochastic gradient boosting (Friedman, 2002), and Bayesian Additive and Regression Trees (BART, Chipman et al., 2010), the algorithm of interest in this study.


Bayesian Inference with Posterior Regularization and applications to Infinite Latent SVMs

arXiv.org Artificial Intelligence

Existing Bayesian models, especially nonparametric Bayesian methods, rely on specially conceived priors to incorporate domain knowledge for discovering improved latent representations. While priors can affect posterior distributions through Bayes' rule, imposing posterior regularization is arguably more direct and in some cases more natural and general. In this paper, we present regularized Bayesian inference (RegBayes), a novel computational framework that performs posterior inference with a regularization term on the desired post-data posterior distribution under an information theoretical formulation. RegBayes is more flexible than the procedure that elicits expert knowledge via priors, and it covers both directed Bayesian networks and undirected Markov networks whose Bayesian formulation results in hybrid chain graph models. When the regularization is induced from a linear operator on the posterior distributions, such as the expectation operator, we present a general convex-analysis theorem to characterize the solution of RegBayes. Furthermore, we present two concrete examples of RegBayes, infinite latent support vector machines (iLSVM) and multi-task infinite latent support vector machines (MT-iLSVM), which explore the large-margin idea in combination with a nonparametric Bayesian model for discovering predictive latent features for classification and multi-task learning, respectively. We present efficient inference methods and report empirical studies on several benchmark datasets, which appear to demonstrate the merits inherited from both large-margin learning and Bayesian nonparametrics. Such results were not available until now, and contribute to push forward the interface between these two important subfields, which have been largely treated as isolated in the community.


A Procedural Characterization of Solution Concepts in Games

Journal of Artificial Intelligence Research

We show how game-theoretic solution concepts such as Nash equilibrium, correlated equilibrium, rationalizability, and sequential equilibrium can be given a uniform definition in terms of a knowledge-based program with counterfactual semantics. In a precise sense, this program can be viewed as providing a procedural characterization of rationality.