Bayesian Inference
Elicitation of Factored Utilities
We provide a brief overview of recent direct preference elicitation methods: these methods ask users to answer (ideally, a small number of) queries regarding their preferences and use this information to recommend a feasible decision that would be (approximately) optimal given those preferences. We argue for the importance of assessing numerical utilities rather than qualitative preferences and survey several utility elicitation techniques from artificial intelligence, operations research, and conjoint analysis. Specifically, since the ability to make reasonable decisions on behalf of a user depends on that user's preferences over outcomes in the domain in question, AI systems must assess or estimate these preferences before making decisions. Designing effective preference assessment techniques to incorporate such user-specific considerations (that is, breaking the preference bottleneck) is one of the most important problems facing AI. In this brief survey, we focus on explicit elicitation techniques where a system actively queries a user to glean relevant preferences. Preference elicitation is difficult for two main reasons. First, many decision problems have exponentially sized outcome spaces, defined by the possible values of outcome attributes. As an illustrative example, consider sophisticated flight selection: possible outcomes are defined by attributes such as trip cost, departure time, return time, airline, number of connections, flight length, baggage weight limit, flight class, (the possibility of) lost luggage, flight delays, and other stochastic outcomes. An ideal decision support system should be able to use, for example, precise flight delay statistics and incorporate a user's relative tolerance for delays in making recommendations. Representing and eliciting preferences for all outcomes in a case like this is infeasible given the size of the outcome space. A second difficulty arises due to the fact that quantitative strength of preferences, or utility, is needed to trade off, for instance, the odds of flight delays with other attributes. Unfortunately, people are notoriously inept at quantifying their preferences with any degree of precision, adding to the challenges facing automated utility elicitation.
Decision Making in Complex Multiagent Contexts: A Tale of Two Frameworks
It involves choosing optimally between different lines of action in various information contexts that range from perfectly knowing all aspects of the decision problem to having just partial knowledge about it. The physical context often includes other interacting autonomous systems, typically called agents. In this article, I focus on decision making in a multiagent context with partial information about the problem. Relevant research in this complex but realistic setting has converged around two complementary, general frameworks and also introduced myriad specializations on its way. I put the two frameworks, decentralized partially observable Markov decision process (Dec-POMDP) and the interactive partially observable Markov decision process (I-POMDP), in context and review the foundational algorithms for these frameworks, while briefly discussing the advances in their specializations.
…making Bayesian networks more accessible to the probabilistically unsophisticated
Over the last few years, a method of reasoning using probabilities, variously called belief networks, Bayesian networks, knowledge maps, probabilistic causal networks, and so on, has become popular within the AI probability and uncertainty community. This method is best summarized in Judea Pearl's (1988) book, but the ideas are a product of many hands. I adopted Pearl's name, Bayesian networks, on the grounds that the name is completely neutral about the status of the networks (do they really represent beliefs, causality, or what?). I give an introduction to Bayesian networks for AI researchers with a limited grounding in probability theory. Over the last few years, this method of reasoning using probabilities has become popular within the AI probability and uncertainty community.
An Overview of Some Recent Developments in Bayesian Problem-Solving Techniques
The last few years have seen a surge in interest in the use of techniques from Bayesian decision theory to address problems in AI. Decision theory provides a normative framework for representing and reasoning about decision problems under uncertainty. Within the context of this framework, researchers in uncertainty in the AI community have been developing computational techniques for building rational agents and representations suited to engineering their knowledge bases. The articles cover the topics of inference in Bayesian networks, decision-theoretic planning, and qualitative decision theory. Here, I provide a brief introduction to Bayesian networks and then cover applications of Bayesian problem-solving techniques, knowledge-based model construction and structured representations, and the learning of graphic probability models.
Sequential Decision Making in Computational Sustainability Through Adaptive Submodularity
Such problems are generally notoriously difficult. In this article, we review the recently discovered notion of adaptive submodularity, an intuitive diminishing returns condition that generalizes the classical notion of submodular set functions to sequential decision problems. Problems exhibiting the adaptive submodularity property can be efficiently and provably nearoptimally solved using simple myopic policies. We illustrate this concept in several case studies of interest in computational sustainability: First, we demonstrate how it can be used to efficiently plan for resolving uncertainty in adaptive management scenarios. Then, we show how it applies to dynamic conservation planning for protecting endangered species, a case study carried out in collaboration with the U.S. Geological Survey and the U.S. Fish and Wildlife Service.
PHOENICS: A universal deep Bayesian optimizer
Häse, Florian, Roch, Loïc M., Kreisbeck, Christoph, Aspuru-Guzik, Alán
In this work we introduce PHOENICS, a probabilistic global optimization algorithm combining ideas from Bayesian optimization with concepts from Bayesian kernel density estimation. We propose an inexpensive acquisition function balancing the explorative and exploitative behavior of the algorithm. This acquisition function enables intuitive sampling strategies for an efficient parallel search of global minima. The performance of PHOENICS is assessed via an exhaustive benchmark study on a set of 15 discrete, quasi-discrete and continuous multidimensional functions. Unlike optimization methods based on Gaussian processes (GP) and random forests (RF), we show that PHOENICS is less sensitive to the nature of the co-domain, and outperforms GP and RF optimizations. We illustrate the performance of PHOENICS on the Oregonator, a difficult case-study describing a complex chemical reaction network. We demonstrate that only PHOENICS was able to reproduce qualitatively and quantitatively the target dynamic behavior of this nonlinear reaction dynamics. We recommend PHOENICS for rapid optimization of scalar, possibly non-convex, black-box unknown objective functions.
Bayesian Estimation of Multidimensional Latent Variables and Its Asymptotic Accuracy
Hierarchical learning models, such as mixture models and Bayesian networks, are widely employed for unsupervised learning tasks, such as clustering analysis. They consist of observable and hidden variables, which represent the given data and their hidden generation process, respectively. It has been pointed out that conventional statistical analysis is not applicable to these models, because redundancy of the latent variable produces singularities in the parameter space. In recent years, a method based on algebraic geometry has allowed us to analyze the accuracy of predicting observable variables when using Bayesian estimation. However, how to analyze latent variables has not been sufficiently studied, even though one of the main issues in unsupervised learning is to determine how accurately the latent variable is estimated. A previous study proposed a method that can be used when the range of the latent variable is redundant compared with the model generating data. The present paper extends that method to the situation in which the latent variables have redundant dimensions. We formulate new error functions and derive their asymptotic forms. Calculation of the error functions is demonstrated in two-layered Bayesian networks.
Flow-GAN: Combining Maximum Likelihood and Adversarial Learning in Generative Models
Grover, Aditya, Dhar, Manik, Ermon, Stefano
Adversarial learning of probabilistic models has recently emerged as a promising alternative to maximum likelihood. Implicit models such as generative adversarial networks (GAN) often generate better samples compared to explicit models trained by maximum likelihood. Yet, GANs sidestep the characterization of an explicit density which makes quantitative evaluations challenging. To bridge this gap, we propose Flow-GANs, a generative adversarial network for which we can perform exact likelihood evaluation, thus supporting both adversarial and maximum likelihood training. When trained adversarially, Flow-GANs generate high-quality samples but attain extremely poor log-likelihood scores, inferior even to a mixture model memorizing the training data; the opposite is true when trained by maximum likelihood. Results on MNIST and CIFAR-10 demonstrate that hybrid training can attain high held-out likelihoods while retaining visual fidelity in the generated samples.
Probabilistic supervised learning
Gressmann, Frithjof, Király, Franz J., Mateen, Bilal, Oberhauser, Harald
Predictive modelling and supervised learning are central to modern data science. With predictions from an ever-expanding number of supervised black-box strategies - e.g., kernel methods, random forests, deep learning aka neural networks - being employed as a basis for decision making processes, it is crucial to understand the statistical uncertainty associated with these predictions. As a general means to approach the issue, we present an overarching framework for black-box prediction strategies that not only predict the target but also their own predictions' uncertainty. Moreover, the framework allows for fair assessment and comparison of disparate prediction strategies. For this, we formally consider strategies capable of predicting full distributions from feature variables, so-called probabilistic supervised learning strategies. Our work draws from prior work including Bayesian statistics, information theory, and modern supervised machine learning, and in a novel synthesis leads to (a) new theoretical insights such as a probabilistic bias-variance decomposition and an entropic formulation of prediction, as well as to (b) new algorithms and meta-algorithms, such as composite prediction strategies, probabilistic boosting and bagging, and a probabilistic predictive independence test. Our black-box formulation also leads (c) to a new modular interface view on probabilistic supervised learning and a modelling workflow API design, which we have implemented in the newly released skpro machine learning toolbox, extending the familiar modelling interface and meta-modelling functionality of sklearn. The skpro package provides interfaces for construction, composition, and tuning of probabilistic supervised learning strategies, together with orchestration features for validation and comparison of any such strategy - be it frequentist, Bayesian, or other.
A Unified Framework for Sparse Non-Negative Least Squares using Multiplicative Updates and the Non-Negative Matrix Factorization Problem
Fedorov, Igor, Nalci, Alican, Giri, Ritwik, Rao, Bhaskar D., Nguyen, Truong Q., Garudadri, Harinath
We study the sparse non-negative least squares (S-NNLS) problem. S-NNLS occurs naturally in a wide variety of applications where an unknown, non-negative quantity must be recovered from linear measurements. We present a unified framework for S-NNLS based on a rectified power exponential scale mixture prior on the sparse codes. We show that the proposed framework encompasses a large class of S-NNLS algorithms and provide a computationally efficient inference procedure based on multiplicative update rules. Such update rules are convenient for solving large sets of S-NNLS problems simultaneously, which is required in contexts like sparse non-negative matrix factorization (S-NMF). We provide theoretical justification for the proposed approach by showing that the local minima of the objective function being optimized are sparse and the S-NNLS algorithms presented are guaranteed to converge to a set of stationary points of the objective function. We then extend our framework to S-NMF, showing that our framework leads to many well known S-NMF algorithms under specific choices of prior and providing a guarantee that a popular subclass of the proposed algorithms converges to a set of stationary points of the objective function. Finally, we study the performance of the proposed approaches on synthetic and real-world data.