Goto

Collaborating Authors

 Bayesian Inference


A Note on Posterior Probability Estimation for Classifiers

arXiv.org Machine Learning

One of the central themes in the classification task is the estimation of class posterior probability at a new point $\bf{x}$. The vast majority of classifiers output a score for $\bf{x}$, which is monotonically related to the posterior probability via an unknown relationship. There are many attempts in the literature to estimate this latter relationship. Here, we provide a way to estimate the posterior probability without resorting to using classification scores. Instead, we vary the prior probabilities of classes in order to derive the ratio of pdf's at point $\bf{x}$, which is directly used to determine class posterior probabilities. We consider here the binary classification problem.


Regularized Estimation and Feature Selection in Mixtures of Gaussian-Gated Experts Models

arXiv.org Machine Learning

Mixtures-of-Experts models and their maximum likelihood estimation (MLE) via the EM algorithm have been thoroughly studied in the statistics and machine learning literature. They are subject of a growing investigation in the context of modeling with high-dimensional predictors with regularized MLE. We examine MoE with Gaussian gating network, for clustering and regression, and propose an $\ell_1$-regularized MLE to encourage sparse models and deal with the high-dimensional setting. We develop an EM-Lasso algorithm to perform parameter estimation and utilize a BIC-like criterion to select the model parameters, including the sparsity tuning hyperparameters. Experiments conducted on simulated data show the good performance of the proposed regularized MLE compared to the standard MLE with the EM algorithm.


Modular Meta-Learning with Shrinkage

arXiv.org Artificial Intelligence

Most gradient-based approaches to meta-learning do not explicitly account for the fact that different parts of the underlying model adapt by different amounts when applied to a new task. For example, the input layers of an image classification convnet typically adapt very little, while the output layers can change significantly. This can cause parts of the model to begin to overfit while others underfit. To address this, we introduce a hierarchical Bayesian model with per-module shrinkage parameters, which we propose to learn by maximizing an approximation of the predictive likelihood using implicit differentiation. Our algorithm subsumes Reptile and outperforms variants of MAML on two synthetic few-shot meta-learning problems.


Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning

arXiv.org Artificial Intelligence

While most approaches to the problem of Inverse Reinforcement Learning (IRL) focus on estimating a reward function that best explains an expert agent's policy or demonstrated behavior on a control task, it is often the case that such behavior is more succinctly described by a simple reward combined with a set of hard constraints. In this setting, the agent is attempting to maximize cumulative rewards subject to these given constraints on their behavior. We reformulate the problem of IRL on Markov Decision Processes (MDPs) such that, given a nominal model of the environment and a nominal reward function, we seek to estimate state, action, and feature constraints in the environment that motivate an agent's behavior. Our approach is based on the Maximum Entropy IRL framework, which allows us to reason about the likelihood of an expert agent's demonstrations given our knowledge of an MDP. Using our method, we can infer which constraints can be added to the MDP to most increase the likelihood of observing these demonstrations. We present an algorithm which iteratively infers the Maximum Likelihood Constraint to best explain observed behavior, and we evaluate its efficacy using both simulated behavior and recorded data of humans navigating around an obstacle.


5 Reasons to Learn Probability for Machine Learning

#artificialintelligence

Probability is a field of mathematics that quantifies uncertainty. It is undeniably a pillar of the field of machine learning, and many recommend it as a prerequisite subject to study prior to getting started. This is misleading advice, as probability makes more sense to a practitioner once they have the context of the applied machine learning process in which to interpret it. In this post, you will discover why machine learning practitioners should study probabilities to improve their skills and capabilities. Before we go through the reasons that you should learn probability, let's start off by taking a small look at the reason why you should not.


An Online Reinforcement Learning Approach to Quality-Cost-Aware Task Allocation for Multi-Attribute Social Sensing

arXiv.org Machine Learning

Social sensing has emerged as a new sensing paradigm where humans (or devices on their behalf) collectively report measurements about the physical world. This paper focuses on a quality-cost-aware task allocation problem in multi-attribute social sensing applications. The goal is to identify a task allocation strategy (i.e., decide when and where to collect sensing data) to achieve an optimized tradeoff between the data quality and the sensing cost. While recent progress has been made to tackle similar problems, three important challenges have not been well addressed: (i) "online task allocation": the task allocation schemes need to respond quickly to the potentially large dynamics of the measured variables in social sensing; (ii) "multi-attribute constrained optimization": minimizing the overall sensing error given the dependencies and constraints of multiple attributes of the measured variables is a nontrivial problem to solve; (iii) "nonuniform task allocation cost": the task allocation cost in social sensing often has a nonuniform distribution which adds additional complexity to the optimized task allocation problem. We evaluate the QCO-TA scheme through a real-world social sensing application and the results show that our scheme significantly outperforms the state-of-the-art baselines in terms of both sensing accuracy and cost. Introduction This paper presents an online reinforcement learning framework to solve the quality-cost-aware task allocation problem in multi-attribute social sensing applications. Social sensing has emerged as a new sensing paradigm in pervasive and mobile computing applications where humans (or devices on their behalf) collectively report measurements about the physical world [1, 2]. Examples of social sensing applications include air quality and environment monitoring in smart cities using mobile devices [3], malfunctioning urban infrastructures reporting using geotagging [4], and damage assessment in disaster response using online social media [5]. In social sensing applications, participants perform sensing tasks at assigned locations to collect different attributes of the measured variables that are of interests to the application [6]. For example, in an urban air quality sensing application, participants are tasked to measure various air quality attributes (e.g., PM 2.5, PM 10, CO 2) at different locations of the city to estimate the overall air quality and identity potential health risks. We refer to this category of applications as multi-attribute social sensing applications . In multi-attribute social sensing applications, there exists a fundamental tradeoff between data quality and sensing (task allocation) cost [3, 7]. 2 In particular, it is essential to obtain comprehensive and accurate measurements to ensure the desired data quality of the social sensing applications.


Correcting Predictions for Approximate Bayesian Inference

arXiv.org Machine Learning

Bayesian models quantify uncertainty and facilitate optimal decision-making in downstream applications. For most models, however, practitioners are forced to use approximate inference techniques that lead to sub-optimal decisions due to incorrect posterior predictive distributions. We present a novel approach that corrects for inaccuracies in posterior inference by altering the decision-making process. We train a separate model to make optimal decisions under the approximate posterior, combining interpretable Bayesian modeling with optimization of direct predictive accuracy in a principled fashion. The solution is generally applicable as a plug-in module for predictive decision-making for arbitrary probabilistic programs, irrespective of the posterior inference strategy. We demonstrate the approach empirically in several problems, confirming its potential.


Efficient Bayesian synthetic likelihood with whitening transformations

arXiv.org Machine Learning

Likelihood-free methods are an established approach for performing approximate Bayesian inference for models with intractable likelihood functions. However, they can be computationally demanding. Bayesian synthetic likelihood (BSL) is a popular such method that approximates the likelihood function of the summary statistic with a known, tractable distribution -- typically Gaussian -- and then performs statistical inference using standard likelihood-based techniques. However, as the number of summary statistics grows, the number of model simulations required to accurately estimate the covariance matrix for this likelihood rapidly increases. This poses significant challenge for the application of BSL, especially in cases where model simulation is expensive. In this article we propose whitening BSL (wBSL) -- an efficient BSL method that uses approximate whitening transformations to decorrelate the summary statistics at each algorithm iteration. We show empirically that this can reduce the number of model simulations required to implement BSL by more than an order of magnitude, without much loss of accuracy. We explore a range of whitening procedures and demonstrate the performance of wBSL on a range of simulated and real modelling scenarios from ecology and biology.


Correlation Priors for Reinforcement Learning

arXiv.org Artificial Intelligence

Many decision-making problems naturally exhibit pronounced structures inherited from the underlying characteristics of the environment. In a Markov decision process model, for example, two distinct states can have inherently related semantics or encode resembling physical state configurations, often implying locally correlated transition dynamics among the states. In order to complete a certain task, an agent acting in such environments needs to execute a series of temporally and spatially correlated actions. Though there exists a variety of approaches to account for correlations in continuous state-action domains, a principled solution for discrete environments is missing. In this work, we present a Bayesian learning framework based on P\'olya-Gamma augmentation that enables an analogous reasoning in such cases. We demonstrate the framework on a number of common decision-making related tasks, such as reinforcement learning, imitation learning and system identification. By explicitly modeling the underlying correlation structures, the proposed approach yields superior predictive performance compared to correlation-agnostic models, even when trained on data sets that are up to an order of magnitude smaller in size.


Scalable Structure Learning of Continuous-Time Bayesian Networks from Incomplete Data

arXiv.org Machine Learning

Continuous-time Bayesian Networks (CTBNs) represent a compact yet powerful framework for understanding multivariate time-series data. Given complete data, parameters and structure can be estimated efficiently in closed-form. However, if data is incomplete, the latent states of the CTBN have to be estimated by laboriously simulating the intractable dynamics of the assumed CTBN. This is a problem, especially for structure learning tasks, where this has to be done for each element of super-exponentially growing set of possible structures. In order to circumvent this notorious bottleneck, we develop a novel gradient-based approach to structure learning. Instead of sampling and scoring all possible structures individually, we assume the generator of the CTBN to be composed as a mixture of generators stemming from different structures. In this framework, structure learning can be performed via a gradient-based optimization of mixture weights. We combine this approach with a novel variational method that allows for the calculation of the marginal likelihood of a mixture in closed-form. We proof the scalability of our method by learning structures of previously inaccessible sizes from synthetic and real-world data.