Bayesian Learning
A Disaster Response System based on Human-Agent Collectives
Ramchurn, Sarvapali D., Huynh, Trung Dong, Wu, Feng, Ikuno, Yukki, Flann, Jack, Moreau, Luc, Fischer, Joel E., Jiang, Wenchao, Rodden, Tom, Simpson, Edwin, Reece, Steven, Roberts, Stephen, Jennings, Nicholas R.
Major natural or man-made disasters such as Hurricane Katrina or the 9/11 terror attacks pose significant challenges for emergency responders. First, they have to develop an understanding of the unfolding event either using their own resources or through third-parties such as the local population and agencies. Second, based on the information gathered, they need to deploy their teams in a flexible manner, ensuring that each team performs tasks in The most effective way. Third, given the dynamic nature of a disaster space, and the uncertainties involved in performing rescue missions, information about the disaster space and the actors within it needs to be managed to ensure that responders are always acting on up-to-date and trusted information. Against this background, this paper proposes a novel disaster response system called HAC-ER. Thus HAC-ER interweaves humans and agents, both robotic and software, in social relationships that augment their individual and collective capabilities. To design HAC-ER, we involved end-users including both experts and volunteers in a several participatory design workshops, lab studies, and field trials of increasingly advanced prototypes of individual components of HAC-ER as well as the overall system. This process generated a number of new quantitative and qualitative results but also raised a number of new research questions. HAC-ER thus demonstrates how such Human-Agent Collectives (HACs) can address key challenges in disaster response. Specifically, we show how HAC-ER utilises crowdsourcing combined with machine learning to obtain most important situational awareness from large streams of reports posted by members of the public and trusted organisations. We then show how this information can inform human-agent teams in coordinating multi-UAV deployments, as well as task planning for responders on the ground. Finally, HAC-ER incorporates an infrastructure and the associated intelligence for tracking and utilising the provenance of information shared across the entire system to ensure its accountability. We individually validate each of these elements of HAC-ER and show how they perform against standard (non-HAC) baselines and also elaborate on the evaluation of the overall system.
Algorithms and matching lower bounds for approximately-convex optimization
In recent years, a rapidly increasing number of applications in practice requires solving non-convex objectives, like training neural networks, learning graphical models, maximum likelihood estimation etc. Though simple heuristics such as gradient descent with very few modifications tend to work well, theoretical understanding is very weak. We consider possibly the most natural class of non-convex functions where one could hope to obtain provable guarantees: functions that are ``approximately convex'', i.e. functions $\tf: \Real^d \to \Real$ for which there exists a \emph{convex function} $f$ such that for all $x$, $|\tf(x) - f(x)| \le \errnoise$ for a fixed value $\errnoise$. We then want to minimize $\tf$, i.e. output a point $\tx$ such that $\tf(\tx) \le \min_{x} \tf(x) + \err$. It is quite natural to conjecture that for fixed $\err$, the problem gets harder for larger $\errnoise$, however, the exact dependency of $\err$ and $\errnoise$ is not known. In this paper, we strengthen the known \emph{information theoretic} lower bounds on the trade-off between $\err$ and $\errnoise$ substantially, and exhibit an algorithm that matches these lower bounds for a large class of convex bodies.
The Forget-me-not Process
Milan, Kieran, Veness, Joel, Kirkpatrick, James, Bowling, Michael, Koop, Anna, Hassabis, Demis
We introduce the Forget-me-not Process, an efficient, non-parametric meta-algorithm for online probabilistic sequence prediction for piecewise stationary, repeating sources. Our method works by taking a Bayesian approach to partition a stream of data into postulated task-specific segments, while simultaneously building a model for each task. We provide regret guarantees with respect to piecewise stationary data sources under the logarithmic loss, and validate the method empirically across a range of sequence prediction and task identification problems.
Learning Infinite RBMs with Frank-Wolfe
Ping, Wei, Liu, Qiang, Ihler, Alexander T.
In this work, we propose an infinite restricted Boltzmann machine (RBM), whose maximum likelihood estimation (MLE) corresponds to a constrained convex optimization. We consider the Frank-Wolfe algorithm to solve the program, which provides a sparse solution that can be interpreted as inserting a hidden unit at each iteration, so that the optimization process takes the form of a sequence of finite models of increasing complexity. As a side benefit, this can be used to easily and efficiently identify an appropriate number of hidden units during the optimization. The resulting model can also be used as an initialization for typical state-of-the-art RBM training algorithms such as contrastive divergence, leading to models with consistently higher test likelihood than random initialization.
Gaussian Processes for Survival Analysis
Fernandez, Tamara, Rivera, Nicolas, Teh, Yee Whye
We introduce a semi-parametric Bayesian model for survival analysis. The model is centred on a parametric baseline hazard, and uses a Gaussian process to model variations away from it nonparametrically, as well as dependence on covariates. As opposed to many other methods in survival analysis, our framework does not impose unnecessary constraints in the hazard rate or in the survival function. Furthermore, our model handles left, right and interval censoring mechanisms common in survival analysis. We propose a MCMC algorithm to perform inference and an approximation scheme based on random Fourier features to make computations faster. We report experimental results on synthetic and real data, showing that our model performs better than competing models such as Cox proportional hazards, ANOVA-DDP and random survival forests.
Confusions over Time: An Interpretable Bayesian Model to Characterize Trends in Decision Making
Lakkaraju, Himabindu, Leskovec, Jure
We propose Confusions over Time (CoT), a novel generative framework which facilitates a multi-granular analysis of the decision making process. The CoT not only models the confusions or error properties of individual decision makers and their evolution over time, but also allows us to obtain diagnostic insights into the collective decision making process in an interpretable manner. To this end, the CoT models the confusions of the decision makers and their evolution over time via time-dependent confusion matrices. Interpretable insights are obtained by grouping similar decision makers (and items being judged) into clusters and representing each such cluster with an appropriate prototype and identifying the most important features characterizing the cluster via a subspace feature indicator vector. Experimentation with real world data on bail decisions, asthma treatments, and insurance policy approval decisions demonstrates that CoT can accurately model and explain the confusions of decision makers and their evolution over time.
Fast ฮต-free Inference of Simulation Models with Bayesian Conditional Density Estimation
Papamakarios, George, Murray, Iain
Many statistical models can be simulated forwards but have intractable likelihoods. Approximate Bayesian Computation (ABC) methods are used to infer properties of these models from data. Traditionally these methods approximate the posterior over parameters by conditioning on data being inside an ฮต-ball around the observed data, which is only correct in the limit ฮตโ0. Monte Carlo methods can then draw samples from the approximate posterior to approximate predictions or error bars on parameters. These algorithms critically slow down as ฮตโ0, and in practice draw samples from a broader distribution than the posterior. We propose a new approach to likelihood-free inference based on Bayesian conditional density estimation. Preliminary inferences based on limited simulation data are used to guide later simulations. In some cases, learning an accurate parametric representation of the entire true posterior distribution requires fewer model simulations than Monte Carlo ABC methods need to produce a single sample from an approximate posterior.
One-vs-Each Approximation to Softmax for Scalable Estimation of Probabilities
The softmax representation of probabilities for categorical variables plays a prominent role in modern machine learning with numerous applications in areas such as large scale classification, neural language modeling and recommendation systems. However, softmax estimation is very expensive for large scale inference because of the high cost associated with computing the normalizing constant. Here, we introduce an efficient approximation to softmax probabilities which takes the form of a rigorous lower bound on the exact probability. This bound is expressed as a product over pairwise probabilities and it leads to scalable estimation based on stochastic optimization. It allows us to perform doubly stochastic estimation by subsampling both training instances and class labels. We show that the new bound has interesting theoretical properties and we demonstrate its use in classification problems.
Causal meets Submodular: Subset Selection with Directed Information
Zhou, Yuxun, Spanos, Costas J.
We study causal subset selection with Directed Information as the measure of prediction causality. Two typical tasks, causal sensor placement and covariate selection, are correspondingly formulated into cardinality constrained directed information maximizations. To attack the NP-hard problems, we show that the first problem is submodular while not necessarily monotonic. And the second one is ``nearly'' submodular. To substantiate the idea of approximate submodularity, we introduce a novel quantity, namely submodularity index (SmI), for general set functions. Moreover, we show that based on SmI, greedy algorithm has performance guarantee for the maximization of possibly non-monotonic and non-submodular functions, justifying its usage for a much broader class of problems. We evaluate the theoretical results with several case studies, and also illustrate the application of the subset selection to causal structure learning.
Poisson-Gamma dynamical systems
Schein, Aaron, Wallach, Hanna, Zhou, Mingyuan
We introduce a new dynamical system for sequentially observed multivariate count data. This model is based on the gamma-Poisson construction--a natural choice for count data--and relies on a novel Bayesian nonparametric prior that ties and shrinks the model parameters, thus avoiding overfitting. We present an efficient MCMC inference algorithmthat advances recent work on augmentation schemes for inference in negative binomial models. Finally, we demonstrate the model's inductive bias using a variety of real-world data sets, showing that it exhibits superior predictive performance over other models and infers highly interpretable latent structure.