Cobb, Adam D., Roberts, Stephen J., Gal, Yarin

Current approaches in approximate inference for Bayesian neural networks minimise the Kullback-Leibler divergence to approximate the true posterior over the weights. However, this approximation is made without knowledge of the final application, and therefore cannot guarantee optimal predictions for a given task. To make more suitable task-specific approximations, we introduce a new loss-calibrated evidence lower bound for Bayesian neural networks in the context of supervised learning, informed by Bayesian decision theory. By introducing a lower bound that depends on a utility function, we ensure that our approximation achieves higher utility than traditional methods for applications that have asymmetric utility functions. Furthermore, in using dropout inference, we highlight that our new objective is identical to that of standard dropout neural networks, with an additional utility-dependent penalty term. We demonstrate our new loss-calibrated model with an illustrative medical example and a restricted model capacity experiment, and highlight failure modes of the comparable weighted cross-entropy approach. Lastly, we demonstrate the scalability of our method to real-world applications with per-pixel semantic segmentation on an autonomous driving data set.

The use of formal statistical methods to analyse quantitative data in data science has increased considerably over the last few years. One such approach, Bayesian Decision Theory (BDT), which is closely related to Bayesian hypothesis testing and Bayesian inference, is a fundamental statistical approach that quantifies the tradeoffs between various decisions using the probabilities and costs that accompany those decisions. In pattern recognition it is used to design classifiers, under the assumption that the problem is posed in probabilistic terms and that all of the relevant probability values are known. In practice we rarely have such perfect information, but it is a good place to start when studying machine learning, statistical inference, and detection theory in signal processing. BDT also has many applications in science, engineering, and medicine.
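
The tradeoff described above can be written compactly. The notation here is an assumption borrowed from standard pattern-recognition texts, not from the passage itself: \(\omega_1, \dots, \omega_c\) are the possible classes, \(\alpha_i\) is a candidate decision, \(\lambda(\alpha_i \mid \omega_j)\) is the cost of taking decision \(\alpha_i\) when the true class is \(\omega_j\), and \(P(\omega_j \mid x)\) is the posterior probability of class \(\omega_j\) given the observation \(x\). The conditional risk of a decision, and the decision that minimises it, are then

\[
R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\, P(\omega_j \mid x),
\qquad
\alpha^{*}(x) = \arg\min_{i} R(\alpha_i \mid x).
\]

Under a zero-one cost (0 for a correct decision, 1 for any error), \(R(\alpha_i \mid x) = 1 - P(\omega_i \mid x)\), so the rule reduces to choosing the class with the largest posterior probability.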

So how do we determine the "best" decision? This requires that we first define some notion of what we want (what are we trying to do?). The formal object that we use to do this goes by many names depending on the field: I will refer to it as a loss function (\(\mathcal{L}\)), but the same general concept may also be called a cost function, a utility function, an acquisition function, or any number of other names. The crucial idea is that this function lets us quantify how bad or good a given decision (\(a\)) is, given some information (\(\theta\)). What does it mean to quantify?
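
One way to make "quantify" concrete is to score each candidate decision \(a\) by its average loss over what we believe about \(\theta\), and pick the decision with the lowest score. The sketch below is illustrative only: the function names, the exponential belief about \(\theta\), and the grid of candidate decisions are all assumptions made for the example, not part of the original text.

```python
import random
import statistics


def expected_loss(action, theta_samples, loss):
    """Monte Carlo estimate of the average loss of `action` over the samples."""
    return sum(loss(action, t) for t in theta_samples) / len(theta_samples)


def bayes_action(candidates, theta_samples, loss):
    """Return the candidate with the smallest estimated expected loss."""
    return min(candidates, key=lambda a: expected_loss(a, theta_samples, loss))


def squared_loss(a, theta):
    return (a - theta) ** 2


def absolute_loss(a, theta):
    return abs(a - theta)


# Hypothetical belief about theta: samples from a right-skewed distribution.
rng = random.Random(0)
theta_samples = [rng.expovariate(1.0) for _ in range(2000)]

# Hypothetical decision space: a grid of point estimates from 0.00 to 3.00.
candidates = [i / 100 for i in range(301)]

a_sq = bayes_action(candidates, theta_samples, squared_loss)
a_abs = bayes_action(candidates, theta_samples, absolute_loss)

print("best decision under squared loss: ", a_sq)
print("best decision under absolute loss:", a_abs)
print("sample mean:  ", statistics.mean(theta_samples))
print("sample median:", statistics.median(theta_samples))
```

The same belief about \(\theta\) leads to different "best" decisions depending on the loss: squared loss favours a value near the mean of the samples, while absolute loss favours a value near the median. The choice of \(\mathcal{L}\) is therefore part of the problem statement, not an afterthought.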

Werling, Keenon, Chaganty, Arun Tejasvi, Liang, Percy S., Manning, Christopher D.

Our goal is to deploy a high-accuracy system starting with zero training examples. We consider an "on-the-job" setting, where as inputs arrive, we use real-time crowdsourcing to resolve uncertainty where needed and output our prediction when confident. As the model improves over time, the reliance on crowdsourcing queries decreases. We cast our setting as a stochastic game based on Bayesian decision theory, which allows us to balance latency, cost, and accuracy objectives in a principled way. Computing the optimal policy is intractable, so we develop an approximation based on Monte Carlo Tree Search. We tested our approach on three datasets: named-entity recognition, sentiment classification, and image classification. On the NER task we obtained more than an order of magnitude reduction in cost compared to full human annotation, while boosting performance relative to the expert-provided labels. We also achieve an 8% F1 improvement over having a single human label the whole set, and a 28% F1 improvement over online learning.

Abbasnejad, Ehsan (Australian National University and NICTA)

Decision theory focuses on the problem of making decisions under uncertainty. This uncertainty arises from the unknown aspects of the state of the world the decision maker is in, or from the unknown utility of performing actions. The uncertainty can be modeled as a probability distribution capturing our belief about the world the decision maker is in. Upon making new observations, the decision maker becomes more confident about this model. In addition, if there is a prior belief about this uncertainty, which may have been obtained from similar experiments, Bayesian methods may be employed. The loss incurred by the decision maker can also be used to select the optimal action. Most machine learning algorithms, though, focus on only one of these aspects for learning and prediction: either learning the probabilistic model or minimizing the loss. In probabilistic models, approximate inference, the process of obtaining the desired model from the observations when exact inference is not tractable, does not consider the task loss. On the other end of the spectrum, the common practice in learning is to minimize the task loss without considering the uncertainty of the prediction model. Therefore, we investigate the intersection of decision theory and machine learning, considering both the uncertainty in the prediction model and the task loss.

Autonomous vacuuming agents are no exception. Under the assumptions that the physical architectures for such agents include an on-board, self-contained power supply (to avoid the cord-entanglement problem) and sensors that may be prone to failure and noise, the basic problem is one of tracking and resource management. An accurate and timely representation of critical parameters and system state must be maintained if the task is to be accomplished with any efficiency or reliability. Bayesian techniques provide the most effective and accurate means to maintain this information. Most of these techniques were developed in the early 1960s.
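
As a rough illustration of this kind of tracking, the sketch below uses a one-dimensional Kalman filter (published in 1960) to maintain an estimate of remaining battery charge from a noisy sensor that occasionally fails. The passage does not name a specific technique, so the choice of filter is an assumption, as are the function name `kalman_step`, the constant drain per step, and all numeric values.

```python
def kalman_step(mean, var, drain, process_var, measurement, measurement_var):
    """One predict/update cycle of a scalar Kalman filter.

    mean, var        -- current belief about the charge level (Gaussian)
    drain            -- assumed charge consumed during this time step
    process_var      -- variance of the unmodelled drain noise
    measurement      -- sensor reading, or None if the sensor failed
    measurement_var  -- variance of the sensor noise
    """
    # Predict: the charge drops by the expected drain, uncertainty grows.
    pred_mean = mean - drain
    pred_var = var + process_var

    # Sensor failure: keep the prediction, with its larger uncertainty.
    if measurement is None:
        return pred_mean, pred_var

    # Update: blend prediction and reading, weighted by their variances.
    gain = pred_var / (pred_var + measurement_var)
    new_mean = pred_mean + gain * (measurement - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var


# Hypothetical run: start near full charge; the second reading is missing.
mean, var = 100.0, 4.0
for reading in [97.0, None, 94.5, 91.8]:
    mean, var = kalman_step(mean, var, drain=2.0, process_var=1.0,
                            measurement=reading, measurement_var=5.0)
    print(f"reading={reading}  estimate={mean:.2f}  variance={var:.2f}")
```

The variance returned at each step is what makes the estimate useful for resource management: a controller can, for example, head back to the charger earlier when the estimate is low or when its uncertainty has grown after several missed readings.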