Uncertainty
Background to Qualitative Decision Theory
Doyle, Jon, Thomason, Richmond H.
This article provides an overview of the field of qualitative decision theory: its motivating tasks and issues, its antecedents, and its prospects. Qualitative decision theory studies qualitative approaches to problems of decision making and their sound and effective reconciliation and integration with quantitative approaches. Although it inherits from a long tradition, the field offers a new focus on a number of important unanswered questions of common concern to AI, economics, law, psychology, and management.
An Overview of Some Recent Developments in Bayesian Problem-Solving Techniques
The last few years have seen a surge in interest in the use of techniques from Bayesian decision theory to address problems in AI. Decision theory provides a normative framework for representing and reasoning about decision problems under uncertainty. Within the context of this framework, researchers in uncertainty in the AI community have been developing computational techniques for building rational agents and representations suited to engineering their knowledge bases. This special issue reviews recent research in Bayesian problem-solving techniques. The articles cover the topics of inference in Bayesian networks, decision-theoretic planning, and qualitative decision theory. Here, I provide a brief introduction to Bayesian networks and then cover applications of Bayesian problem-solving techniques, knowledge-based model construction and structured representations, and the learning of graphic probability models.
Variational Probabilistic Inference and the QMR-DT Network
Jaakkola, T. S., Jordan, M. I.
We describe a variational approximation method for efficient inference in large-scale probabilistic models. Variational methods are deterministic procedures that provide approximations to marginal and conditional probabilities of interest. They provide alternatives to approximate inference methods based on stochastic sampling or search. We describe a variational approach to the problem of diagnostic inference in the "Quick Medical Reference" (QMR) network. The QMR network is a large-scale probabilistic graphical model built on statistical and expert knowledge. Exact probabilistic inference is infeasible in this model for all but a small set of cases. We evaluate our variational inference algorithm on a large set of diagnostic test cases, comparing the algorithm to a state-of-the-art stochastic sampling method.
Minimum Description Length Induction, Bayesianism, and Kolmogorov Complexity
The relationship between the Bayesian approach and the minimum description length approach is established. We sharpen and clarify the general modeling principles MDL and MML, abstracted as the ideal MDL principle and defined from Bayes's rule by means of Kolmogorov complexity. The basic condition under which the ideal principle should be applied is encapsulated as the Fundamental Inequality, which in broad terms states that the principle is valid when the data are random relative to every contemplated hypothesis, and these hypotheses are in turn random relative to the (universal) prior. Basically, the ideal principle states that the prior probability associated with the hypothesis should be given by the algorithmic universal probability, and the sum of the negative log universal probability of the model plus the negative log of the probability of the data given the model should be minimized. If we restrict the model class to the finite sets, then application of the ideal principle turns into Kolmogorov's minimal sufficient statistic. In general we show that data compression is almost always the best strategy, both in hypothesis identification and prediction.
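The minimisation described in this abstract can be read off Bayes's rule by taking negative logarithms. The following is only a sketch of that step; the symbols H (hypothesis), D (data), and m (universal prior) are notation assumed here for illustration, not taken from the abstract.

```latex
\begin{align*}
\hat{H} &= \arg\max_{H} \Pr(H \mid D)
         = \arg\max_{H} \frac{\Pr(D \mid H)\, m(H)}{\Pr(D)} \\
        &= \arg\min_{H} \bigl[ -\log m(H) - \log \Pr(D \mid H) \bigr]
\end{align*}
```

Since Pr(D) does not depend on H, it drops out of the optimisation. The first term can be read as the code length of the model under the universal prior and the second as the code length of the data given the model, which is why maximising the posterior corresponds to minimising a total description length.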
Ensemble Learning for Multi-Layer Networks
Barber, David, Bishop, Christopher M.
In contrast to the maximum likelihood approach, which finds only a single estimate for the regression parameters, the Bayesian approach yields a distribution of weight parameters, p(w|D), conditional on the training data D, and predictions are ex- *Present address: SNN, University of Nijmegen, Geert Grooteplein 21, Nijmegen, The Netherlands.
Regularisation in Sequential Learning Algorithms
Freitas, João F. G. de, Niranjan, Mahesan, Gee, Andrew H.
In this paper, we discuss regularisation in online/sequential learning algorithms. In environments where data arrives sequentially, techniques such as cross-validation to achieve regularisation or model selection are not possible. Further, bootstrapping to determine a confidence level is not practical. To surmount these problems, a minimum variance estimation approach that makes use of the extended Kalman algorithm for training multi-layer perceptrons is employed. The novel contribution of this paper is to show the theoretical links between extended Kalman filtering, Sutton's variable learning rate algorithms and MacKay's Bayesian estimation framework. In doing so, we propose algorithms to overcome the need for heuristic choices of the initial conditions and noise covariance matrices in the Kalman approach.
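To make the idea of Kalman-style sequential training concrete, here is a minimal sketch of an extended Kalman filter update for a single weight. It is not the authors' algorithm: the one-parameter model y = tanh(w * x), the function names, and the noise values q and r are all assumptions chosen for illustration, and q and r are fixed by hand here, which is exactly the kind of heuristic choice the abstract aims to avoid.

```python
import math
import random


def ekf_step(w, p, x, y, q=1e-4, r=0.01):
    """One extended Kalman filter update for the scalar model y = tanh(w * x).

    w is the current weight estimate and p its variance; q and r are the
    assumed process and measurement noise variances (hypothetical values).
    """
    p = p + q                     # predict: random-walk weight, variance grows by q
    pred = math.tanh(w * x)       # model output at the current estimate
    h = x * (1.0 - pred * pred)   # derivative of the output with respect to w
    s = h * p * h + r             # innovation variance
    k = p * h / s                 # Kalman gain, playing the role of a learning rate
    w = w + k * (y - pred)        # correct the weight with the prediction error
    p = (1.0 - k * h) * p         # shrink the variance after seeing the sample
    return w, p


def run_demo(true_w=0.8, n=500, seed=0):
    """Feed n noisy samples one at a time and return the final (w, p)."""
    rng = random.Random(seed)
    w, p = 0.0, 1.0               # initial estimate and its variance
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)
        y = math.tanh(true_w * x) + rng.gauss(0.0, 0.1)
        w, p = ekf_step(w, p, x, y)
    return w, p
```

Calling `run_demo()` processes each sample once, in arrival order. The gain `k` shrinks as the variance `p` shrinks, so the update behaves like gradient descent with a step size that adapts to the remaining uncertainty in the weight.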
Experiences with Bayesian Learning in a Real World Application
Sykacek, Peter, Dorffner, Georg, Rappelsberger, Peter, Zeitlhofer, Josef
This paper reports on an application of Bayes-inferred neural network classifiers in the field of automatic sleep staging. The reason for using Bayesian learning for this task is twofold. First, Bayesian inference is known to embody regularization automatically. Second, a side effect of Bayesian learning leads to larger variance of network outputs in regions without training data. This results in well-known moderation effects, which can be used to detect outliers. In a 5-fold cross-validation experiment, the full Bayesian solution found with R. Neal's hybrid Monte Carlo algorithm was not better than a single maximum a posteriori (MAP) solution found with D.J. MacKay's evidence approximation. In a second experiment we studied the properties of both solutions in rejecting classification of movement artefacts.
Nonlinear Markov Networks for Continuous Variables
Hofmann, Reimar, Tresp, Volker
We address the problem of learning structure in nonlinear Markov networks with continuous variables. This can be viewed as non-Gaussian multidimensional density estimation exploiting certain conditional independencies in the variables. Markov networks are a graphical way of describing conditional independencies well suited to model relationships which do not exhibit a natural causal ordering. We use neural network structures to model the quantitative relationships between variables. The main focus in this paper will be on learning the structure for the purpose of gaining insight into the underlying process. Using two data sets we show that interesting structures can be found using our approach. Inference will be briefly addressed.
Radial Basis Functions: A Bayesian Treatment
Barber, David, Schottky, Bernhard
Bayesian methods have been successfully applied to regression and classification problems in multi-layer perceptrons. We present a novel application of Bayesian techniques to Radial Basis Function networks by developing a Gaussian approximation to the posterior distribution which, for fixed basis function widths, is analytic in the parameters. The setting of regularization constants by cross-validation is wasteful as only a single optimal parameter estimate is retained. We treat this issue by assigning prior distributions to these constants, which are then adapted in light of the data under a simple re-estimation formula. 1 Introduction Radial Basis Function networks are popular regression and classification tools [10]. For fixed basis function centers, RBFs are linear in their parameters and can therefore be trained with simple one-shot linear algebra techniques [10]. The use of unsupervised techniques to fix the basis function centers is, however, not generally optimal, since setting the basis function centers using density estimation on the input data alone takes no account of the target values associated with that data. Ideally, therefore, we should include the target values in the training procedure [7, 3, 9]. Unfortunately, allowing centers to adapt to the training targets leads to the RBF being a nonlinear function of its parameters, and training becomes more problematic. Most methods that perform supervised training of RBF parameters minimize the
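The remark that RBFs with fixed centers are linear in their parameters can be illustrated with a short sketch. This is not the paper's Bayesian treatment: it is plain regularised least squares with a hand-picked constant, and the function names, the Gaussian basis, the width, and the value of `lam` are all assumptions made for illustration.

```python
import math


def design_matrix(xs, centers, width):
    """Gaussian basis activations: one row per input, one column per center."""
    return [[math.exp(-((x - c) ** 2) / (2.0 * width ** 2)) for c in centers]
            for x in xs]


def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):                     # back substitution
        tail = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - tail) / m[i][i]
    return w


def fit_rbf(xs, ts, centers, width, lam=1e-3):
    """One-shot fit: solve (Phi^T Phi + lam I) w = Phi^T t for the weights."""
    phi = design_matrix(xs, centers, width)
    k = len(centers)
    a = [[sum(row[i] * row[j] for row in phi) + (lam if i == j else 0.0)
          for j in range(k)] for i in range(k)]
    b = [sum(row[i] * t for row, t in zip(phi, ts)) for i in range(k)]
    return solve(a, b)


def predict(x, weights, centers, width):
    """Evaluate the fitted RBF network at a single input x."""
    row = design_matrix([x], centers, width)[0]
    return sum(w * f for w, f in zip(weights, row))
```

A usage example under the same assumptions, fitting samples of sin(x) with seven fixed centers:

```python
xs = [i / 10.0 for i in range(-30, 31)]
ts = [math.sin(x) for x in xs]
centers = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
weights = fit_rbf(xs, ts, centers, width=1.0)
estimate = predict(1.0, weights, centers, width=1.0)
```

Because the centers and width are held fixed, the fit reduces to a single linear solve. Once the centers themselves are allowed to adapt to the targets, the output is no longer linear in the parameters and this shortcut does not apply.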