Uncertainty
A practical Bayesian framework for back-propagation networks
A quantitative and practical Bayesian framework is described for learning of mappings in feedforward networks. The framework makes possible (1) objective comparisons between solutions using alternative network architectures, (2) objective stopping rules for network pruning or growing procedures, (3) objective choice of magnitude and type of weight decay terms or additive regularizers (for penalizing large weights, etc.), (4) a measure of the effective number of well-determined parameters in a model, (5) quantified estimates of the error bars on network parameters and on network output, and (6) objective comparisons with alternative learning and interpolation models such as splines and radial basis functions. The Bayesian "evidence" automatically embodies "Occam's razor," penalizing overflexible and overcomplex models. The Bayesian approach helps detect poor underlying assumptions in learning models. For learning models well matched to a problem, a good correlation between generalization ability and the Bayesian evidence is obtained.
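As a rough illustration of item (4), the sketch below computes an effective number of well-determined parameters for a linear-in-parameters model with a quadratic weight-decay term. It uses gamma = sum_i lambda_i / (lambda_i + alpha), where the lambda_i are eigenvalues of the Hessian of the data misfit term. The linear model, the function name `effective_parameters`, and the values of `alpha` and `beta` are assumptions made for illustration; none of them comes from the abstract.

```python
import numpy as np

def effective_parameters(X, alpha, beta):
    """Return gamma = sum_i lambda_i / (lambda_i + alpha).

    lambda_i are the eigenvalues of beta * X^T X, the Hessian of the
    data misfit term for a linear model (an illustrative assumption).
    alpha is the weight-decay strength and must be positive.
    """
    hessian = beta * X.T @ X
    eigvals = np.linalg.eigvalsh(hessian)
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off
    return float(np.sum(eigvals / (eigvals + alpha)))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
X[:, 4] = X[:, 3]  # duplicated feature: one direction is not determined by the data

gamma_weak = effective_parameters(X, alpha=1e-3, beta=1.0)    # weak weight decay
gamma_strong = effective_parameters(X, alpha=1e3, beta=1.0)   # strong weight decay
```

With weak weight decay, gamma approaches the number of directions the data actually constrain (four here, not five); with strong weight decay it shrinks toward zero.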
A computational scheme for reasoning in dynamic probabilistic networks
A computational scheme for reasoning about dynamic systems using (causal) probabilistic networks is presented. The scheme is based on the framework of Lauritzen and Spiegelhalter (1988), and may be viewed as a generalization of the inference methods of classical time-series analysis in the sense that it allows description of non-linear, multivariate dynamic systems with complex conditional independence structures. Further, the scheme provides a method for efficient backward smoothing and possibilities for efficient, approximate forecasting methods. The scheme has been implemented on top of the HUGIN shell.
Understanding evidential reasoning
Ruspini, E. H. | Lowrance, J. D. | Strat, T. M.
We address recent criticisms of evidential reasoning, an approach to the analysis of imprecise and uncertain information that is based on the Dempster-Shafer calculus of evidence. We show that evidential reasoning can be interpreted in terms of classical probability theory and that the Dempster-Shafer calculus of evidence may be considered to be a form of generalized probabilistic reasoning based on the representation of probabilistic ignorance by intervals of possible values. In particular, we emphasize that it is not necessary to resort to nonprobabilistic or subjectivist explanations to justify the validity of the approach. We answer conceptual criticisms of evidential reasoning primarily on the basis of the criticism's confusion between the current state of development of the theory (mainly theoretical limitations in the treatment of conditional information) and its potential usefulness in treating a wide variety of uncertainty analysis problems. Similarly, we indicate that the supposed lack of decision-support schemes of generalized probability approaches is not a theoretical handicap but rather an indication of basic informational shortcomings that is a desirable asset of any formal approximate reasoning approach.
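The interval representation of ignorance mentioned above can be illustrated with a minimal sketch of Dempster's rule of combination, together with the belief and plausibility bounds it yields. The two-element frame, the mass values, and the function names are hypothetical and chosen only for illustration; this is not the authors' formulation.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule: multiply masses of intersecting focal sets and
    renormalize by 1 - K, where K is the mass assigned to conflict."""
    raw = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            raw[inter] = raw.get(inter, 0.0) + mass_b * mass_c
        else:
            conflict += mass_b * mass_c
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    return {a: v / (1.0 - conflict) for a, v in raw.items()}

def belief(m, a):
    """Lower bound: total mass of focal sets contained in a."""
    return sum(v for s, v in m.items() if s <= a)

def plausibility(m, a):
    """Upper bound: total mass of focal sets that intersect a."""
    return sum(v for s, v in m.items() if s & a)

# Hypothetical frame of discernment and two bodies of evidence.
rain = frozenset({"rain"})
dry = frozenset({"dry"})
frame = rain | dry  # mass on the whole frame represents ignorance

m1 = {rain: 0.6, frame: 0.4}
m2 = {rain: 0.5, dry: 0.2, frame: 0.3}

combined = combine(m1, m2)
bel_rain = belief(combined, rain)
pl_rain = plausibility(combined, rain)
```

The pair `(bel_rain, pl_rain)` is the interval of possible probability values for "rain"; its width reflects the mass left on the whole frame.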
A Bayesian method for the induction of probabilistic networks from data
This paper presents a Bayesian method for constructing probabilistic networks from databases. In particular, we focus on constructing Bayesian belief networks. Potential applications include computer-assisted hypothesis testing, automated scientific discovery, and automated construction of probabilistic expert systems. We extend the basic method to handle missing data and hidden (latent) variables. We show how to perform probabilistic inference by averaging over the inferences of multiple belief networks.
Asymptotic slowing down of the nearest-neighbor classifier
Snapp, Robert R. | Psaltis, Demetri | Venkatesh, Santosh S.
M^(2/n), for sufficiently large values of M. Here, P∞(error) denotes the probability of error in the infinite sample limit, and is at most twice the error of a Bayes classifier. Although the value of the coefficient a depends upon the underlying probability distributions, the exponent of M is largely distribution free. We thus obtain a concise relation between a classifier's ability to generalize from a finite reference sample and the dimensionality of the feature space, as well as an analytic validation of Bellman's well known "curse of dimensionality."
1 INTRODUCTION
One of the primary tasks assigned to neural networks is pattern classification. Common applications include recognition problems dealing with speech, handwritten characters, DNA sequences, military targets, and (in this conference) sexual identity. Two fundamental concepts associated with pattern classification are generalization (how well does a classifier respond to input data it has never encountered before?) and scalability (how are a classifier's processing and training requirements affected by increasing the number of features that describe the input patterns?).
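For readers who want to experiment with how the error depends on the reference sample size M and the feature dimension n, here is a minimal sketch of a nearest-neighbor classifier with an empirical error estimate. The two-Gaussian toy problem, the function names, and the default test-set size are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def nn_classify(ref_x, ref_y, queries):
    """Label each query with the class of its nearest reference pattern
    (squared Euclidean distance)."""
    d2 = ((queries[:, None, :] - ref_x[None, :, :]) ** 2).sum(axis=2)
    return ref_y[np.argmin(d2, axis=1)]

def empirical_error(M, n, rng, n_test=500):
    """Estimate the error rate with M reference patterns in n dimensions
    on a toy problem: two unit-variance Gaussians whose means differ
    only in the first feature."""
    def sample(k):
        y = rng.integers(0, 2, size=k)
        x = rng.normal(size=(k, n))
        x[:, 0] += 2.0 * y  # shift class 1 along the first feature
        return x, y

    ref_x, ref_y = sample(M)
    test_x, test_y = sample(n_test)
    return float(np.mean(nn_classify(ref_x, ref_y, test_x) != test_y))

rng = np.random.default_rng(0)
# Error estimates for growing reference samples in a 4-dimensional space.
errors = {M: empirical_error(M, n=4, rng=rng) for M in (10, 100, 1000)}
```

Repeating the experiment for several values of n gives a feel for how slowly the finite-sample error approaches its limit as the dimension grows.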
On Stochastic Complexity and Admissible Models for Neural Network Classifiers
For a detailed rationale the reader is referred to the work of Rissanen (1984) or Wallace and Freeman (1987) and the references therein. Note that the Minimum Description Length (MDL) technique (as Rissanen's approach has become known) is implicitly related to Maximum A Posteriori (MAP) Bayesian estimation techniques if cast in the appropriate framework.
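One simple way to see the MDL/MAP connection is a toy two-part code, sketched below under stated assumptions: the model cost is -log2 of a prior over a finite set of candidate parameters, and the data cost is -log2 of the likelihood. Minimizing the total code length then selects the same parameter as maximizing the posterior. The Bernoulli model, the candidate grid, and the prior values are hypothetical and are not drawn from the cited works.

```python
import math

def description_length_bits(theta, prior, heads, tails):
    """Two-part code length in bits: cost of naming the model plus cost
    of encoding the data given that model."""
    model_bits = -math.log2(prior)
    data_bits = -(heads * math.log2(theta) + tails * math.log2(1.0 - theta))
    return model_bits + data_bits

def log_posterior_unnormalized(theta, prior, heads, tails):
    """Log prior plus log likelihood (the evidence term is constant)."""
    return math.log(prior) + heads * math.log(theta) + tails * math.log(1.0 - theta)

# Hypothetical candidate Bernoulli parameters mapped to prior probabilities.
candidates = {0.1: 0.1, 0.3: 0.2, 0.5: 0.4, 0.7: 0.2, 0.9: 0.1}
heads, tails = 7, 3

mdl_choice = min(
    candidates,
    key=lambda t: description_length_bits(t, candidates[t], heads, tails),
)
map_choice = max(
    candidates,
    key=lambda t: log_posterior_unnormalized(t, candidates[t], heads, tails),
)
```

Because the code length is the negative log posterior up to a constant and a change of logarithm base, `mdl_choice` and `map_choice` coincide in this setting.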
Basis-Function Trees as a Generalization of Local Variable Selection Methods for Function Approximation
Function approximation on high-dimensional spaces is often thwarted by a lack of sufficient data to adequately "fill" the space, or lack of sufficient computational resources. The technique of local variable selection provides a partial solution to these problems by attempting to approximate functions locally using fewer than the complete set of input dimensions.