Goto

Collaborating Authors

 Directed Networks


On the Use of Evidence in Neural Networks

Neural Information Processing Systems

The Bayesian "evidence" approximation has recently been employed to determine the noise and weight-penalty terms used in back-propagation. This paper shows that for neural nets it is far easier to use the exact result than it is to use the evidence approximation. Moreover, unlike the evidence approximation,the exact result neither has to be re-calculated for every new data set, nor requires the running of computer code (the exact result is closed form). In addition, it turns out that the evidence procedure's MAPestimate for neural nets is, in toto, approximation error. Another advantage of the exact analysis is that it does not lead one to incorrect intuition, like the claim that using evidence one can "evaluate different priors in light of the data". This paper also discusses sufficiency conditions for the evidence approximation to hold, why it can sometimes give "reasonable" results, etc.


Probabilistic Horn abduction and Bayesian networks

Classics

This paper presents a simple framework for Horn-clause abduction, with probabilities associated with hypotheses. The framework incorporates assumptions about the rule base and independence assumptions amongst hypotheses. It is shown how any probabilistic knowledge representable in a discrete Bayesian belief network can be represented in this framework. The main contribution is in finding a relationship between logical and probabilistic notions of evidential reasoning. This provides a useful representation language in its own right, providing a compromise between heuristic and epistemic adequacy. It also shows how Bayesian networks can be extended beyond a propositional language.


Approximating probabilistic inference in Bayesian belief networks is NP-hard

Classics

It is known that exact computation of conditional probabilities in belief networks is NP-hard. Many investigators in the AI community have tacitly assumed that algorithms for performing approximate inference with belief networks are of polynomial complexity. Indeed, special cases of approximate inference can be performed in time polynomial in the input size. However, we have discovered that the general problem of approximating conditional probabilities with belief networks, like exact inference, resides in the NP-hard complexity class. We develop a complexity analysis to elucidate the difficulty of approximate probabilistic inference.



Best-First Model Merging for Dynamic Learning and Recognition

Neural Information Processing Systems

"Best-first model merging" is a general technique for dynamically choosing the structure of a neural or related architecture while avoiding overfitting. It is applicable to both leaming and recognition tasks and often generalizes significantly better than fixed structures. We demonstrate the approach applied to the tasks of choosing radial basis functions for function learning, choosing local affine models for curve and constraint surface modelling, and choosing the structure of a balltree or bumptree to maximize efficiency of access.


Best-First Model Merging for Dynamic Learning and Recognition

Neural Information Processing Systems

"Best-first model merging" is a general technique for dynamically choosing the structure of a neural or related architecture while avoiding overfitting. It is applicable to both leaming and recognition tasks and often generalizes significantly better than fixed structures. We demonstrate the approach applied to the tasks of choosing radial basis functions for function learning, choosing local affine models for curve and constraint surface modelling, and choosing the structure of a balltree or bumptree to maximize efficiency of access.



Bayesian Model Comparison and Backprop Nets

Neural Information Processing Systems

The Bayesian model comparison framework is reviewed, and the Bayesian Occam's razor is explained. This framework can be applied to feedforward networks, making possible (1) objective comparisons between solutions using alternative network architectures; (2) objective choice of magnitude and type of weight decay terms; (3) quantified estimates of the error bars on network parameters and on network output. The framework also generates a measure of the effective number of parameters determined by the data. The relationship of Bayesian model comparison to recent work on prediction of generalisation ability (Guyon et al., 1992, Moody, 1992) is discussed.


Neural Control for Rolling Mills: Incorporating Domain Theories to Overcome Data Deficiency

Neural Information Processing Systems

In a Bayesian framework, we give a principled account of how domainspecific prior knowledge such as imperfect analytic domain theories can be optimally incorporated into networks of locally-tuned units: by choosing a specific architecture and by applying a specific training regimen. Our method proved successful in overcoming the data deficiency problem in a large-scale application to devise a neural control for a hot line rolling mill. It achieves in this application significantly higher accuracy than optimally-tuned standard algorithms such as sigmoidal backpropagation, and outperforms the state-of-the-art solution.