Bayesian Learning
Asymptotic slowing down of the nearest-neighbor classifier
Snapp, Robert R., Psaltis, Demetri, Venkatesh, Santosh S.
Santosh S. Venkatesh, Electrical Engineering, University of Pennsylvania, Philadelphia, PA 19104
If patterns are drawn from an n-dimensional feature space according to a probability distribution that obeys a weak smoothness criterion, we show that the probability that a random input pattern is misclassified by a nearest-neighbor classifier using M random reference patterns asymptotically satisfies $P_M(\mathrm{error}) \sim P_\infty(\mathrm{error}) + a / M^{2/n}$ for sufficiently large values of M. Here, $P_\infty(\mathrm{error})$ denotes the probability of error in the infinite sample limit, and is at most twice the error of a Bayes classifier. Although the value of the coefficient a depends upon the underlying probability distributions, the exponent of M is largely distribution free. We thus obtain a concise relation between a classifier's ability to generalize from a finite reference sample and the dimensionality of the feature space, as well as an analytic validation of Bellman's well known "curse of dimensionality."
1 INTRODUCTION
One of the primary tasks assigned to neural networks is pattern classification.
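To make the scaling in the abstract above concrete, the following minimal sketch evaluates the excess-error term a / M^(2/n) and inverts it to estimate how many reference patterns are needed for a target excess error. The function names and the numerical values of a and eps are illustrative assumptions, not quantities taken from the paper, and the formula is only asymptotic, so small M is outside its range of validity.

```python
import math


def excess_error(a: float, M: int, n: int) -> float:
    """Asymptotic excess error a / M**(2/n) above the infinite-sample limit.

    a is the distribution-dependent coefficient (assumed known here),
    M the number of reference patterns, n the feature-space dimension.
    """
    return a / M ** (2.0 / n)


def samples_needed(a: float, eps: float, n: int) -> int:
    """Smallest M with a / M**(2/n) <= eps, i.e. M >= (a / eps)**(n / 2)."""
    return math.ceil((a / eps) ** (n / 2.0))


if __name__ == "__main__":
    # Hypothetical coefficient and target; only the growth with n matters.
    a, eps = 1.0, 0.25
    for n in (2, 4, 8):
        print(n, samples_needed(a, eps, n))
```

For a fixed target, the required M grows exponentially in n, which is the sense in which the result quantifies the curse of dimensionality.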
Convergence of a Neural Network Classifier
Baras, John S., LaVigna, Anthony
In this paper, we prove that the vectors in the LVQ learning algorithm converge. We do this by showing that the learning algorithm performs stochastic approximation. Convergence is then obtained by identifying the appropriate conditions on the learning rate and on the underlying statistics of the classification problem. We also present a modification to the learning algorithm which we argue results in convergence of the LVQ error to the Bayesian optimal error as the appropriate parameters become large.
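As a rough illustration of the kind of update the abstract refers to, here is a minimal sketch of the commonly described LVQ1 rule: the prototype nearest to the input moves toward it when their class labels agree and away from it otherwise. This is an assumed textbook form, not necessarily the exact variant or the modification analyzed by the authors. The helper names and the step-size schedule alpha0 / (t + 1) are hypothetical; the schedule is chosen only because its sum diverges while the sum of its squares converges, the usual stochastic-approximation conditions.

```python
def nearest(prototypes, x):
    """Index of the prototype closest to x in squared Euclidean distance."""
    return min(
        range(len(prototypes)),
        key=lambda i: sum((p - xi) ** 2 for p, xi in zip(prototypes[i], x)),
    )


def lvq1_step(prototypes, labels, x, y, alpha):
    """Apply one LVQ1 update in place and return the index that moved.

    The winning prototype is attracted to x if its label equals y,
    and repelled from x otherwise.
    """
    i = nearest(prototypes, x)
    sign = 1.0 if labels[i] == y else -1.0
    prototypes[i] = [p + sign * alpha * (xi - p) for p, xi in zip(prototypes[i], x)]
    return i


def learning_rate(t, alpha0=0.5):
    """Hypothetical schedule alpha_t = alpha0 / (t + 1)."""
    return alpha0 / (t + 1)


if __name__ == "__main__":
    protos = [[0.0, 0.0], [1.0, 1.0]]
    labels = [0, 1]
    samples = [([0.2, 0.0], 0), ([2.0, 1.0], 0)]
    for t, (x, y) in enumerate(samples):
        lvq1_step(protos, labels, x, y, learning_rate(t))
    print(protos)
```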
On Stochastic Complexity and Admissible Models for Neural Network Classifiers
Padhraic Smyth, Communications Systems Research, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109
Given some training data how should we choose a particular network classifier from a family of networks of different complexities? In this paper we discuss how the application of stochastic complexity theory to classifier design problems can provide some insights into this problem. In particular we introduce the notion of admissible models whereby the complexity of models under consideration is affected by (among other factors) the class entropy, the amount of training data, and our prior belief. In particular we discuss the implications of these results with respect to neural architectures and demonstrate the approach on real data from a medical diagnosis task.
1 Introduction and Motivation
In this paper we examine in a general sense the application of Minimum Description Length (MDL) techniques to the problem of selecting a good classifier from a large set of candidate models or hypotheses. Pattern recognition algorithms differ from more conventional statistical modeling techniques in the sense that they typically choose from a very large number of candidate models to describe the available data.
Principles of Diagnosis: Current Trends and a Report on the First International Workshop
Automated diagnosis is an important AI problem not only for its potential practical applications but also because it exposes issues common to all automated reasoning efforts and presents real challenges to existing paradigms. Current research in this area addresses many problems, including managing and structuring probabilistic information, modeling physical systems, reasoning with defeasible assumptions, and interleaving deliberation and action. Furthermore, diagnosis programs must face these problems in contexts where scaling up to deal with cases of realistic size results in daunting combinatorics. This article presents these and other issues as discussed at the First International Workshop on Principles of Diagnosis.
Decision Analysis and Expert Systems
Henrion, Max, Breese, John S., Horvitz, Eric J.
Decision analysis and expert systems are technologies intended to support human reasoning and decision making by formalizing expert knowledge so that it is amenable to mechanized reasoning methods. Despite some common goals, these two paradigms have evolved divergently, with fundamental differences in principle and practice. Recent recognition of the deficiencies of traditional AI techniques for treating uncertainty, coupled with the development of belief nets and influence diagrams, is stimulating renewed enthusiasm among AI researchers in probabilistic reasoning and decision analysis. We present the key ideas of decision analysis and review recent research and applications that aim toward a marriage of these two paradigms. This work combines decision-analytic methods for structuring and encoding uncertain knowledge and preferences with computational techniques from AI for knowledge representation, inference, and explanation. We end by outlining remaining research issues to fully develop the potential of this enterprise.
Bayesian Networks without Tears
I give an introduction to Bayesian networks for AI researchers with a limited grounding in probability theory. Over the last few years, this method of reasoning using probabilities has become popular within the AI probability and uncertainty community. Indeed, it is probably fair to say that Bayesian networks are to a large segment of the AI-uncertainty community what resolution theorem proving is to the AI-logic community. Nevertheless, despite what seems to be their obvious importance, the ideas and techniques have not spread much beyond the research community responsible for them. This is probably because the ideas and techniques are not that easy to understand. I hope to rectify this situation by making Bayesian networks more accessible to the probabilistically unsophisticated.
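For readers new to the idea, a minimal sketch of reasoning with a Bayesian network may help. The three-node network below (Rain and Sprinkler as independent causes of WetGrass) and all of its probabilities are hypothetical and are not taken from the article. The code computes a posterior by brute-force enumeration of the joint distribution, which is feasible only for very small networks.

```python
from itertools import product

# Hypothetical network: Rain -> WetGrass <- Sprinkler.
P_RAIN = 0.2
P_SPRINKLER = 0.1
# P(WetGrass = True | Rain, Sprinkler), keyed by (rain, sprinkler).
P_WET = {
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}


def joint(rain: bool, sprinkler: bool, wet: bool) -> float:
    """Joint probability as the product of each node's conditional table."""
    p = P_RAIN if rain else 1.0 - P_RAIN
    p *= P_SPRINKLER if sprinkler else 1.0 - P_SPRINKLER
    p_wet = P_WET[(rain, sprinkler)]
    return p * (p_wet if wet else 1.0 - p_wet)


def posterior_rain_given_wet() -> float:
    """P(Rain = True | WetGrass = True), summing out Sprinkler."""
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return num / den


if __name__ == "__main__":
    print(posterior_rain_given_wet())
```

Observing wet grass raises the probability of rain above its prior of 0.2, which is the kind of evidence-driven belief update the networks are designed to support.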
Maximum Likelihood Competitive Learning
One popular class of unsupervised algorithms is competitive algorithms. In the traditional view of competition, only one competitor, the winner, adapts for any given case. I propose to view competitive adaptation as attempting to fit a blend of simple probability generators (such as Gaussians) to a set of data-points. The maximum likelihood fit of a model of this type suggests a "softer" form of competition, in which all competitors adapt in proportion to the relative probability that the input came from each competitor. I investigate one application of the soft competitive model, placement of radial basis function centers for function interpolation, and show that the soft model can give better performance with little additional computational cost.
1 INTRODUCTION
Interest in unsupervised learning has increased recently due to the application of more sophisticated mathematical tools (Linsker, 1988; Plumbley and Fallside, 1988; Sanger, 1989) and the success of several elegant simulations of large scale self-organization (Linsker, 1986; Kohonen, 1982). One popular class of unsupervised algorithms is competitive algorithms, which have appeared as components in a variety of systems (Von der Malsburg, 1973; Fukushima, 1975; Grossberg, 1978). Generalizing the definition of Rumelhart and Zipser (1986), a competitive adaptive system consists of a collection of modules which are structurally identical except, possibly, for random initial parameter variation.
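The soft competition described in the abstract can be sketched for one-dimensional data as follows. This is an illustrative batch update under simplifying assumptions that are mine rather than the paper's: the competitors are Gaussians with equal priors and a shared, fixed width sigma, and only their means adapt. Each competitor's share of a data point is its relative probability of having generated that point, so every competitor moves, not only the winner.

```python
import math


def responsibilities(x, means, sigma=1.0):
    """Relative probability that x came from each Gaussian competitor.

    Assumes equal priors and a shared width sigma. The largest log-score
    is subtracted before exponentiating to avoid underflow.
    """
    scores = [-((x - m) ** 2) / (2.0 * sigma ** 2) for m in means]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def soft_update(data, means, sigma=1.0):
    """One batch step: each mean becomes the responsibility-weighted average."""
    k = len(means)
    num = [0.0] * k
    den = [0.0] * k
    for x in data:
        r = responsibilities(x, means, sigma)
        for j in range(k):
            num[j] += r[j] * x
            den[j] += r[j]
    # Keep the old mean if a competitor received no weight at all.
    return [num[j] / den[j] if den[j] > 0.0 else means[j] for j in range(k)]


if __name__ == "__main__":
    data = [-2.0, -1.5, 1.5, 2.0]
    means = [-1.0, 1.0]
    for _ in range(5):
        means = soft_update(data, means)
    print(means)
```

Replacing the responsibilities with a one-hot vector for the nearest mean recovers the traditional winner-take-all update, which makes the two schemes easy to compare on the same data.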
Bayesian Inference of Regular Grammar and Markov Source Models
Smith, Kurt R., Miller, Michael I.
In this paper we develop a Bayes criterion, which includes the Rissanen complexity, for inferring regular grammar models. We develop two methods for regular grammar Bayesian inference. The first method is based on treating the regular grammar as a 1-dimensional Markov source, and the second is based on the combinatoric characteristics of the regular grammar itself. We apply the resulting Bayes criteria to a particular example in order to show the efficiency of each method.