Learning Graphical Models
Adaptive Mixture of Probabilistic Transducers
We introduce and analyze a mixture model for supervised learning of probabilistic transducers. We devise an online learning algorithm that efficiently infers the structure and estimates the parameters of each model in the mixture. Theoretical analysis and comparative simulations indicate that the learning algorithm tracks the best model from an arbitrarily large (possibly infinite) pool of models. We also present an application of the model for inducing a noun phrase recognizer.
Bayesian Methods for Mixtures of Experts
Waterhouse, Steve R., MacKay, David, Robinson, Anthony J.
Tel: [ 44] 1223 332815 ajr@eng.cam.ac.uk ABSTRACT We present a Bayesian framework for inferring the parameters of a mixture of experts model based on ensemble learning by variational freeenergy minimisation. The Bayesian approach avoids the over-fitting and noise level underestimation problems of traditional maximum likelihood inference. We demonstrate these methods on artificial problems and sunspot time series prediction. INTRODUCTION The task of estimating the parameters of adaptive models such as artificial neural networks using Maximum Likelihood (ML) is well documented ego Geman, Bienenstock & Doursat (1992). ML estimates typically lead to models with high variance, a process known as "over-fitting".
The National Science Foundation Workshop on Reinforcement Learning
Mahadevan, Sridhar, Kaelbling, Leslie Pack
Reinforcement learning has become one of the most actively studied learning frameworks in the area of intelligent autonomous agents. This article describes the results of a three-day meeting of leading researchers in this area that was sponsored by the National Science Foundation. Because reinforcement learning is an interdisciplinary topic, the workshop brought together researchers from a variety of fields, including machine learning, neural networks, AI, robotics, and operations research. Thirty leading researchers from the United States, Canada, Europe, and Japan, representing from many different universities, government, and industrial research laboratories participated in the workshop. The goals of the meeting were to (1) understand limitations of current reinforcement-learning systems and define promising directions for further research; (2) clarify the relationships between reinforcement learning and existing work in engineering fields, such as operations research; and (3) identify potential industrial applications of reinforcement learning.
Exploiting Causal Independence in Bayesian Network Inference
A new method is proposed for exploiting causal independencies in exact Bayesian network inference. A Bayesian network can be viewed as representing a factorization of a joint probability into the multiplication of a set of conditional probabilities. We present a notion of causal independence that enables one to further factorize the conditional probabilities into a combination of even smaller factors and consequently obtain a finer-grain factorization of the joint probability. The new formulation of causal independence lets us specify the conditional probability of a variable given its parents in terms of an associative and commutative operator, such as ``or'', ``sum'' or ``max'', on the contribution of each parent. We start with a simple algorithm VE for Bayesian network inference that, given evidence and a query variable, uses the factorization to find the posterior distribution of the query. We show how this algorithm can be extended to exploit causal independence. Empirical studies, based on the CPCS networks for medical diagnosis, show that this method is more efficient than previous methods and allows for inference in larger networks than previous algorithms.
From Data Mining to Knowledge Discovery in Databases
Fayyad, Usama, Piatetsky-Shapiro, Gregory, Smyth, Padhraic
Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular real-world applications, specific data-mining techniques, challenges involved in real-world applications of knowledge discovery, and current and future research directions in the field.
Mean Field Theory for Sigmoid Belief Networks
Saul, L. K., Jaakkola, T., Jordan, M. I.
We develop a mean field theory for sigmoid belief networks based on ideas from statistical mechanics. Our mean field theory provides a tractable approximation to the true probability distribution in these networks; it also yields a lower bound on the likelihood of evidence. We demonstrate the utility of this framework on a benchmark problem in statistical pattern recognition---the classification of handwritten digits.
Logarithmic-Time Updates and Queries in Probabilistic Networks
Delcher, A. L., Grove, A. J., Kasif, S., Pearl, J.
Traditional databases commonly support efficient query and update procedures that operate in time which is sublinear in the size of the database. Our goal in this paper is to take a first step toward dynamic reasoning in probabilistic databases with comparable efficiency. We propose a dynamic data structure that supports efficient algorithms for updating and querying singly connected Bayesian networks. In the conventional algorithm, new evidence is absorbed in O(1) time and queries are processed in time O(N), where N is the size of the network. We propose an algorithm which, after a preprocessing phase, allows us to answer queries in time O(log N) at the expense of O(log N) time per evidence absorption. The usefulness of sub-linear processing time manifests itself in applications requiring (near) real-time response over large probabilistic databases. We briefly discuss a potential application of dynamic probabilistic reasoning in computational biology.
Bayesian Query Construction for Neural Network Models
Paass, Gerhard, Kindermann, Jörg
If data collection is costly, there is much to be gained by actively selecting particularly informative data points in a sequential way. In a Bayesian decision-theoretic framework we develop a query selection criterion which explicitly takes into account the intended use of the model predictions. By Markov Chain Monte Carlo methods the necessary quantities can be approximated to a desired precision. As the number of data points grows, the model complexity is modified by a Bayesian model selection strategy. The properties of two versions of the criterion ate demonstrated in numerical experiments.
Classifying with Gaussian Mixtures and Clusters
Kambhatla, Nanda, Leen, Todd K.
In this paper, we derive classifiers which are winner-take-all (WTA) approximations to a Bayes classifier with Gaussian mixtures for class conditional densities. The derived classifiers include clustering based algorithms like LVQ and k-Means. We propose a constrained rank Gaussian mixtures model and derive a WTA algorithm for it. Our experiments with two speech classification tasks indicate that the constrained rank model and the WTA approximations improve the performance over the unconstrained models. 1 Introduction A classifier assigns vectors from Rn (n dimensional feature space) to one of K classes, partitioning the feature space into a set of K disjoint regions. A Bayesian classifier builds the partition based on a model of the class conditional probability densities of the inputs (the partition is optimal for the given model).
Pairwise Neural Network Classifiers with Probabilistic Outputs
Price, David, Knerr, Stefan, Personnaz, Léon, Dreyfus, Gérard
Multi-class classification problems can be efficiently solved by partitioning the original problem into sub-problems involving only two classes: for each pair of classes, a (potentially small) neural network is trained using only the data of these two classes. We show how to combine the outputs of the two-class neural networks in order to obtain posterior probabilities for the class decisions. The resulting probabilistic pairwise classifier is part of a handwriting recognition system which is currently applied to check reading. We present results on real world data bases and show that, from a practical point of view, these results compare favorably to other neural network approaches.