A Variational Principle for Model-based Morphing
Saul, Lawrence K., Jordan, Michael I.
Given a multidimensional data set and a model of its density, we consider how to define the optimal interpolation between two points. This is done by assigning a cost to each path through space, based on two competing goals: one to interpolate through regions of high density, the other to minimize arc length. From this path functional, we derive the Euler-Lagrange equations for extremal motion; given two points, the desired interpolation is found by solving a boundary value problem. We show that this interpolation can be done efficiently, in high dimensions, for Gaussian, Dirichlet, and mixture models. 1 Introduction The problem of nonlinear interpolation arises frequently in image, speech, and signal processing. Consider the following two examples: (i) given two profiles of the same face, connect them by a smooth animation of intermediate poses [1]; (ii) given a telephone signal masked by intermittent noise, fill in the missing speech.
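The optimization described above can be made concrete. The following is a minimal Python sketch that minimizes a discretized version of such a path functional for a single Gaussian density model; the specific cost form ||dx|| * p(x)^(-lam), the trade-off parameter lam, and the use of direct numerical minimization (rather than the paper's boundary value problem) are assumptions made here for illustration.

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import multivariate_normal

    # Toy density model: a single 2-D Gaussian.  (The paper also treats
    # Dirichlet and mixture models; only the log-density would change.)
    model = multivariate_normal(mean=np.zeros(2), cov=np.eye(2))

    def path_cost(interior, x0, x1, lam=1.0):
        """Discretized path functional: each segment pays its Euclidean
        length, inflated where the model density is low (lam sets the
        trade-off between the two competing goals)."""
        pts = np.vstack([x0, interior.reshape(-1, 2), x1])
        seg = np.diff(pts, axis=0)                   # segment vectors
        mid = 0.5 * (pts[:-1] + pts[1:])             # segment midpoints
        # assumed cost per segment: ||dx|| * p(x)^(-lam)
        return np.sum(np.linalg.norm(seg, axis=1) *
                      np.exp(-lam * model.logpdf(mid)))

    x0, x1 = np.array([-2.0, 0.5]), np.array([2.0, -0.5])
    K = 20                                           # interior points
    init = np.linspace(x0, x1, K + 2)[1:-1].ravel()  # straight-line start
    res = minimize(path_cost, init, args=(x0, x1))
    path = np.vstack([x0, res.x.reshape(-1, 2), x1])

With these endpoints the optimized path bows toward the Gaussian mean, trading extra arc length for higher density, which is exactly the behavior the two competing terms encode.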
Fast Learning by Bounding Likelihoods in Sigmoid Type Belief Networks
Jaakkola, Tommi, Saul, Lawrence K., Jordan, Michael I.
Sigmoid type belief networks, a class of probabilistic neural networks, provide a natural framework for compactly representing probabilistic information in a variety of unsupervised and supervised learning problems. Often the parameters used in these networks need to be learned from examples. Unfortunately, estimating the parameters via exact probabilistic calculations (i.e., the EM algorithm) is intractable even for networks with fairly small numbers of hidden units. We propose to avoid the infeasibility of the E step by bounding likelihoods instead of computing them exactly. We introduce extended and complementary representations for these networks and show that the estimation of the network parameters can be made fast (reduced to quadratic optimization) by performing the estimation in either of the alternative domains. The complementary networks can be used for continuous density estimation as well. 1 Introduction The appeal of probabilistic networks for knowledge representation, inference, and learning (Pearl, 1988) derives both from the sound Bayesian framework and from the explicit representation of dependencies among the network variables, which allows ready incorporation of prior information into the design of the network.
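The bounding idea can be seen in miniature. The sketch below builds a toy two-hidden-unit sigmoid belief network (all weights are arbitrary values chosen here, not taken from the paper) and compares the exact log-likelihood, computed by brute-force enumeration over hidden configurations, with the standard Jensen lower bound obtained from a factorized distribution over the hidden units. The paper's contribution is to make such bounds computable without enumeration, via the extended and complementary representations; this toy does not reproduce that machinery.

    import numpy as np
    from itertools import product

    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    # Toy sigmoid belief network: two hidden units feeding two visible units.
    b = np.array([0.3, -0.5])          # hidden-unit biases
    W = np.array([[1.0, -2.0],
                  [0.5,  1.5]])        # visible-from-hidden weights
    c = np.array([-0.2, 0.4])          # visible-unit biases
    v = np.array([1, 0])               # observed visible vector

    def log_joint(h):
        """ln P(h, v) for one hidden configuration h."""
        ph = sigmoid(b)                # P(h_j = 1)
        pv = sigmoid(W @ h + c)        # P(v_i = 1 | h)
        return (np.sum(h * np.log(ph) + (1 - h) * np.log(1 - ph)) +
                np.sum(v * np.log(pv) + (1 - v) * np.log(1 - pv)))

    hs = [np.array(h) for h in product([0, 1], repeat=2)]

    # Exact log-likelihood ln P(v): tractable only because the net is tiny.
    logL = np.log(sum(np.exp(log_joint(h)) for h in hs))

    def bound(mu):
        """Jensen lower bound with factorized Q(h) = prod_j Bernoulli(mu_j)."""
        total = 0.0
        for h in hs:
            q = np.prod(mu**h * (1 - mu)**(1 - h))
            total += q * (log_joint(h) - np.log(q))
        return total

    mu = np.array([0.6, 0.3])          # any setting gives a valid bound
    print(logL, bound(mu))             # bound(mu) <= logL always

Maximizing the bound over mu tightens it; substituting this maximization for the exact E step is the likelihood-bounding strategy the abstract describes.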
Exploiting Tractable Substructures in Intractable Networks
Saul, Lawrence K., Jordan, Michael I.
We develop a refined mean field approximation for inference and learning in probabilistic neural networks. Our mean field theory, unlike most, does not assume that the units behave as independent degrees of freedom; instead, it exploits in a principled way the existence of large substructures that are computationally tractable. To illustrate the advantages of this framework, we show how to incorporate weak higher order interactions into a first-order hidden Markov model, treating the corrections (but not the first order structure) within mean field theory. 1 INTRODUCTION Learning the parameters in a probabilistic neural network may be viewed as a problem in statistical estimation.
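The bound behind this refined approximation can be sketched as follows (the notation here is generic, not necessarily the paper's). For hidden variables H, visible variables V, and any tractable trial distribution Q, Jensen's inequality gives a lower bound on the log-likelihood:

    \ln P(V) \;=\; \ln \sum_H P(H, V)
             \;\ge\; \sum_H Q(H) \ln \frac{P(H, V)}{Q(H)}
             \;=\; \langle \ln P(H, V) \rangle_Q + \mathcal{H}(Q)

Naive mean field takes Q fully factorized over individual units; the structured variant described above instead lets Q factor over larger tractable substructures (for example, entire chains), so that the average energy and the entropy remain computable while correlations within each substructure are preserved.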
Boltzmann Chains and Hidden Markov Models
Saul, Lawrence K., Jordan, Michael I.
Statistical models of discrete time series have a wide range of applications, most notably to problems in speech recognition (Juang & Rabiner, 1991) and molecular biology (Baldi, Chauvin, Hunkapiller, & McClure, 1992). A common problem in these fields is to find a probabilistic model, and a set of model parameters, that account for sequences of observed data. Hidden Markov models (HMMs) have been particularly successful at modeling discrete time series. One reason for this is the powerful learning rule (Baum, 1972), a special case of the Expectation-Maximization (EM) procedure for maximum likelihood estimation (Dempster, Laird, & Rubin, 1977).
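The correspondence named in the title can be checked mechanically: choosing the Boltzmann chain's energies to be negative log probabilities makes exp(-energy) reproduce the HMM's path probabilities, and the partition function of the chain (with the observations clamped) then equals the HMM likelihood. The Python sketch below verifies this on a toy HMM; all parameter values are arbitrary choices made here for illustration.

    import numpy as np
    from itertools import product

    # Toy HMM: 2 hidden states, 2 output symbols.
    pi = np.array([0.6, 0.4])                  # initial state probabilities
    A  = np.array([[0.7, 0.3], [0.2, 0.8]])    # transition probabilities
    B  = np.array([[0.9, 0.1], [0.3, 0.7]])    # emission probabilities
    obs = [0, 1, 1]                            # observed symbol sequence

    # Boltzmann-chain weights: energies are negative log probabilities,
    # so exp(-E) recovers each path's probability exactly.
    J  = np.log(A)     # couplings between successive hidden states
    h0 = np.log(pi)    # bias on the first hidden state
    Em = np.log(B)     # couplings between hidden state and clamped symbol

    def hmm_path_prob(states):
        p = pi[states[0]] * B[states[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[states[t-1], states[t]] * B[states[t], obs[t]]
        return p

    def chain_neg_energy(states):
        s = h0[states[0]] + Em[states[0], obs[0]]
        for t in range(1, len(obs)):
            s += J[states[t-1], states[t]] + Em[states[t], obs[t]]
        return s

    for states in product(range(2), repeat=len(obs)):
        assert np.isclose(hmm_path_prob(states),
                          np.exp(chain_neg_energy(states)))

Summing exp(chain_neg_energy) over all hidden paths gives P(obs), the likelihood that Baum's learning rule (and EM generally) maximizes.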