AITopics

We introduce a novel framework for simultaneous structure and parameter learning in hidden-variable conditional probability models, based on an en tropic prior and a solution for its maximum a posteriori (MAP) estimator. The MAP estimate minimizes uncertainty in all respects: cross-entropy between model and data; entropy of the model; entropy of the data's descriptive statistics. Iterative estimation extinguishes weakly supported parameters, compressing and sparsifying the model. Trimming operators accelerate this process by removing excess parameters and, unlike most pruning schemes, guarantee an increase in posterior probability. Entropic estimation takes a overcomplete random model and simplifies it, inducing the structure of relations between hidden and observed variables. Applied to hidden Markov models (HMMs), it finds a concise finite-state machine representing the hidden structure of a signal. We entropically model music, handwriting, and video time-series, and show that the resulting models are highly concise, structured, predictive, and interpretable: Surviving states tend to be highly correlated with meaningful partitions of the data, while surviving transitions provide a low-perplexity model of the signal dynamics.

entropy, estimator, probability, (15 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Bristol (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.90)

Probabilistic Visualisation of High-Dimensional Binary Data

Tipping, Michael E.

We present a probabilistic latent-variable framework for data visualisation, a key feature of which is its applicability to binary and categorical data types for which few established methods exist. A variational approximation to the likelihood is exploited to derive a fast algorithm for determining the model parameters. Illustrations of application to real and synthetic binary data sets are given.

approximation, variational approximation, visualisation, (15 more...)

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Hyvärinen, Aapo, Hoyer, Patrik O., Oja, Erkki

Sparse Code Shrinkage: Denoising by Nonlinear Maximum Likelihood Estimation

One of the simplest methods is to use linear transformations of the observed data.

noise, shrinkage function, sparse, (11 more...)

Country: Europe > Finland > Uusimaa > Helsinki (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.90)

Hofmann, Thomas, Puzicha, Jan, Jordan, Michael I.

Learning from Dyadic Data

Dyadzc data refers to a domain with two finite sets of objects in which observations are made for dyads, i.e., pairs with one element from either set. This type of data arises naturally in many application ranging from computational linguistics and information retrieval to preference analysis and computer vision. In this paper, we present a systematic, domain-independent framework of learning from dyadic data by statistical mixture models. Our approach covers different models with fiat and hierarchical latent class structures. We propose an annealed version of the standard EM algorithm for model fitting which is empirically evaluated on a variety of data sets from different domains. 1 Introduction Over the past decade learning from data has become a highly active field of research distributed over many disciplines like pattern recognition, neural computation, statistics, machine learning, and data mining.

aspect model, hofmann, learning, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Middle East > Jordan (0.06)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Ghahramani, Zoubin, Roweis, Sam T.

Learning Nonlinear Dynamical Systems Using an EM Algorithm

The Expectation-Maximization (EM) algorithm is an iterative procedure for maximum likelihood parameter estimation from data sets with missing or hidden variables [2]. It has been applied to system identification in linear stochastic state-space models, where the state variables are hidden from the observer and both the state and the parameters of the model have to be estimated simultaneously [9]. We present a generalization of the EM algorithm for parameter estimation in nonlinear dynamical systems. The "expectation" step makes use of Extended Kalman Smoothing to estimate the state, while the "maximization" step re-estimates the parameters using these uncertain state estimates. In general, the nonlinear maximization step is difficult because it requires integrating out the uncertainty in the states.

algorithm, dynamical system, nonlinearity, (11 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > Ontario (0.04)
Europe > United Kingdom (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)

Boyen, Xavier, Koller, Daphne

Approximate Learning of Dynamic Models

Inference is a key component in learning probabilistic models from partially observable data. When learning temporal models, each of the many inference phases requires a traversal over an entire long data sequence; furthermore, the data structures manipulated are exponentially large, making this process computationally expensive. In [2], we describe an approximate inference algorithm for monitoring stochastic processes, and prove bounds on its approximation error. In this paper, we apply this algorithm as an approximate forward propagation step in an EM algorithm for learning temporal Bayesian networks. We provide a related approximation for the backward step, and prove error bounds for the combined algorithm.

algorithm, approximation, stochastic process, (15 more...)

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)

Bayesian PCA

Bishop, Christopher M.

The technique of principal component analysis (PCA) has recently been expressed as the maximum likelihood solution for a generative latent variable model. In this paper we use this probabilistic reformulation as the basis for a Bayesian treatment of PCA. Our key result is that effective dimensionality of the latent space (equivalent to the number of retained principal components) can be determined automatically as part of the Bayesian inference procedure. An important application of this framework is to mixtures of probabilistic PCA models, in which each component can determine its own effective complexity.

dimensionality, effective dimensionality, pca, (14 more...)

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Learning a Hierarchical Belief Network of Independent Factor Analyzers

Attias, Hagai

The model parameters are learned in an unsupervised manner by maximizing the likelihood that these data are generated by the model. A multilayer belief network is a realization of such a model. Many belief networks have been proposed that are composed of binary units. The hidden units in such networks represent latent variables that explain different features of the data, and whose relation to the ·Current address: Gatsby Computational Neuroscience Unit, University College London, 17 Queen Square, London WC1N 3AR, U.K. 362 H. Attias data is highly nonlinear. However, for tasks such as object and speech recognition which produce real-valued data, the models provided by binary networks are often inadequate.

algorithm, approximation, latent variable, (16 more...)

Country:

Europe > United Kingdom (0.24)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan (0.05)
North America > United States > California > San Mateo County > San Mateo (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)

Opper, Manfred, Winther, Ole

Mean Field Methods for Classification with Gaussian Processes

We discuss the application of TAP mean field methods known from the Statistical Mechanics of disordered systems to Bayesian classification models with Gaussian processes. In contrast to previous approaches, no knowledge about the distribution of inputs is needed. Simulation results for the Sonar data set are given.

classification, gaussian process, mean field method, (13 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.05)
Europe > Denmark > Capital Region > Copenhagen (0.05)
(2 more...)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

Kearns, Michael J., Saul, Lawrence K.

Inference in Multilayer Networks via Large Deviation Bounds

Arguably one of the most important types of information processing is the capacity for probabilistic reasoning. The properties of undirectedproDabilistic models represented as symmetric networks have been studied extensively using methods from statistical mechanics (Hertz et aI, 1991). Detailed analyses of these models are possible by exploiting averaging phenomena that occur in the thermodynamic limit of large networks. In this paper, we analyze the limit of large, multilayer networks for probabilistic models represented as directed acyclic graphs. These models are known as Bayesian networks (Pearl, 1988; Neal, 1992), and they have different probabilistic semantics than symmetric neural networks (such as Hopfield models or Boltzmann machines). We show that the intractability of exact inference in multilayer Bayesian networks Inference in Multilayer Networks via Large Deviation Bounds 261 does not preclude their effective use. Our work builds on earlier studies of variational methods (Jordan et aI, 1997).

marginal probability, node, probability, (14 more...)

Country:

Asia > Middle East > Jordan (0.25)
North America > United States > California > San Mateo County > San Mateo (0.04)
North America > United States > California > San Mateo County > Redwood City (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)