AITopics

The difficulties lie in the Monte-Carlo E-step which consists of sampling from the posterior distribution of the hidden variables given the observations. The new idea presented in this paper is to generate samples from a Gaussian approximation to the true posterior from which it is easy to obtain independent samples. The parameters of the Gaussian approximation are either derived from the extended Kalman filter or the Fisher scoring algorithm. In case the posterior density is multimodal we propose to approximate the posterior by a sum of Gaussians (mixture of modes approach). We show that sampling from the approximate posterior densities obtained by the above algorithms leads to better models than using point estimates for the hidden states. In our experiment, the Fisher scoring algorithm obtained a better approximation of the posterior mode than the EKF. For a multimodal distribution, the mixture of modes approach gave superior results. 1 INTRODUCTION Nonlinear state space models (NSSM) are a general framework for representing nonlinear time series. In particular, any NARMAX model (nonlinear auto-regressive moving average model with external inputs) can be translated into an equivalent NSSM.

approximation, fisher, gaussian, (16 more...)

Country:

North America > United States > New York (0.04)
North America > United States > New Jersey (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.64)

Boyen, Xavier, Koller, Daphne

Approximate Learning of Dynamic Models

Inference is a key component in learning probabilistic models from partially observable data. When learning temporal models, each of the many inference phases requires a traversal over an entire long data sequence; furthermore, the data structures manipulated are exponentially large, making this process computationally expensive. In [2], we describe an approximate inference algorithm for monitoring stochastic processes, and prove bounds on its approximation error. In this paper, we apply this algorithm as an approximate forward propagation step in an EM algorithm for learning temporal Bayesian networks. We provide a related approximation for the backward step, and prove error bounds for the combined algorithm.

algorithm, approximation, stochastic process, (15 more...)

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Jordan (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)

Blake, Andrew, North, Ben, Isard, Michael

Learning Multi-Class Dynamics

Yule-Walker) are available for learning Auto-Regressive process models of simple, directly observable, dynamical processes. When sensor noise means that dynamics are observed only approximately, learning can still been achieved via Expectation-Maximisation (EM) together with Kalman Filtering. However, this does not handle more complex dynamics, involving multiple classes of motion.

algorithm, dynamical model, dynamical parameter, (16 more...)

Country:

North America > United States > Mississippi > Lafayette County > Oxford (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Bayesian PCA

Bishop, Christopher M.

The technique of principal component analysis (PCA) has recently been expressed as the maximum likelihood solution for a generative latent variable model. In this paper we use this probabilistic reformulation as the basis for a Bayesian treatment of PCA. Our key result is that effective dimensionality of the latent space (equivalent to the number of retained principal components) can be determined automatically as part of the Bayesian inference procedure. An important application of this framework is to mixtures of probabilistic PCA models, in which each component can determine its own effective complexity.

dimensionality, effective dimensionality, pca, (14 more...)

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Birattari, Mauro, Bontempi, Gianluca, Bersini, Hugues

Lazy Learning Meets the Recursive Least Squares Algorithm

Lazy learning is a memory-based technique that, once a query is received, extracts a prediction interpolating locally the neighboring examples of the query which are considered relevant according to a distance measure. In this paper we propose a data-driven method to select on a query-by-query basis the optimal number of neighbors to be considered for each prediction. As an efficient way to identify and validate local models, the recursive least squares algorithm is introduced in the context of local approximation and lazy learning. Furthermore, beside the winner-takes-all strategy for model selection, a local combination of the most promising models is explored. The method proposed is tested on six different datasets and compared with a state-of-the-art approach.

algorithm, prediction, selection, (12 more...)

Country:

Europe > Belgium (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Bennett, Kristin P., Demiriz, Ayhan

Semi-Supervised Support Vector Machines

We introduce a semi-supervised support vector machine (S3yM) method. Given a training set of labeled data and a working set of unlabeled data, S3YM constructs a support vector machine using both the training and working sets. We use S3 YM to solve the transduction problem using overall risk minimization (ORM) posed by Yapnik. The transduction problem is to estimate the value of a classification function at the given points in the working set. This contrasts with the standard inductive learning problem of estimating the classification function at all possible values and then using the fixed function to deduce the classes of the working set data.

generalization, support vector machine, vector machine, (14 more...)

Country:

North America > United States > California > Orange County > Irvine (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.05)
North America > United States > New York > Rensselaer County > Troy (0.05)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.69)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Learning a Hierarchical Belief Network of Independent Factor Analyzers

Attias, Hagai

The model parameters are learned in an unsupervised manner by maximizing the likelihood that these data are generated by the model. A multilayer belief network is a realization of such a model. Many belief networks have been proposed that are composed of binary units. The hidden units in such networks represent latent variables that explain different features of the data, and whose relation to the ·Current address: Gatsby Computational Neuroscience Unit, University College London, 17 Queen Square, London WC1N 3AR, U.K. 362 H. Attias data is highly nonlinear. However, for tasks such as object and speech recognition which produce real-valued data, the models provided by binary networks are often inadequate.

algorithm, approximation, latent variable, (16 more...)

Country:

Europe > United Kingdom (0.24)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Middle East > Jordan (0.05)
North America > United States > California > San Mateo County > San Mateo (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)

A Theory of Mean Field Approximation

Tanaka, Toshiyuki

I present a theory of mean field approximation based on information geometry. This theory includes in a consistent way the naive mean field approximation, as well as the TAP approach and the linear response theorem in statistical physics, giving clear information-theoretic interpretations to them. 1 INTRODUCTION Many problems of neural networks, such as learning and pattern recognition, can be cast into a framework of statistical estimation problem. How difficult it is to solve a particular problem depends on a statistical model one employs in solving the problem. For Boltzmann machines[ 1] for example, it is computationally very hard to evaluate expectations of state variables from the model parameters. Mean field approximation[2], which is originated in statistical physics, has been frequently used in practical situations in order to circumvent this difficulty.

approximation, field approximation, mean field approximation, (13 more...)

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.05)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.54)

Learning Curves for Gaussian Processes

Sollich, Peter

I consider the problem of calculating learning curves (i.e., average generalization performance) of Gaussian processes used for regression. A simple expression for the generalization error in terms of the eigenvalue decomposition of the covariance function is derived, and used as the starting point for several approximation schemes. I identify where these become exact, and compare with existing bounds on learning curves; the new approximations, which can be used for any input space dimension, generally get substantially closer to the truth. 1 INTRODUCTION: GAUSSIAN PROCESSES Within the neural networks community, there has in the last few years been a good deal of excitement about the use of Gaussian processes as an alternative to feedforward networks [lJ. The advantages of Gaussian processes are that prior assumptions about the problem to be learned are encoded in a very transparent way, and that inference-at least in the case of regression that I will consider-is relatively straightforward. One crucial question for applications is then how'fast' Gaussian processes learn, i.e., how many training examples are needed to achieve a certain level of generalization performance.

approximation, covariance function, gaussian process, (14 more...)

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.56)

Skantzos, N. S., Beckmann, C. F., Coolen, Anthony C. C.

Discontinuous Recall Transitions Induced by Competition Between Short- and Long-Range Interactions in Recurrent Networks

We present exact analytical equilibrium solutions for a class of recurrent neural network models, with both sequential and parallel neuronal dynamics, in which there is a tunable competition between nearestneighbour and long-range synaptic interactions. This competition is found to induce novel coexistence phenomena as well as discontinuous transitions between pattern recall states, 2-cycles and non-recall states.

synapse, transition, transition line, (11 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)