AITopics

Andrew H Gee, Cambridge University Engineering Department, Cambridge CB2 1PZ, England, ahg@eng.cam.ac.uk. Abstract: We propose a novel strategy for training neural networks using sequential sampling-importance-resampling algorithms. This global optimisation strategy allows us to learn the probability distribution of the network weights in a sequential framework. It is well suited to applications involving online, nonlinear, non-Gaussian or non-stationary signal processing. 1 INTRODUCTION This paper addresses sequential training of neural networks using powerful sampling techniques. Sequential techniques are important in many applications of neural networks involving real-time signal processing, where data arrival is inherently sequential. Furthermore, one might wish to adopt a sequential training strategy to deal with non-stationarity in signals, so that information from the recent past is lent more credence than information from the distant past. One way to sequentially estimate neural network models is to use a state space formulation and the extended Kalman filter (Singhal and Wu 1988, de Freitas, Niranjan and Gee 1998).
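As a rough illustration of the sampling-importance-resampling idea named in the abstract above, the following sketch performs one sequential update of a population of weight vectors. It is not the authors' algorithm: the random-walk transition, the Gaussian likelihood, the parameters `noise_std` and `jitter_std`, and the toy `linear_model` standing in for a neural network are all assumptions made for illustration.

```python
import math
import random


def sir_step(particles, x, y, predict, noise_std=0.1, jitter_std=0.05, rng=random):
    """One sampling-importance-resampling update of weight particles.

    particles: list of weight vectors (lists of floats)
    predict:   hypothetical model function, predict(weights, x) -> float
    """
    # Sampling: diffuse each particle with an assumed random-walk transition.
    proposed = [[w + rng.gauss(0.0, jitter_std) for w in p] for p in particles]

    # Importance: score each particle by an assumed Gaussian likelihood
    # of the newly arrived observation (x, y).
    log_w = [-0.5 * ((y - predict(p, x)) / noise_std) ** 2 for p in proposed]
    max_log = max(log_w)  # subtract the maximum for numerical stability
    weights = [math.exp(lw - max_log) for lw in log_w]
    total = sum(weights)
    weights = [w / total for w in weights]

    # Resampling: draw particles with probability proportional to weight.
    resampled = rng.choices(proposed, weights=weights, k=len(proposed))
    return [list(p) for p in resampled]


def linear_model(weights, x):
    """Toy stand-in for a network: y = weights[0] * x + weights[1]."""
    return weights[0] * x + weights[1]
```

Calling `sir_step` once per incoming data point keeps a sample-based approximation of the weight distribution up to date, and the random-walk jitter is what lets the particles follow slowly changing (non-stationary) targets.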

algorithm, artificial intelligence, machine learning, (13 more...)

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.27)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Briegel, Thomas, Tresp, Volker

Fisher Scoring and a Mixture of Modes Approach for Approximate Inference and Learning in Nonlinear State Space Models

The difficulties lie in the Monte-Carlo E-step which consists of sampling from the posterior distribution of the hidden variables given the observations. The new idea presented in this paper is to generate samples from a Gaussian approximation to the true posterior from which it is easy to obtain independent samples. The parameters of the Gaussian approximation are either derived from the extended Kalman filter or the Fisher scoring algorithm. In case the posterior density is multimodal we propose to approximate the posterior by a sum of Gaussians (mixture of modes approach). We show that sampling from the approximate posterior densities obtained by the above algorithms leads to better models than using point estimates for the hidden states. In our experiment, the Fisher scoring algorithm obtained a better approximation of the posterior mode than the EKF. For a multimodal distribution, the mixture of modes approach gave superior results. 1 INTRODUCTION Nonlinear state space models (NSSM) are a general framework for representing nonlinear time series. In particular, any NARMAX model (nonlinear auto-regressive moving average model with external inputs) can be translated into an equivalent NSSM.

approximation, artificial intelligence, machine learning, (17 more...)

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.64)

Boyen, Xavier, Koller, Daphne

Approximate Learning of Dynamic Models

Inference is a key component in learning probabilistic models from partially observable data. When learning temporal models, each of the many inference phases requires a traversal over an entire long data sequence; furthermore, the data structures manipulated are exponentially large, making this process computationally expensive. In [2], we describe an approximate inference algorithm for monitoring stochastic processes, and prove bounds on its approximation error. In this paper, we apply this algorithm as an approximate forward propagation step in an EM algorithm for learning temporal Bayesian networks. We provide a related approximation for the backward step, and prove error bounds for the combined algorithm.

algorithm, artificial intelligence, machine learning, (18 more...)

Country: North America > United States > California > Santa Clara County (0.14)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)

Blake, Andrew, North, Ben, Isard, Michael

Learning Multi-Class Dynamics

Yule-Walker) are available for learning Auto-Regressive process models of simple, directly observable, dynamical processes. When sensor noise means that dynamics are observed only approximately, learning can still be achieved via Expectation-Maximisation (EM) together with Kalman Filtering. However, this does not handle more complex dynamics, involving multiple classes of motion.

algorithm, artificial intelligence, machine learning, (17 more...)

Country: North America > United States (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Bayesian PCA

Bishop, Christopher M.

The technique of principal component analysis (PCA) has recently been expressed as the maximum likelihood solution for a generative latent variable model. In this paper we use this probabilistic reformulation as the basis for a Bayesian treatment of PCA. Our key result is that the effective dimensionality of the latent space (equivalent to the number of retained principal components) can be determined automatically as part of the Bayesian inference procedure. An important application of this framework is to mixtures of probabilistic PCA models, in which each component can determine its own effective complexity. 1 Introduction Principal component analysis (PCA) is a widely used technique for data analysis. Recently Tipping and Bishop (1997b) showed that a specific form of generative latent variable model has the property that its maximum likelihood solution extracts the principal subspace of the observed data set.

artificial intelligence, bayesian inference, machine learning, (17 more...)

Country: Europe > United Kingdom > England (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Birattari, Mauro, Bontempi, Gianluca, Bersini, Hugues

Lazy Learning Meets the Recursive Least Squares Algorithm

Lazy learning is a memory-based technique that, once a query is received, extracts a prediction by locally interpolating the neighboring examples of the query which are considered relevant according to a distance measure. In this paper we propose a data-driven method to select on a query-by-query basis the optimal number of neighbors to be considered for each prediction. As an efficient way to identify and validate local models, the recursive least squares algorithm is introduced in the context of local approximation and lazy learning. Furthermore, besides the winner-takes-all strategy for model selection, a local combination of the most promising models is explored. The method proposed is tested on six different datasets and compared with a state-of-the-art approach.
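For readers unfamiliar with the recursive least squares algorithm mentioned in the abstract above, here is a minimal sketch of the textbook update for a linear model, applied one example at a time. It shows only the generic recursion; the paper's neighbor-selection and validation procedure is not reproduced, and the function name `rls_update`, the forgetting factor `lam`, and the large initial matrix `P` are illustrative assumptions.

```python
def rls_update(theta, P, x, y, lam=1.0):
    """One recursive least squares step for the linear model y ~ theta . x.

    theta: current parameter estimate (list of floats)
    P:     current symmetric inverse-covariance-like matrix (list of lists)
    lam:   forgetting factor (1.0 means all examples are weighted equally)
    """
    n = len(x)
    # P x, reused for both the gain vector and the update of P.
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * Px[i] for i in range(n))
    gain = [v / denom for v in Px]

    # Prediction error before the new example is incorporated.
    err = y - sum(theta[i] * x[i] for i in range(n))

    theta = [theta[i] + gain[i] * err for i in range(n)]
    P = [[(P[i][j] - gain[i] * Px[j]) / lam for j in range(n)] for i in range(n)]
    return theta, P, err


# Usage: fit y = 3*x1 - 2 by feeding examples one at a time.
theta = [0.0, 0.0]
P = [[1e6, 0.0], [0.0, 1e6]]  # large initial P: weak prior on theta
for x1 in [0.0, 1.0, 2.0, 3.0, 4.0]:
    theta, P, _ = rls_update(theta, P, [x1, 1.0], 3.0 * x1 - 2.0)
```

Because each step costs only a matrix-vector product, a local model can be refitted cheaply as each further neighbor is added, which is the property the abstract appeals to.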

artificial intelligence, machine learning, selection, (14 more...)

Country:

Europe (0.14)
North America > United States (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Bennett, Kristin P., Demiriz, Ayhan

Semi-Supervised Support Vector Machines

We introduce a semi-supervised support vector machine (S3VM) method. Given a training set of labeled data and a working set of unlabeled data, S3VM constructs a support vector machine using both the training and working sets. We use S3VM to solve the transduction problem using overall risk minimization (ORM) posed by Vapnik. The transduction problem is to estimate the value of a classification function at the given points in the working set. This contrasts with the standard inductive learning problem of estimating the classification function at all possible values and then using the fixed function to deduce the classes of the working set data.

artificial intelligence, machine learning, support vector machine, (18 more...)

Country: North America > United States > California > Orange County > Irvine (0.14)

Genre: Research Report > New Finding (0.95)

Industry: Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

A Theory of Mean Field Approximation

Tanaka, Toshiyuki

I present a theory of mean field approximation based on information geometry. This theory includes in a consistent way the naive mean field approximation, as well as the TAP approach and the linear response theorem in statistical physics, giving clear information-theoretic interpretations to them. 1 INTRODUCTION Many problems of neural networks, such as learning and pattern recognition, can be cast into the framework of a statistical estimation problem. How difficult it is to solve a particular problem depends on the statistical model one employs in solving the problem. For Boltzmann machines [1], for example, it is computationally very hard to evaluate expectations of state variables from the model parameters. Mean field approximation [2], which originated in statistical physics, has been frequently used in practical situations in order to circumvent this difficulty.
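To make the naive mean field approximation mentioned above concrete, the sketch below iterates the standard self-consistency equations m_i = tanh(h_i + sum_j w_ij m_j) for a Boltzmann machine with units taking values -1 and +1. This is the textbook naive scheme only, not the information-geometric theory of the paper; the function name, the damping factor, and the fixed iteration count are assumptions for illustration.

```python
import math


def naive_mean_field(w, h, iters=200, damping=0.5):
    """Approximate the unit expectations of a +/-1 Boltzmann machine.

    w: symmetric coupling matrix (list of lists; the diagonal is ignored)
    h: bias (external field) for each unit
    """
    n = len(h)
    m = [0.0] * n  # start from zero magnetisation
    for _ in range(iters):
        # Each unit sees the mean activity of the others as an effective field.
        new_m = [
            math.tanh(h[i] + sum(w[i][j] * m[j] for j in range(n) if j != i))
            for i in range(n)
        ]
        # Damped update to discourage oscillation between iterates.
        m = [damping * m[i] + (1.0 - damping) * new_m[i] for i in range(n)]
    return m
```

The fixed point replaces an exact expectation, which requires a sum over all 2^n states, with n coupled scalar equations; the TAP approach mentioned in the abstract adds a correction term to this naive form.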

approximation, machine learning, pattern recognition, (16 more...)

Country: Asia > Japan (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.54)

Skantzos, N. S., Beckmann, C. F., Coolen, Anthony C. C.

Discontinuous Recall Transitions Induced by Competition Between Short- and Long-Range Interactions in Recurrent Networks

We present exact analytical equilibrium solutions for a class of recurrent neural network models, with both sequential and parallel neuronal dynamics, in which there is a tunable competition between nearest-neighbour and long-range synaptic interactions. This competition is found to induce novel coexistence phenomena as well as discontinuous transitions between pattern recall states, 2-cycles and non-recall states.

artificial intelligence, machine learning, transition, (13 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.95)