Neural Information Processing Systems
On a Modification to the Mean Field EM Algorithm in Factorial Learning
Dunmur, A. P., Titterington, D. M.
A modification is described to the use of mean field approximations in the E step of EM algorithms for analysing data from latent structure models, as described by Ghahramani (1995), among others. The modification involves second-order Taylor approximations to expectations computed in the E step. The potential benefits of the method are illustrated using very simple latent profile models.

1 Introduction

Ghahramani (1995) advocated the use of mean field methods as a means to avoid the heavy computation involved in the E step of the EM algorithm used for estimating parameters within a certain latent structure model, and Ghahramani & Jordan (1995) used the same ideas in a more complex situation. Dunmur & Titterington (1996a) identified Ghahramani's model as a so-called latent profile model; they observed that Zhang (1992, 1993) had used mean field methods for a similar purpose, and they showed, in a simulation study based on very simple examples, that the mean field version of the EM algorithm often performed very respectably. By this it is meant that, when data were generated from the model under analysis, the estimators of the underlying parameters were efficient, judging by empirical results, especially in comparison with estimators obtained by employing the 'correct' EM algorithm: the examples therefore had to be simple enough that the correct EM algorithm is numerically feasible, although any success reported for the mean field version is, one hopes, an indication that the method will also be adequate in more complex situations in which the correct EM algorithm is not implementable because of computational complexity. In spite of the above positive remarks, there were circumstances in which there was a perceptible, if not dramatic, lack of efficiency in the simple (naive) mean field estimators, and the objective of this contribution is to propose and investigate ways of refining the method so as to improve performance without detracting from the appealing, and frequently essential, simplicity of the approach. The procedure used here is based on a second-order correction to the naive mean field, well known in statistical physics and sometimes called the cavity or TAP method (Mezard, Parisi & Virasoro, 1987). It has been applied recently in cluster analysis (Hofmann & Buhmann, 1996). In Section 2 we introduce the structure of our model, Section 3 explains the refined mean field approach, Section 4 provides numerical results, and Section 5 contains a statement of our conclusions.
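For orientation, the TAP correction is most familiar in the generic Ising-type setting with couplings J_{ij} and external fields h_i; the equations below are that textbook illustration (written here with assumed symbols), not the latent profile model analysed in the paper. The naive mean field fixed point and its second-order (Onsager reaction term) correction read:

    m_i = \tanh\Big( \beta h_i + \beta \sum_j J_{ij} m_j \Big)                                   (naive mean field)

    m_i = \tanh\Big( \beta h_i + \beta \sum_j J_{ij} m_j - \beta^2 m_i \sum_j J_{ij}^2 (1 - m_j^2) \Big)   (TAP / cavity)

The extra term subtracts the self-influence of unit i on its neighbours, which is the second-order effect the naive approximation ignores.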
Multi-Grid Methods for Reinforcement Learning in Controlled Diffusion Processes
The optimal control problem reduces to a boundary value problem for a fully nonlinear second-order elliptic differential equation of Hamilton-Jacobi-Bellman (HJB) type. Numerical analysis provides multigrid methods for this kind of equation. In the case of Learning Control, however, the systems of equations on the various grid levels are obtained using observed information (transitions and local cost). To ensure consistency, special attention needs to be directed toward the type of time and space discretization during the observation. An algorithm for multi-grid observation is proposed. The multi-grid algorithm is demonstrated on a simple queuing problem.

1 Introduction

Controlled Diffusion Processes (CDP) are the analogue of Markov Decision Problems in continuous state space and continuous time.
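As a point of reference, a generic form of the HJB boundary value problem for a controlled diffusion with drift b(x,a), diffusion coefficient \sigma(x,a), and running cost c(x,a) on a domain \Omega (notation assumed here, not taken from the paper) is:

    \min_{a \in A} \Big[ b(x,a) \cdot \nabla V(x) + \tfrac{1}{2} \operatorname{tr}\big( \sigma(x,a)\sigma(x,a)^{\top} \nabla^2 V(x) \big) + c(x,a) \Big] = 0, \quad x \in \Omega,

with the value function V prescribed on the boundary \partial\Omega. Discretizing this equation on a hierarchy of grids is what makes multigrid solvers applicable.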
Combinations of Weak Classifiers
To obtain classification systems with both good generalization performance and efficiency in space and time, we propose a learning method based on combinations of weak classifiers, where weak classifiers are linear classifiers (perceptrons) which can do a little better than making random guesses. A randomized algorithm is proposed to find the weak classifiers. They are then combined through a majority vote. As demonstrated through systematic experiments, the method developed is able to obtain combinations of weak classifiers with good generalization performance and a fast training time on a variety of test problems and real applications.
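A minimal sketch of this general recipe (randomly search for linear classifiers that beat chance, then take an unweighted majority vote); the sampling scheme, the `edge` threshold, and the committee size below are illustrative assumptions, not the paper's algorithm:

    import numpy as np

    def find_weak_classifier(X, y, n_trials=200, edge=0.05, seed=None):
        """Randomly sample linear classifiers (perceptron-style hyperplanes);
        stop at the first one that beats random guessing by at least `edge`,
        otherwise return the best one seen."""
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        best, best_acc = None, -1.0
        for _ in range(n_trials):
            w, b = rng.normal(size=d), rng.normal()
            acc = np.mean(np.sign(X @ w + b) == y)
            if acc > best_acc:
                best, best_acc = (w, b), acc
            if best_acc >= 0.5 + edge:      # "a little better than random"
                break
        return best

    def majority_vote(classifiers, X):
        """Combine weak linear classifiers by an unweighted majority vote."""
        votes = sum(np.sign(X @ w + b) for w, b in classifiers)
        return np.sign(votes)

    # Toy usage: labels in {-1, +1}, a committee of 11 weak classifiers.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2))
    y = np.sign(X[:, 0] + 0.5 * X[:, 1])
    committee = [find_weak_classifier(X, y, seed=k) for k in range(11)]
    print("training accuracy:", np.mean(majority_vote(committee, X) == y))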
Improving the Accuracy and Speed of Support Vector Machines
Burges, Christopher J. C., Schölkopf, Bernhard
Support Vector Learning Machines (SVM) are finding application in pattern recognition, regression estimation, and operator inversion for ill-posed problems. Against this very general backdrop, any methods for improving the generalization performance, or for improving the speed in the test phase, of SVMs are of increasing interest. In this paper we combine two such techniques on a pattern recognition problem. The method for improving generalization performance (the "virtual support vector" method) does so by incorporating known invariances of the problem. This method achieves a drop in the error rate on 10,000 NIST test digit images from 1.4% to 1.0%.
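A rough sketch of the virtual support vector idea under stated assumptions (scikit-learn's SVC and one-pixel image shifts as the known invariance are illustrative choices here, not the authors' exact setup): train once, perturb only the support vectors with the invariance transformation, and retrain on the augmented set.

    import numpy as np
    from scipy.ndimage import shift
    from sklearn.svm import SVC

    def train_with_virtual_svs(X, y, image_shape=(28, 28)):
        """Virtual support vector sketch: train, translate the support vectors
        by one pixel in each direction, then retrain on the original data
        augmented with these 'virtual' examples."""
        base = SVC(kernel="rbf", C=10.0).fit(X, y)

        sv, sv_y = base.support_vectors_, y[base.support_]
        virtual_X, virtual_y = [], []
        for vec, label in zip(sv, sv_y):
            img = vec.reshape(image_shape)
            for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
                virtual_X.append(shift(img, (dx, dy), order=0, cval=0.0).ravel())
                virtual_y.append(label)

        X_aug = np.vstack([X, np.asarray(virtual_X)])
        y_aug = np.concatenate([y, np.asarray(virtual_y)])
        return SVC(kernel="rbf", C=10.0).fit(X_aug, y_aug)

Because only the support vectors are perturbed, the augmented training set stays much smaller than augmenting the full data set would make it.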
Spatiotemporal Coupling and Scaling of Natural Images and Human Visual Sensitivities
We study the spatiotemporal correlation in natural time-varying images and explore the hypothesis that the visual system is concerned with the optimal coding of visual representation through spatiotemporal decorrelation of the input signal. Based on the measured spatiotemporal power spectrum, the transform needed to decorrelate the input signal is derived analytically and then compared with the actual processing observed in psychophysical experiments.
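In the simplest (noise-free) reading of this decorrelation hypothesis, if S(f_x, f_t) denotes the measured spatiotemporal power spectrum, a filter K that whitens the input must satisfy (symbols assumed here for illustration):

    |K(f_x, f_t)|^2 \, S(f_x, f_t) = \text{const}, \qquad \text{i.e.} \qquad |K(f_x, f_t)| \propto S(f_x, f_t)^{-1/2},

with the response attenuated again at high spatiotemporal frequencies once measurement noise dominates the signal.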
Dynamics of Training
Bös, Siegfried, Opper, Manfred
A new method to calculate the full training process of a neural network is introduced. No sophisticated methods like the replica trick are used. The results are directly related to the actual number of training steps. Some results are presented here, such as the maximal learning rate, an exact description of early stopping, and the necessary number of training steps. Further problems can be addressed with this approach.
3D Object Recognition: A Model of View-Tuned Neurons
Bricolo, Emanuela, Poggio, Tomaso, Logothetis, Nikos K.
Recognition of specific objects, such as recognition of a particular face, can be based on representations that are object centered, such as 3D structural models. Alternatively, a 3D object may be represented for the purpose of recognition in terms of a set of views. This latter class of models is biologically attractive because model acquisition - the learning phase - is simpler and more natural. A simple model for this strategy of object recognition was proposed by Poggio and Edelman (Poggio and Edelman, 1990). They showed that, with few views of an object used as training examples, a classification network, such as a Gaussian radial basis function network, can learn to recognize novel views of that object.
Figure 1: (a) Schematic representation of the architecture of the Poggio-Edelman model. The shaded circles correspond to the view-tuned units, each tuned to a view of the object, while the open circle corresponds to the view-invariant, object-specific output unit. (b) [axis label: view angle]
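A minimal sketch in this spirit: Gaussian view-tuned units centred on the few training views, combined linearly into an object-specific output. The class name, the fixed width, and the least-squares fit below are illustrative assumptions, not the Poggio-Edelman implementation.

    import numpy as np

    class ViewTunedNetwork:
        """Gaussian RBF network: one 'view-tuned' unit per stored training view,
        combined linearly into a single view-invariant, object-specific output."""
        def __init__(self, sigma=1.0):
            self.sigma = sigma

        def fit(self, views, targets):
            # views: (n_views, d) feature vectors of the training views
            self.centers = np.asarray(views)
            K = self._activations(self.centers)
            # Least-squares weights from view-tuned units to the output unit.
            self.weights, *_ = np.linalg.lstsq(K, np.asarray(targets), rcond=None)
            return self

        def _activations(self, X):
            d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2.0 * self.sigma ** 2))

        def predict(self, X):
            return self._activations(np.asarray(X)) @ self.weights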
Interpolating Earth-science Data using RBF Networks and Mixtures of Experts
We present a mixture of experts (ME) approach to interpolate sparse, spatially correlated earth-science data. Kriging is an interpolation method that uses a global covariation model estimated from the data to account for its spatial dependence. Based on the close relationship between kriging and the radial basis function (RBF) network (Wan & Bone, 1996), we use a mixture of generalized RBF networks to partition the input space into statistically correlated regions and learn the local covariation model of the data in each region. Applying the ME approach to simulated and real-world data, we show that it achieves a good partitioning of the input space, learns the local covariation models, and improves generalization.
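A compact sketch of the flavour of this approach: a soft gate over region centres partitions 2-D locations, and one small RBF regressor is fitted per region with gate-weighted least squares. The gating form, fixed centres, and widths here are illustrative simplifications, not the authors' fitting procedure.

    import numpy as np

    def rbf_design(X, centers, width):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * width ** 2))

    class MixtureOfRBFExperts:
        """Gate sample locations to regional experts, fit one RBF regressor per
        region, and blend expert predictions by the gate weights."""
        def __init__(self, gate_centers, rbf_centers_per_expert, width=1.0, tau=1.0):
            self.gate_centers = np.asarray(gate_centers)   # (m, 2) region centres
            self.rbf_centers = rbf_centers_per_expert      # list of (k_i, 2) arrays
            self.width, self.tau = width, tau

        def _gate(self, X):
            d2 = ((X[:, None, :] - self.gate_centers[None, :, :]) ** 2).sum(-1)
            g = np.exp(-d2 / self.tau)
            return g / g.sum(axis=1, keepdims=True)        # soft responsibilities

        def fit(self, X, y):
            g = self._gate(X)
            self.coefs = []
            for i, centers in enumerate(self.rbf_centers):
                Phi = rbf_design(X, np.asarray(centers), self.width)
                w = np.sqrt(g[:, i])                       # gate-weighted least squares
                coef = np.linalg.lstsq(Phi * w[:, None], y * w, rcond=None)[0]
                self.coefs.append(coef)
            return self

        def predict(self, X):
            g = self._gate(X)
            preds = np.column_stack([
                rbf_design(X, np.asarray(c), self.width) @ coef
                for c, coef in zip(self.rbf_centers, self.coefs)])
            return (g * preds).sum(axis=1)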
Learning Exact Patterns of Quasi-synchronization among Spiking Neurons from Data on Multi-unit Recordings
Martignon, Laura, Laskey, Kathryn B., Deco, Gustavo, Vaadia, Eilon
This paper develops arguments for a family of temporal log-linear models to represent spatiotemporal correlations among the spiking events in a group of neurons. The models can represent not just pairwise correlations but also correlations of higher order. Methods are discussed for inferring the existence or absence of correlations and estimating their strength. A frequentist and a Bayesian approach to correlation detection are compared.
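The generic form of such a log-linear model for binary spike/no-spike indicators x_1, ..., x_n in a time bin (a standard parameterisation, written here for illustration) is:

    \log p(x_1, \ldots, x_n) = \theta_0 + \sum_i \theta_i x_i + \sum_{i<j} \theta_{ij} x_i x_j + \sum_{i<j<k} \theta_{ijk} x_i x_j x_k + \cdots,

where a nonzero higher-order coefficient \theta_{ij\ldots} indicates a genuine correlation of that order among the corresponding neurons, beyond what the lower-order terms explain.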