Europe
An Entropic Estimator for Structure Discovery
We introduce a novel framework for simultaneous structure and parameter learning in hidden-variable conditional probability models, based on an entropic prior and a solution for its maximum a posteriori (MAP) estimator. The MAP estimate minimizes uncertainty in all respects: cross-entropy between model and data; entropy of the model; entropy of the data's descriptive statistics. Iterative estimation extinguishes weakly supported parameters, compressing and sparsifying the model. Trimming operators accelerate this process by removing excess parameters and, unlike most pruning schemes, guarantee an increase in posterior probability. Entropic estimation takes a overcomplete random model and simplifies it, inducing the structure of relations between hidden and observed variables. Applied to hidden Markov models (HMMs), it finds a concise finite-state machine representing the hidden structure of a signal. We entropically model music, handwriting, and video time-series, and show that the resulting models are highly concise, structured, predictive, and interpretable: Surviving states tend to be highly correlated with meaningful partitions of the data, while surviving transitions provide a low-perplexity model of the signal dynamics.
Semiparametric Support Vector and Linear Programming Machines
Smola, Alex J., Frieß, Thilo-Thomas, Schölkopf, Bernhard
In fact, for many of the kernels used (not the polynomial kernels) like Gaussian rbf-kernels it can be shown [6] that SV machines are universal approximators. While this is advantageous in general, parametric models are useful techniques in their own right. Especially if one happens to have additional knowledge about the problem, it would be unwise not to take advantage of it. For instance it might be the case that the major properties of the data are described by a combination of a small set of linear independent basis functions {¢Jt (.), ..., ¢n (.)}. Or one may want to correct the data for some (e.g.
A Polygonal Line Algorithm for Constructing Principal Curves
Kégl, Balázs, Krzyzak, Adam, Linder, Tamás, Zeger, Kenneth
Principal curves have been defined as "self consistent" smooth curves which pass through the "middle" of a d-dimensional probability distribution ordata cloud. Recently, we [1] have offered a new approach by defining principal curves as continuous curves of a given length which minimize the expected squared distance between the curve and points of the space randomly chosen according to a given distribution. The new definition made it possible to carry out a theoretical analysis of learning principal curves from training data. In this paper we propose a practical construction based on the new definition. Simulation results demonstrate that the new algorithm compares favorably with previous methods both in terms of performance and computational complexity.
Dynamically Adapting Kernels in Support Vector Machines
Cristianini, Nello, Campbell, Colin, Shawe-Taylor, John
The kernel-parameter is one of the few tunable parameters in Support Vectormachines, controlling the complexity of the resulting hypothesis. Its choice amounts to model selection and its value is usually found by means of a validation set. We present an algorithm whichcan automatically perform model selection with little additional computational cost and with no need of a validation set. In this procedure model selection and learning are not separate, but kernels are dynamically adjusted during the learning process to find the kernel parameter which provides the best possible upper bound on the generalisation error. Theoretical results motivating the approach and experimental results confirming its validity are presented.
Almost Linear VC Dimension Bounds for Piecewise Polynomial Networks
Bartlett, Peter L., Maiorov, Vitaly, Meir, Ron
VitalyMaiorov Department of Mathematics Technion, Haifa 32000 Israel Ron Meir Department of Electrical Engineering Technion, Haifa 32000 Israel rmeir@dumbo.technion.ac.il Abstract We compute upper and lower bounds on the VC dimension of feedforward networks of units with piecewise polynomial activation functions.We show that if the number of layers is fixed, then the VC dimension grows as W log W, where W is the number of parameters in the network. The VC dimension is an important measure of the complexity of a class of binaryvalued functions,since it characterizes the amount of data required for learning in the PAC setting (see [BEHW89, Vap82]). In this paper, we establish upper and lower bounds on the VC dimension of a specific class of multi-layered feedforward neural networks. Let F be the class of binary-valued functions computed by a feedforward neural network with W weights and k computational (non-input) units, each with a piecewise polynomial activation function. O(W2), which would lead one to conclude that the bounds Almost Linear VC Dimension Bounds for Piecewise Polynomial Networks 191 are in fact tight up to a constant.
Tractable Variational Structures for Approximating Graphical Models
Barber, David, Wiegerinck, Wim
Graphical models provide a broad probabilistic framework with applications inspeech recognition (Hidden Markov Models), medical diagnosis (Belief networks) and artificial intelligence (Boltzmann Machines). However, the computing time is typically exponential in the number of nodes in the graph. Within the variational framework forapproximating these models, we present two classes of distributions, decimatableBoltzmann Machines and Tractable Belief Networks that go beyond the standard factorized approach. We give generalised mean-field equations for both these directed and undirected approximations. Simulation results on a small benchmark problemsuggest using these richer approximations compares favorably against others previously reported in the literature. 1 Introduction Graphical models provide a powerful framework for probabilistic inference[l] but suffer intractability when applied to large scale problems.
Where Does the Population Vector of Motor Cortical Cells Point during Reaching Movements?
Baraduc, Pierre, Guigon, Emmanuel, Burnod, Yves
Visually-guided arm reaching movements are produced by distributed neural networks within parietal and frontal regions of the cerebral cortex. Experimental data indicate that (I) single neurons in these regions are broadly tuned to parameters of movement; (2) appropriate commands are elaborated by populations of neurons; (3) the coordinated action of neurons canbe visualized using a neuronal population vector (NPV). However, theNPV provides only a rough estimate of movement parameters (direction, velocity) and may even fail to reflect the parameters of movement whenarm posture is changed. We designed a model of the cortical motor command to investigate the relation between the desired direction of the movement, the actual direction of movement and the direction of the NPV in motor cortex. The model is a two-layer self-organizing neural network which combines broadly-tuned (muscular) proprioceptive and (cartesian) visual information to calculate (angular) motor commands for the initial part of the movement of a two-link arm. The network was trained by motor babbling in 5 positions. Simulations showed that (1) the network produced appropriate movement direction over a large part of the workspace; (2) small deviations of the actual trajectory from the desired trajectory existed at the extremities of the workspace; (3) these deviations were accompanied by large deviations of the NPV from both trajectories. These results suggest the NPV does not give a faithful image of cortical processing during arm reaching movements.
SMEM Algorithm for Mixture Models
Ueda, Naonori, Nakano, Ryohei, Ghahramani, Zoubin, Hinton, Geoffrey E.
We present a split and merge EM (SMEM) algorithm to overcome the local maximum problem in parameter estimation of finite mixture models. In the case of mixture models, non-global maxima often involve having too many components of a mixture model in one part of the space and too few in another, widelyseparated part of the space. To escape from such configurations we repeatedly perform simultaneous split and merge operations using a new criterion for efficiently selecting the split and merge candidates. We apply the proposed algorithm to the training of Gaussian mixtures and mixtures of factor analyzers using synthetic and real data and show the effectiveness of using the split and merge operations to improve the likelihood of both the training data and of held-out test data. 1 INTRODUCTION Mixture density models, in particular normal mixtures, have been extensively used in the field of statistical pattern recognition [1]. Recently, more sophisticated mixture densitymodels such as mixtures of latent variable models (e.g., probabilistic PCA or factor analysis) have been proposed to approximate the underlying data manifold [2]-[4].
Regularizing AdaBoost
Rätsch, Gunnar, Onoda, Takashi, Müller, Klaus R.
We will also introduce a regularization strategy(analogous to weight decay) into boosting. This strategy uses slack variables to achieve a soft margin (section 4). Numerical experiments show the validity of our regularization approach in section 5 and finally a brief conclusion is given. 2 AdaBoost Algorithm Let {ht(x): t 1, ...,T} be an ensemble of T hypotheses defined on input vector x and e