Europe
Adaptive Nonlinear System Identification with Echo State Networks
Echo state networks (ESN) are a novel approach to recurrent neural networktraining. An ESN consists of a large, fixed, recurrent "reservoir" network, from which the desired output is obtained by training suitable output connection weights. Determination of optimal outputweights becomes a linear, uniquely solvable task of MSE minimization. This article reviews the basic ideas and describes anonline adaptation scheme based on the RLS algorithm known from adaptive linear systems. As an example, a 10th order NARMAsystem is adaptively identified.
Adaptive Scaling for Feature Selection in SVMs
Grandvalet, Yves, Canu, Stéphane
This paper introduces an algorithm for the automatic relevance determination ofinput variables in kernelized Support Vector Machines. Relevance is measured by scale factors defining the input space metric, and feature selection is performed by assigning zero weights to irrelevant variables. The metric is automatically tuned by the minimization of the standard SVM empirical risk, where scale factors are added to the usual set of parameters defining the classifier. Feature selection is achieved by constraints encouraging the sparsity of scale factors. The resulting algorithm compares favorably to state-of-the-art feature selection procedures anddemonstrates its effectiveness on a demanding facial expression recognition problem.
Bayesian Monte Carlo
Ghahramani, Zoubin, Rasmussen, Carl E.
We investigate Bayesian alternatives to classical Monte Carlo methods for evaluating integrals. Bayesian Monte Carlo (BMC) allows the incorporation ofprior knowledge, such as smoothness of the integrand, into the estimation. In a simple problem we show that this outperforms any classical importance sampling method. We also attempt more challenging multidimensionalintegrals involved in computing marginal likelihoods ofstatistical models (a.k.a.
Fractional Belief Propagation
We consider loopy belief propagation for approximate inference in probabilistic graphicalmodels. A limitation of the standard algorithm is that clique marginals are computed as if there were no loops in the graph. To overcome this limitation, we introduce fractional belief propagation. Fractional belief propagation is formulated in terms of a family of approximate freeenergies, which includes the Bethe free energy and the naive mean-field free as special cases. Using the linear response correction ofthe clique marginals, the scale parameters can be tuned. Simulation results illustrate the potential merits of the approach.
On the Complexity of Learning the Kernel Matrix
Bousquet, Olivier, Herrmann, Daniel
We investigate data based procedures for selecting the kernel when learning withSupport Vector Machines. We provide generalization error bounds by estimating the Rademacher complexities of the corresponding function classes. In particular we obtain a complexity bound for function classes induced by kernels with given eigenvectors, i.e., we allow to vary the spectrum and keep the eigenvectors fix. This bound is only a logarithmic factorbigger than the complexity of the function class induced by a single kernel. However, optimizing the margin over such classes leads to overfitting. We thus propose a suitable way of constraining the class. We use an efficient algorithm to solve the resulting optimization problem, present preliminary experimental results, and compare them to an alignment-based approach.
Dyadic Classification Trees via Structural Risk Minimization
Classification trees are one of the most popular types of classifiers, with ease of implementation and interpretation being among their attractive features. Despite the widespread use of classification trees, theoretical analysis of their performance is scarce. In this paper, we show that a new family of classification trees, called dyadic classification trees (DCTs), are near optimal (in a minimax sense) for a very broad range of classification problems.This demonstrates that other schemes (e.g., neural networks, support vector machines) cannot perform significantly better than DCTs in many cases. We also show that this near optimal performance isattained with linear (in the number of training data) complexity growing and pruning algorithms. Moreover, the performance of DCTs on benchmark datasets compares favorably to that of standard CART, which is generally more computationally intensive and which does not possess similar near optimality properties. Our analysis stems from theoretical resultson structural risk minimization, on which the pruning rule for DCTs is based.
Stable Fixed Points of Loopy Belief Propagation Are Local Minima of the Bethe Free Energy
We extend recent work on the connection between loopy belief propagation and the Bethe free energy. Constrained minimization of the Bethe free energy can be turned into an unconstrained saddle-point problem. Both converging double-loop algorithms and standard loopy belief propagation can be interpreted asattempts to solve this saddle-point problem. Stability analysis then leads us to conclude that stable fixed points of loopy belief propagation must be (local) minima of the Bethe free energy. Perhaps surprisingly, the converse need not be the case: minima can be unstable fixed points. We illustrate this with an example and discuss implications.
A Statistical Mechanics Approach to Approximate Analytical Bootstrap Averages
Malzahn, Dörthe, Opper, Manfred
We apply the replica method of Statistical Physics combined with a variational methodto the approximate analytical computation of bootstrap averages for estimating the generalization error. We demonstrate our approach onregression with Gaussian processes and compare our results with averages obtained by Monte-Carlo sampling.