Efficient Approaches to Gaussian Process Classification
Csató, Lehel, Fokoué, Ernest, Opper, Manfred, Schottky, Bernhard, Winther, Ole
The first two methods are related to mean field ideas known in Statistical Physics. The third method is based on a Bayesian online approach, which was motivated by recent results in the Statistical Mechanics of Neural Networks. We present simulation results showing: 1. that the mean field Bayesian evidence may be used for hyperparameter tuning and 2. that the online approach may achieve a low training error quickly.
1 Introduction
Gaussian processes provide promising nonparametric Bayesian approaches to regression and classification [2, 1].
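For classification the evidence has no closed form and must be approximated, which is where the paper's mean field machinery enters; in the regression analogue it is tractable, and the tuning idea can be sketched directly. A minimal sketch, assuming a squared-exponential kernel and made-up data (rbf_kernel and the grid of lengthscales are illustrative, not from the paper):

```python
import numpy as np

def rbf_kernel(X, lengthscale):
    # Squared-exponential kernel matrix (illustrative choice).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def log_evidence(X, y, lengthscale, noise=0.1):
    # GP log marginal likelihood ("evidence") for regression:
    # log N(y | 0, K + noise^2 I), computed via a Cholesky factor.
    K = rbf_kernel(X, lengthscale) + noise ** 2 * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(X) * np.log(2 * np.pi))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)

# Evidence-based tuning: pick the lengthscale with the highest evidence.
grid = [0.1, 0.3, 1.0, 3.0]
best = max(grid, key=lambda ell: log_evidence(X, y, ell))
print("selected lengthscale:", best)
```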
Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology
Weiss, Yair, Freeman, William T.
Local "belief propagation" rules of the sort proposed by Pearl [15] are guaranteed to converge to the correct posterior probabilities in singly connected graphical models. Recently, a number of researchers have empirically demonstratedgood performance of "loopy belief propagation" using these same rules on graphs with loops. Perhaps the most dramatic instance is the near Shannon-limit performance of "Turbo codes", whose decoding algorithm is equivalent to loopy belief propagation. Except for the case of graphs with a single loop, there has been little theoretical understandingof the performance of loopy propagation. Here we analyze belief propagation in networks with arbitrary topologies when the nodes in the graph describe jointly Gaussian random variables.
The Nonnegative Boltzmann Machine
Downs, Oliver B., MacKay, David J. C., Lee, Daniel D.
The nonnegative Boltzmann machine (NNBM) is a recurrent neural network model that can describe multimodal nonnegative data. Application of maximum likelihood estimation to this model gives a learning rule that is analogous to that of the binary Boltzmann machine. We examine the utility of the mean field approximation for the NNBM, and describe how Monte Carlo sampling techniques can be used to learn its parameters. Reflective slice sampling is particularly well suited for this distribution, and can be implemented efficiently to sample from it. We illustrate learning of the NNBM on a translationally invariant distribution, as well as on a generative model for images of human faces.
Introduction
The multivariate Gaussian is the most elementary distribution used to model generic data.
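The NNBM restricts a quadratic energy to the nonnegative orthant, which is what allows multimodality even with a Gaussian-looking form. A minimal sketch with an illustrative indefinite coupling matrix, using a naive Metropolis sampler as a stand-in for the paper's reflective slice sampler:

```python
import numpy as np

# NNBM-style distribution: p(x) ∝ exp(-E(x)) on x >= 0, with
# E(x) = 0.5 x^T A x - b^T x. A and b are illustrative values;
# the indefinite A is what makes the distribution bimodal.
A = np.array([[1.0, 1.8],
              [1.8, 1.0]])
b = np.array([1.0, 1.0])

def energy(x):
    return 0.5 * x @ A @ x - b @ x

# Naive random-walk Metropolis restricted to the nonnegative orthant
# (proposals outside the support are rejected outright).
rng = np.random.default_rng(0)
x = np.ones(2)
samples = []
for _ in range(20000):
    prop = x + 0.3 * rng.standard_normal(2)
    if np.all(prop >= 0) and rng.random() < np.exp(energy(x) - energy(prop)):
        x = prop
    samples.append(x)
samples = np.array(samples)
print(samples.mean(axis=0))  # mean sits between the two modes near (1,0) and (0,1)
```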
v-Arc: Ensemble Learning in the Presence of Outliers
Rätsch, Gunnar, Schölkopf, Bernhard, Smola, Alex J., Müller, Klaus-Robert, Onoda, Takashi, Mika, Sebastian
The idea of a large minimum margin [17] explains the good generalization performance of AdaBoost in the low noise regime. However, AdaBoost performs worse on noisy tasks [10, 11], such as the iris and the breast cancer benchmark data sets [1]. On the latter tasks, a large margin on all training points cannot be achieved without adverse effects on the generalization error. This experimental observation was supported by the study of [13], where the generalization error of ensemble methods was bounded by the fraction of training points which have a margin smaller than some value ρ, say, plus a complexity term depending on the base hypotheses and ρ. While this bound can only capture part of what is going on in practice, it nevertheless conveys the message that in some cases it pays to allow for some points which have a small margin, or are misclassified, if this leads to a larger overall margin on the remaining points. To cope with this problem, it was mandatory to construct regularized variants of AdaBoost, which traded off the number of margin errors and the size of the margin.
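Schematically, the bound of [13] has the shape below; the constants and the exact complexity term are elided, so this is the shape of the result, not a citable statement:

```latex
% Margin-style generalization bound, stated schematically.
% f is the ensemble, \rho > 0 a margin threshold, m the sample size, and
% C(\mathcal{H}, \rho, m) a complexity term that grows as \rho shrinks.
\[
  \Pr\bigl[\, y f(x) \le 0 \,\bigr]
  \;\le\;
  \underbrace{\frac{1}{m}\,\bigl|\{\, i : y_i f(x_i) < \rho \,\}\bigr|}_{\text{margin errors}}
  \;+\;
  \underbrace{C(\mathcal{H}, \rho, m)}_{\text{complexity term}}
\]
```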
Distributed Synchrony of Spiking Neurons in a Hebbian Cell Assembly
Horn, David, Levy, Nir, Meilijson, Isaac, Ruppin, Eytan
We investigate the behavior of a Hebbian cell assembly of spiking neurons formed via a temporal synaptic learning curve. This learning function is based on recent experimental findings. It includes potentiation for short time delays between pre- and post-synaptic neuronal spiking, and depression for spiking events occurring in the reverse order. The coupling between the dynamics of the synaptic learning and of the neuronal activation leads to interesting results. We find that the cell assembly can fire asynchronously, but may also function in complete synchrony, or in distributed synchrony.
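A minimal sketch of such a temporal learning window, with illustrative amplitudes and time constants rather than the experimentally fitted values the paper uses:

```python
import numpy as np

def stdp_window(dt, a_plus=1.0, a_minus=0.8, tau_plus=15.0, tau_minus=20.0):
    """Synaptic change as a function of dt = t_post - t_pre (ms).

    Potentiation when the presynaptic spike precedes the postsynaptic
    one (dt > 0), depression in the reverse order (dt < 0). Amplitudes
    and time constants are illustrative, not the paper's fitted values.
    """
    dt = np.asarray(dt, dtype=float)
    return np.where(dt >= 0,
                    a_plus * np.exp(-dt / tau_plus),
                    -a_minus * np.exp(dt / tau_minus))

print(stdp_window([-30.0, -5.0, 5.0, 30.0]))
# depression for negative dt, potentiation for positive dt
```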
The Relaxed Online Maximum Margin Algorithm
Li, Yi, Long, Philip M.
We describe a new incremental algorithm for training linear threshold functions: the Relaxed Online Maximum Margin Algorithm, or ROMMA. ROMMA can be viewed as an approximation to the algorithm that repeatedly chooses the hyperplane that classifies previously seen examples correctly with the maximum margin. It is known that such a maximum-margin hypothesis can be computed by minimizing the length of the weight vector subject to a number of linear constraints. ROMMA works by maintaining a relatively simple relaxation of these constraints that can be efficiently updated. We prove a mistake bound for ROMMA that is the same as that proved for the perceptron algorithm. Our analysis implies that the more computationally intensive maximum-margin algorithm also satisfies this mistake bound; this is the first worst-case performance guarantee for this algorithm. We describe some experiments using ROMMA and a variant that updates its hypothesis more aggressively as batch algorithms to recognize handwritten digits. The computational complexity and simplicity of these algorithms are similar to those of the perceptron algorithm, but their generalization is much better. We describe a sense in which the performance of ROMMA converges to that of the SVM in the limit if bias isn't considered.
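A schematic version of the online loop: when an example violates the unit-margin constraint, project the weight vector onto the hyperplane that satisfies it. This single-constraint projection is a simplification for illustration; ROMMA's actual update also maintains a relaxation of the constraints from earlier examples.

```python
import numpy as np

def margin_projection_fit(X, y, epochs=10):
    """Online linear-threshold training, in the spirit of ROMMA.

    When y_t (w . x_t) < 1, replace w by its projection onto the
    hyperplane {w : y_t (w . x_t) = 1}. This keeps ||w|| as small as
    the single violated constraint allows; ROMMA's real update also
    respects a relaxation of previously seen constraints.
    """
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, y):
            if t * (w @ x) < 1.0:
                w = w + t * (1.0 - t * (w @ x)) / (x @ x) * x
    return w

# Toy linearly separable data in 2D (labels in {-1, +1}).
X = np.array([[2.0, 1.0], [1.0, 2.0], [-1.5, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = margin_projection_fit(X, y)
print(np.sign(X @ w) == y)  # all True on the training set
```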
Greedy Importance Sampling
Schuurmans, Dale
I present a simple variation of importance sampling that explicitly searches for important regions in the target distribution. I prove that the technique yields unbiased estimates, and show empirically that it can reduce the variance of standard Monte Carlo estimators. This is achieved by concentrating samples in more significant regions of the sample space.
1 Introduction
It is well known that general inference and learning with graphical models is computationally hard [1], and it is therefore necessary to consider restricted architectures [13] or approximate algorithms to perform these tasks [3, 7]. Among the most convenient and successful techniques are stochastic methods, which are guaranteed to converge to a correct solution in the limit of large samples [10, 11, 12, 15]. These methods can easily be applied to complex inference problems that overwhelm deterministic approaches.
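For reference, the standard importance sampling estimator that the greedy variant builds on: sample from a tractable proposal q and reweight by p/q. A minimal sketch (the paper's explicit search for important regions is not shown):

```python
import numpy as np

rng = np.random.default_rng(0)

# Target p: standard normal. Proposal q: a wider normal we can sample from.
def log_p(x):
    return -0.5 * x**2 - 0.5 * np.log(2 * np.pi)

def log_q(x):
    return -0.5 * (x / 2.0)**2 - np.log(2.0) - 0.5 * np.log(2 * np.pi)

# Estimate E_p[x^2] (true value 1) with importance weights w = p/q.
x = 2.0 * rng.standard_normal(100_000)
w = np.exp(log_p(x) - log_q(x))
print((w * x**2).sum() / w.sum())  # self-normalized estimate, close to 1
```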
Neural Computation with Winner-Take-All as the Only Nonlinear Operation
Maass, Wolfgang
Everybody "knows" that neural networks need more than a single layer ofnonlinear units to compute interesting functions. We show that this is false if one employs winner-take-all as nonlinear unit: - Any boolean function can be computed by a single k-winner-takeall unitapplied to weighted sums of the input variables.
Inference for the Generalization Error
Nadeau, Claude, Bengio, Yoshua
In order to compare learning algorithms, experimental results reported in the machine learning literature often use statistical tests of significance. Unfortunately, most of these tests do not take into account the variability due to the choice of training set. We perform a theoretical investigation of the variance of the cross-validation estimate of the generalization error that takes into account the variability due to the choice of training sets. This allows us to propose two new ways to estimate this variance. We show, via simulations, that these new statistics perform well relative to the statistics considered by Dietterich (Dietterich, 1998).
1 Introduction
When applying a learning algorithm (or comparing several algorithms), one is typically interested in estimating its generalization error. Its point estimation is rather trivial through cross-validation. Providing a variance estimate of that estimation, so that hypothesis testing and/or confidence intervals are possible, is more difficult, especially, as pointed out in (Hinton et al., 1995), if one wants to take into account the variability due to the choice of the training sets (Breiman, 1996). A notable effort in that direction is Dietterich's work (Dietterich, 1998). Careful investigation of the variance to be estimated allows us to provide new variance estimates, which turn out to perform well. Let us first lay out the framework in which we shall work.
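A minimal sketch of the quantity under study: per-split test errors from repeated train/test resampling, whose spread reflects both test-set noise and training-set variability. The naive variance in the last line ignores the positive correlation induced by overlapping training sets, which is precisely the issue the paper's estimators address; the toy learner is invented for illustration:

```python
import numpy as np

def split_errors(X, y, fit, predict, n_splits=15, test_frac=0.3, seed=0):
    """Generalization-error estimates from repeated train/test splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_test = int(test_frac * n)
    errs = []
    for _ in range(n_splits):
        idx = rng.permutation(n)
        test, train = idx[:n_test], idx[n_test:]
        model = fit(X[train], y[train])
        errs.append(np.mean(predict(model, X[test]) != y[test]))
    return np.array(errs)

# Toy learner: threshold on the first feature, halfway between class means.
fit = lambda X, y: (np.mean(X[y == 1, 0]) + np.mean(X[y == 0, 0])) / 2
predict = lambda thr, X: (X[:, 0] > thr).astype(int)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
y = (X[:, 0] + 0.5 * rng.standard_normal(200) > 0).astype(int)

errs = split_errors(X, y, fit, predict)
print(errs.mean())                   # point estimate of the generalization error
print(errs.var(ddof=1) / len(errs))  # naive variance: too small, because the
# overlapping training sets make the per-split errors positively correlated
```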