Some Theoretical Results Concerning the Convergence of Compositions of Regularized Linear Functions

Neural Information Processing Systems

Recently, sample complexity bounds have been derived for problems involving linear functions such as neural networks and support vector machines. In this paper, we extend some theoretical results in this area by deriving dimension-independent covering number bounds for regularized linear functions under certain regularization conditions. We show that such bounds lead to a class of new methods for training linear classifiers with theoretical advantages similar to those of the support vector machine. Furthermore, we also present a theoretical analysis of these new methods from the asymptotic statistical point of view. This technique provides a better description of the large-sample behavior of these algorithms. In this paper, we are interested in the generalization performance of linear classifiers obtained from certain algorithms.
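
As a schematic illustration (not the paper's exact formulation), the regularized linear classifiers analyzed here can be viewed as regularized empirical risk minimizers; the loss L and the squared-norm regularizer below are assumptions chosen for concreteness:

```latex
% Illustrative only: a generic regularized linear classifier of the kind
% whose covering numbers are studied; L and the regularizer are assumptions.
\[
  \hat{w} \;=\; \arg\min_{w} \; \frac{1}{n} \sum_{i=1}^{n} L\!\big(y_i\, w^{\top} x_i\big)
  \;+\; \lambda \,\|w\|_2^2 .
\]
% With the hinge loss L(z) = max(0, 1 - z) this recovers the soft-margin SVM;
% other convex losses give linear classifiers with similar margin-style guarantees.
```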


Boosting Algorithms as Gradient Descent

Neural Information Processing Systems

Recent theoretical results suggest that the effectiveness of these algorithms is due to their tendency to produce large margin classifiers [1, 18]. Loosely speaking, if a combination of classifiers correctly classifies most of the training data with a large margin, then its error probability is small. In [14] we gave improved upper bounds on the misclassification probability of a combined classifier in terms of the average over the training data of a certain cost function of the margins. That paper also described DOOM, an algorithm for directly minimizing the margin cost function by adjusting the weights associated with each base classifier (the base classifiers are supplied to DOOM). DOOM exhibits performance improvements over AdaBoost, even when using the same base hypotheses, which provides additional empirical evidence that these margin cost functions are appropriate quantities to optimize. In this paper, we present a general class of algorithms (called AnyBoost) which are gradient descent algorithms for choosing linear combinations of elements of an inner product function space so as to minimize some cost functional. The normal operation of a weak learner is shown to be equivalent to maximizing a certain inner product. We prove convergence of AnyBoost under weak conditions. In Section 3, we show that this general class of algorithms includes as special cases nearly all existing voting methods.
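
A minimal sketch of the functional-gradient view of boosting described above (illustrative only: the decision-stump base learner, the exponential margin cost, and the fixed step size are assumptions, not the paper's exact AnyBoost procedure):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def anyboost_sketch(X, y, n_rounds=50, step=0.1):
    """Gradient descent on an exponential margin cost C(F) = sum_i exp(-y_i F(x_i))."""
    F = np.zeros(len(y))          # current combined classifier values F(x_i)
    ensemble = []                  # list of (weight, base_classifier)
    for _ in range(n_rounds):
        # Negative functional gradient at the training points:
        # -dC/dF(x_i) = y_i * exp(-y_i F(x_i)).
        weights = np.exp(-y * F)
        # The weak learner maximizes the inner product <f, -grad C>, which here
        # amounts to weighted classification with weights exp(-y_i F(x_i)).
        base = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=weights)
        f = base.predict(X).astype(float)   # base hypothesis values in {-1, +1}
        ensemble.append((step, base))
        F += step * f                       # small step in function space
    return ensemble

# Usage: labels y must be in {-1, +1}; predictions are sign(sum_t w_t * h_t(x)).
```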


Learning from User Feedback in Image Retrieval Systems

Neural Information Processing Systems

We formulate the problem of retrieving images from visual databases as a problem of Bayesian inference. This leads to natural and effective solutions for two of the most challenging issues in the design of a retrieval system: providing support for region-based queries without requiring prior image segmentation, and accounting for user feedback during a retrieval session. We present a new learning algorithm that relies on belief propagation to account for both positive and negative examples of the user's interests.
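
The Bayesian-retrieval idea can be sketched as sequential posterior updating over candidate images; the likelihood interface and the simple way negative examples are discounted below are illustrative assumptions, not the paper's belief-propagation algorithm:

```python
import numpy as np

def update_beliefs(log_belief, log_lik_pos, log_lik_neg):
    """One round of relevance feedback as generic Bayesian updating (illustration only).

    log_belief  : (n_images,) current log P(image i is the target)
    log_lik_pos : (n_images,) log-likelihood of the user's positive examples under image i
    log_lik_neg : (n_images,) log-likelihood of the negative examples under image i
    """
    # Positive examples raise belief in images that explain them well;
    # negative examples lower belief in images that explain them well.
    log_belief = log_belief + log_lik_pos - log_lik_neg
    # Renormalize so the beliefs remain a distribution over the database.
    log_belief -= np.logaddexp.reduce(log_belief)
    return log_belief
```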


Neural Network Based Model Predictive Control

Neural Information Processing Systems

Model Predictive Control was developed in the late 1970s and came into widespread use, particularly in the refining industry, in the 1980s. The economic benefit of this approach to control has been documented [1, 2].


Memory Capacity of Linear vs. Nonlinear Models of Dendritic Integration

Neural Information Processing Systems

Previous biophysical modeling work showed that nonlinear interactions among nearby synapses located on active dendritic trees can provide a large boost in the memory capacity of a cell (Mel, 1992a, 1992b).
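
To make the linear/nonlinear contrast concrete, here is a toy sketch (the sigmoidal branch nonlinearity and all parameters are assumptions, not the biophysical model of Mel, 1992): a linear cell sums its synaptic inputs directly, while a nonlinear cell first passes each dendritic subunit's local sum through a saturating nonlinearity.

```python
import numpy as np

def cell_response(x, w, subunits=None):
    """Toy comparison of linear vs. subunit-nonlinear dendritic integration.

    x, w     : synaptic inputs and weights, shape (n_synapses,)
    subunits : optional list of index arrays, one per dendritic branch
    """
    if subunits is None:                 # linear model: plain weighted sum at the soma
        return np.dot(w, x)
    # Nonlinear model: each branch computes a local weighted sum that is passed
    # through a saturating branch nonlinearity before being summed at the soma.
    branch_sums = [np.dot(w[idx], x[idx]) for idx in subunits]
    return sum(np.tanh(s) for s in branch_sums)
```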


Efficient Approaches to Gaussian Process Classification

Neural Information Processing Systems

The first two methods are related to mean field ideas known in Statistical Physics. The third is based on a Bayesian online approach that was motivated by recent results in the Statistical Mechanics of Neural Networks. We present simulation results showing (1) that the mean field Bayesian evidence may be used for hyperparameter tuning and (2) that the online approach may achieve a low training error quickly. Gaussian processes provide promising nonparametric Bayesian approaches to regression and classification [2, 1].
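
Schematically (this is the standard Gaussian process classification setup, not the paper's specific approximations), the latent function carries a GP prior and the non-Gaussian likelihood makes the posterior intractable, which is what motivates mean-field-style factorized approximations:

```latex
% Standard GP classification setup (illustrative); sigma is a sigmoidal likelihood.
\[
  f \sim \mathcal{GP}\big(0,\, k(x, x')\big), \qquad
  p(y_i \mid f_i) = \sigma\!\big(y_i f_i\big), \qquad
  p(f \mid \mathcal{D}) \;\propto\; p(f) \prod_{i=1}^{n} \sigma\!\big(y_i f_i\big).
\]
% Mean-field-style methods replace the intractable posterior with a tractable
% factorized form, e.g.
\[
  p(f \mid \mathcal{D}) \;\approx\; q(f) \;=\; \prod_{i=1}^{n} q_i(f_i).
\]
```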


Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology

Neural Information Processing Systems

Local "belief propagation" rules of the sort proposed by Pearl [15] are guaranteed to converge to the correct posterior probabilities in singly connected graphical models. Recently, a number of researchers have empirically demonstratedgood performance of "loopy belief propagation" using these same rules on graphs with loops. Perhaps the most dramatic instance is the near Shannon-limit performance of "Turbo codes", whose decoding algorithm is equivalent to loopy belief propagation. Except for the case of graphs with a single loop, there has been little theoretical understandingof the performance of loopy propagation. Here we analyze belief propagation in networks with arbitrary topologies when the nodes in the graph describe jointly Gaussian random variables.


The Nonnegative Boltzmann Machine

Neural Information Processing Systems

The nonnegative Boltzmann machine (NNBM) is a recurrent neural network model that can describe multimodal nonnegative data. Application of maximum likelihood estimation to this model gives a learning rule that is analogous to that of the binary Boltzmann machine. We examine the utility of the mean field approximation for the NNBM, and describe how Monte Carlo sampling techniques can be used to learn its parameters. Reflective slice sampling is particularly well suited to this distribution, and can be implemented efficiently to sample from it. We illustrate learning of the NNBM on a translationally invariant distribution, as well as on a generative model for images of human faces. The multivariate Gaussian is the most elementary distribution used to model generic data.
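
Schematically (the exact parameterization below is an assumption for illustration), the NNBM assigns a Gibbs distribution with a quadratic energy to the nonnegative orthant, and maximum likelihood contrasts data and model statistics just as in the binary Boltzmann machine:

```latex
% Illustrative parameterization: quadratic energy restricted to x >= 0.
\[
  p(x) \;=\; \frac{1}{Z}\, e^{-E(x)}, \qquad
  E(x) \;=\; \tfrac{1}{2}\, x^{\top} A\, x + b^{\top} x, \qquad x_i \ge 0.
\]
% Gradient ascent on the log-likelihood contrasts clamped (data) and free (model)
% statistics, mirroring the binary Boltzmann machine rule:
\[
  \Delta A_{ij} \;\propto\; \langle x_i x_j \rangle_{\text{model}} - \langle x_i x_j \rangle_{\text{data}},
  \qquad
  \Delta b_i \;\propto\; \langle x_i \rangle_{\text{model}} - \langle x_i \rangle_{\text{data}}.
\]
```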


ν-Arc: Ensemble Learning in the Presence of Outliers

Neural Information Processing Systems

The idea of a large minimum margin [17] explains the good generalization performance of AdaBoost in the low noise regime. However, AdaBoost performs worse on noisy tasks [10, 11], such as the iris and the breast cancer benchmark data sets [1]. On the latter tasks, a large margin on all training points cannot be achieved without adverse effects on the generalization error. This experimental observation was supported by the study of [13], where the generalization error of ensemble methods was bounded by the sum of the fraction of training points which have a margin smaller than some value ρ, say, plus a complexity term depending on the base hypotheses and ρ. While this bound can only capture part of what is going on in practice, it nevertheless already conveys the message that in some cases it pays to allow for some points which have a small margin, or are misclassified, if this leads to a larger overall margin on the remaining points. To cope with this problem, it was mandatory to construct regularized variants of AdaBoost, which trade off the number of margin errors and the size of the margin.
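
Schematically, the bound from [13] referred to above has the following form (constants and the exact complexity term are omitted; the capacity-over-margin form of the last term is one common statement, included for illustration):

```latex
% Margin bound of the type described above (schematic); d is a capacity measure
% such as the VC dimension of the base hypothesis class, m the sample size.
\[
  \Pr\big[\, y f(x) \le 0 \,\big]
  \;\le\;
  \widehat{\Pr}_{S}\big[\, y f(x) \le \rho \,\big]
  \;+\;
  \tilde{O}\!\left( \sqrt{ \frac{d}{m\,\rho^{2}} } \right).
\]
% That is, test error <= fraction of training margins below rho + complexity term,
% so tolerating a few small-margin points can pay off if it increases rho overall.
```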


Distributed Synchrony of Spiking Neurons in a Hebbian Cell Assembly

Neural Information Processing Systems

We investigate the behavior of a Hebbian cell assembly of spiking neurons formed via a temporal synaptic learning curve. This learning function is based on recent experimental findings. It includes potentiation for short time delays between pre- and post-synaptic neuronal spiking, and depression for spiking events occurring in the reverse order. The coupling between the dynamics of synaptic learning and of neuronal activation leads to interesting results. We find that the cell assembly can fire asynchronously, but may also function in complete synchrony, or in distributed synchrony.
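
The temporal learning curve described here can be written, for illustration, in the standard spike-timing-dependent form; the exponential shape and the parameters A± and τ± are assumptions, not the specific curve used in the paper:

```latex
% Generic spike-timing-dependent learning window, with Delta t = t_post - t_pre.
\[
  \Delta W(\Delta t) \;=\;
  \begin{cases}
    +A_{+}\, e^{-\Delta t / \tau_{+}}, & \Delta t > 0 \quad \text{(pre before post: potentiation)},\\[4pt]
    -A_{-}\, e^{\,\Delta t / \tau_{-}}, & \Delta t < 0 \quad \text{(post before pre: depression)},
  \end{cases}
\]
% with A_+, A_-, tau_+, tau_- > 0.
```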