Goto

Collaborating Authors

 Country


Statistical Dynamics of Batch Learning

Neural Information Processing Systems

An important issue in neural computing concerns the description of learning dynamics with macroscopic dynamical variables. Recent progresson online learning only addresses the often unrealistic case of an infinite training set. We introduce a new framework to model batch learning of restricted sets of examples, widely applicable toany learning cost function, and fully taking into account the temporal correlations introduced by the recycling of the examples. For illustration we analyze the effects of weight decay and early stopping during the learning of teacher-generated examples.


Bayesian Averaging is Well-Temperated

Neural Information Processing Systems

Often a learning problem has natural quantitative measure of generalization. If a loss function is defined the natural measure is the generalization error, i.e., the expected loss on a random sample independent of the training set. Generalizability is a key topic of learning theory and much progress has been reported. Analytic results for a broad class of machines can be found in the litterature [8, 12, 9, 10] describing the asymptotic generalization ability of supervised algorithms that are continuously parameterized. Asymptotic bounds on generalization for general machines havebeen advocated by Vapnik [11]. Generalization results valid for finite training sets can only be obtained for specific learning machines, see e.g.


Potential Boosters?

Neural Information Processing Systems

Simply changing the potential function allows one to create new algorithms related toAdaBoost. However, these new algorithms are generally not known to have the formal boosting property. This paper examines thequestion of which potential functions lead to new algorithms thatare boosters. The two main results are general sets of conditions on the potential; one set implies that the resulting algorithm is a booster, while the other implies that the algorithm is not. These conditions are applied to previously studied potential functions, such as those used by LogitBoost and Doom II. 1 Introduction The first boosting algorithm appeared in Rob Schapire's thesis [1].


Efficient Approaches to Gaussian Process Classification

Neural Information Processing Systems

The first two methods are related to mean field ideas known in Statistical Physics. The third approach is based on Bayesian online approach which was motivated by recent results in the Statistical Mechanics of Neural Networks. We present simulation results showing: 1. that the mean field Bayesian evidence may be used for hyperparameter tuning and 2. that the online approach may achieve a low training error fast. 1 Introduction Gaussian processes provide promising nonparametric Bayesian approaches to regression andclassification [2, 1].


Model Selection for Support Vector Machines

Neural Information Processing Systems

New functionals for parameter (model) selection of Support Vector Machines areintroduced based on the concepts of the span of support vectors and rescaling of the feature space. It is shown that using these functionals, onecan both predict the best choice of parameters of the model and the relative quality of performance for any value of parameter.


Uniqueness of the SVM Solution

Neural Information Processing Systems

We give necessary and sufficient conditions for uniqueness of the support vector solution for the problems of pattern recognition and regression estimation, for a general class of cost functions. We show that if the solution is not unique, all support vectors are necessarily at bound, and we give some simple examples of non-unique solutions. Wenote that uniqueness of the primal (dual) solution does not necessarily imply uniqueness of the dual (primal) solution. We show how to compute the threshold b when the solution is unique, but when all support vectors are at bound, in which case the usual method for determining b does not work. 1 Introduction Support vector machines (SVMs) have attracted wide interest as a means to implement structuralrisk minimization for the problems of classification and regression estimation. The fact that training an SVM amounts to solving a convex quadratic programming problem means that the solution found is global, and that if it is not unique, then the set of global solutions is itself convex; furthermore, if the objective functionis strictly convex, the solution is guaranteed to be unique [1]1.


Model Selection in Clustering by Uniform Convergence Bounds

Neural Information Processing Systems

Unsupervised learning algorithms are designed to extract structure fromdata samples. Reliable and robust inference requires a guarantee that extracted structures are typical for the data source, Le., similar structures have to be inferred from a second sample set of the same data source. The overfitting phenomenon in maximum entropybased annealing algorithms is exemplarily studied for a class of histogram clustering models. Bernstein's inequality for large deviations is used to determine the maximally achievable approximation quality parameterized by a minimal temperature. Monte Carlo simulations support the proposed model selection criterion byfinite temperature annealing.


Population Decoding Based on an Unfaithful Model

Neural Information Processing Systems

We study a population decoding paradigm in which the maximum likelihood inferenceis based on an unfaithful decoding model (UMLI). This is usually the case for neural population decoding because the encoding process of the brain is not exactly known, or because a simplified decoding modelis preferred for saving computational cost. We consider an unfaithful decoding model which neglects the pairwise correlation between neuronal activities, and prove that UMLI is asymptotically efficient whenthe neuronal correlation is uniform or of limited-range. The performance of UMLI is compared with that of the maximum likelihood inference based on a faithful model and that of the center of mass decoding method.It turns out that UMLI has advantages of decreasing the computational complexity remarkablely and maintaining a high-level decoding accuracy at the same time. The effect of correlation on the decoding accuracy is also discussed.


Memory Capacity of Linear vs. Nonlinear Models of Dendritic Integration

Neural Information Processing Systems

Previous biophysical modeling work showed that nonlinear interactions amongnearby synapses located on active dendritic trees can provide a large boost in the memory capacity of a cell (Mel, 1992a, 1992b). The aim of our present work is to quantify this boost by estimating the capacity of (1) a neuron model with passive dendritic integrationwhere inputs are combined linearly across the entire cell followed by a single global threshold, and (2) an active dendrite model in which a threshold is applied separately to the output of each branch, and the branch subtotals are combined linearly. Wefocus here on the limiting case of binary-valued synaptic weights, and derive expressions which measure model capacity by estimating the number of distinct input-output functions available to both neuron types. We show that (1) the application of a fixed nonlinearity to each dendritic compartment substantially increases the model's flexibility, (2) for a neuron of realistic size, the capacity of the nonlinear cell can exceed that of the same-sized linear cell by more than an order of magnitude, and (3) the largest capacity boost occurs for cells with a relatively large number of dendritic subunits of relatively small size. We validated the analysis by empirically measuring memory capacity with randomized two-class classification problems,where a stochastic delta rule was used to train both linear and nonlinear models. We found that large capacity boosts predicted for the nonlinear dendritic model were readily achieved in practice.


LTD Facilitates Learning in a Noisy Environment

Neural Information Processing Systems

This increase in synaptic strength must be countered by a mechanism for weakening the synapse [4]. The biological correlate, long-term depression (LTD) has also been observed in the laboratory; that is, synapses are observed to weaken when low presynaptic activity coincides with high postsynaptic activity [5]-[6].