Goto

Collaborating Authors

 Country


Robust Neural Network Regression for Offline and Online Learning

Neural Information Processing Systems

Although one can derive the Gaussian noise assumption based on a maximum entropy approach, the main reason for this assumption is practicability: underthe Gaussian noise assumption the maximum likelihood parameter estimate can simply be found by minimization of the squared error. Despite its common use it is far from clear that the Gaussian noise assumption is a good choice for many practical problems. Areasonable approach therefore would be a noise distribution which contains the Gaussian as a special case but which has a tunable parameter that allows for more flexible distributions.


Robust Full Bayesian Methods for Neural Networks

Neural Information Processing Systems

In particular, Mackay showed that by approximating the distributions of the weights with Gaussians and adopting smoothing priors, it is possible to obtain estimates of the weights and output variances and to automatically set the regularisation coefficients.Neal (1996) cast the net much further by introducing advanced Bayesian simulation methods, specifically the hybrid Monte Carlo method, into the analysis of neural networks [3]. Bayesian sequential Monte Carlo methods have also been shown to provide good training results, especially in time-varying scenarios [4]. More recently, Rios Insua and Muller (1998) and Holmes and Mallick (1998) have addressed the issue of selecting the number of hidden neurons with growing and pruning algorithms from a Bayesian perspective [5,6]. In particular, they apply the reversible jump Markov Chain Monte Carlo (MCMC) algorithm of Green [7] to feed-forward sigmoidal networks and radial basis function (RBF) networks to obtain joint estimates of the number of neurons and weights. We also apply the reversible jump MCMC simulation algorithm to RBF networks so as to compute the joint posterior distribution of the radial basis parameters and the number of basis functions. However, we advance this area of research in two important directions.Firstly, we propose a full hierarchical prior for RBF networks.


Some Theoretical Results Concerning the Convergence of Compositions of Regularized Linear Functions

Neural Information Processing Systems

Recently, sample complexity bounds have been derived for problems involving linearfunctions such as neural networks and support vector machines. In this paper, we extend some theoretical results in this area by deriving dimensional independent covering number bounds for regularized linearfunctions under certain regularization conditions. We show that such bounds lead to a class of new methods for training linear classifiers withsimilar theoretical advantages of the support vector machine. Furthermore, we also present a theoretical analysis for these new methods fromthe asymptotic statistical point of view. This technique provides better description for large sample behaviors of these algorithms. 1 Introduction In this paper, we are interested in the generalization performance of linear classifiers obtained fromcertain algorithms.


Semiparametric Approach to Multichannel Blind Deconvolution of Nonminimum Phase Systems

Neural Information Processing Systems

In this paper we discuss the semiparametric statistical model for blind deconvolution. First we introduce a Lie Group to the manifold of noncausal FIRfilters. Then blind deconvolution problem is formulated in the framework of a semiparametric model, and a family of estimating functions is derived for blind deconvolution. A natural gradient learning algorithmis developed for training noncausal filters. Stability of the natural gradient algorithm is also analyzed in this framework.


Algebraic Analysis for Non-regular Learning Machines

Neural Information Processing Systems

Hierarchical learning machines are non-regular and non-identifiable statistical models, whose true parameter sets are analytic sets with singularities. Using algebraic analysis, we rigorously prove that the stochastic complexity of a non-identifiable learning machine is asymptotically equal to '1 log n - (ml - 1) log log n


Probabilistic Methods for Support Vector Machines

Neural Information Processing Systems

One of the open questions that remains is how to set the'tunable' parameters of an SVM algorithm: While methods forchoosing the width of the kernel function and the noise parameter C (which controls how closely the training data are fitted) have been proposed [4, 5] (see also, very recently, [6]), the effect of the overall shape of the kernel function remains imperfectly understood [1]. Error bars (class probabilities) for SVM predictions - important for safety-critical applications, for example - are also difficult to obtain. In this paper I suggest that a probabilistic interpretation of SVMs could be used to tackle these problems. It shows that the SVM kernel defines a prior over functions on the input space, avoiding the need to think in terms of high-dimensional feature spaces. It also allows one to define quantities such as the evidence (likelihood) for a set of hyperparameters (C, kernel amplitude Ko etc). I give a simple approximation to the evidence which can then be maximized to set such hyperparameters. The evidence is sensitive to the values of C and Ko individually, in contrast to properties (such as cross-validation error) of the deterministic solution, which only depends on the product CKo. It can thfrefore be used to assign an unambiguous value to C, from which error bars can be derived.


The Entropy Regularization Information Criterion

Neural Information Processing Systems

Effective methods of capacity control via uniform convergence bounds for function expansions have been largely limited to Support Vector machines, wheregood bounds are obtainable by the entropy number approach.


Noisy Neural Networks and Generalizations

Neural Information Processing Systems

In this paper we define a probabilistic computational model which generalizes many noisy neural network models, including the recent work of Maass and Sontag [5]. We identify weak ergodicjty as the mechanism responsible for restriction of the computational power of probabilistic models to definite languages, independent of the characteristics of the noise: whether it is discrete or analog, or if it depends on the input or not, and independent of whether the variables are discrete or continuous. We give examples of weakly ergodic models including noisy computational systems with noise depending on the current state and inputs, aggregate models, and computational systems which update in continuous time. 1 Introduction Noisy neural networks were recently examined, e.g.


Lower Bounds on the Complexity of Approximating Continuous Functions by Sigmoidal Neural Networks

Neural Information Processing Systems

This is one of the theoretical results most frequently cited to justify the use of sigmoidal neural networks in applications. By this statement one refers to the fact that sigmoidal neural networks have been shown to be able to approximate any continuous function arbitrarily well. Numerous results in the literature have established variants of this universal approximation property by considering distinct function classes to be approximated by network architectures using different types of neural activation functions with respect to various approximation criteria, see for instance [1, 2, 3, 5, 6, 11, 12, 14, 15].


Understanding Stepwise Generalization of Support Vector Machines: a Toy Model

Neural Information Processing Systems

In this article we study the effects of introducing structure in the input distribution of the data to be learnt by a simple perceptron. We determine the learning curves within the framework of Statistical Mechanics.Stepwise generalization occurs as a function of the number of examples when the distribution of patterns is highly anisotropic. Although extremely simple, the model seems to capture therelevant features of a class of Support Vector Machines which was recently shown to present this behavior.