On the Optimality of Incremental Neural Network Algorithms

Neural Information Processing Systems

We study the approximation of functions by two-layer feedforward neural networks, focusing on incremental algorithms which greedily add units, estimating single unit parameters at each stage. As opposed to standard algorithms for fixed architectures, the optimization at each stage is performed over a small number of parameters, mitigating many of the difficult numerical problems inherent in high-dimensional nonlinear optimization. We establish upper bounds on the error incurred by the algorithm when approximating functions from the Sobolev class, thereby extending previous results which only provided rates of convergence for functions in certain convex hulls of functional spaces. By comparing our results to recently derived lower bounds, we show that the greedy algorithms are nearly optimal. Combined with estimation error results for greedy algorithms, a strong case can be made for this type of approach.
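The greedy scheme can be pictured as matching-pursuit-style residual fitting: at each stage a single sigmoidal unit is optimized against what the current network fails to explain. The following minimal sketch (squared loss, tanh units, BFGS for the per-stage optimization; all names illustrative, not the authors' exact procedure) shows the idea:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative greedy fit: at each stage, optimize one tanh unit's few
# parameters (input weights w, bias b, output weight a) against the
# residual left by the units added so far. Names are hypothetical.
def fit_incremental(X, y, n_units=10, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    residual = y.astype(float).copy()
    units = []
    for _ in range(n_units):
        def stage_loss(theta, r=residual):
            w, b, a = theta[:d], theta[d], theta[d + 1]
            return np.mean((r - a * np.tanh(X @ w + b)) ** 2)
        theta = minimize(stage_loss, rng.normal(scale=0.1, size=d + 2),
                         method="BFGS").x
        w, b, a = theta[:d], theta[d], theta[d + 1]
        residual = residual - a * np.tanh(X @ w + b)
        units.append((w, b, a))
    return units
```

Because each stage optimizes only d + 2 parameters, the per-stage problem stays low-dimensional no matter how many units the final network contains.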


Bayesian Modeling of Facial Similarity

Neural Information Processing Systems

In previous work [6, 9, 10], we advanced a new technique for direct visual matching of images for the purposes of face recognition and image retrieval, using a probabilistic measure of similarity based primarily on a Bayesian (MAP) analysis of image differences, leading to a "dual" basis similar to eigenfaces [13]. The performance advantage of this probabilistic matching technique over standard Euclidean nearest-neighbor eigenface matching was recently demonstrated using results from DARPA's 1996 "FERET" face recognition competition, in which this probabilistic matching algorithm was found to be the top performer. We have further developed a simple method of replacing the costly computation of nonlinear (online) Bayesian similarity measures by the relatively inexpensive computation of linear (offline) subspace projections and simple (online) Euclidean norms, thus resulting in a significant computational speedup for implementation with very large image databases as typically encountered in real-world applications.
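As a rough illustration of the offline/online split described above, one can precompute a whitened linear projection from training image differences and then score query-gallery pairs with a plain Euclidean norm online. The sketch below is a generic whitened-eigenspace stand-in, not the paper's dual-eigenspace formulation:

```python
import numpy as np

# Offline: build a whitened projection from training image differences.
# Whitening (dividing eigenvectors by sqrt(eigenvalue)) is the standard
# trick that lets a Euclidean norm mimic a Gaussian (Mahalanobis) score.
def offline_projection(diff_images, k):
    cov = np.cov(diff_images, rowvar=False)        # (d, d) covariance
    vals, vecs = np.linalg.eigh(cov)
    top = np.argsort(vals)[::-1][:k]               # k largest eigenvalues
    return vecs[:, top] / np.sqrt(vals[top])       # whitened basis, (d, k)

def project(images, W):
    return images @ W                              # done offline for the gallery

def similarity(p_query, p_gallery):
    # Larger (less negative) means more similar under the Gaussian model.
    return -np.linalg.norm(p_gallery - p_query[None, :], axis=1) ** 2
```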


Kernel PCA and De-Noising in Feature Spaces

Neural Information Processing Systems

Kernel PCA as a nonlinear feature extractor has proven powerful as a preprocessing step for classification algorithms. But it can also be considered as a natural generalization of linear principal component analysis. This gives rise to the question of how to use nonlinear features for data compression, reconstruction, and de-noising, applications common in linear PCA. This is a nontrivial task, as the results provided by kernel PCA live in some high-dimensional feature space and need not have pre-images in input space. This work presents ideas for finding approximate pre-images, focusing on Gaussian kernels, and shows experimental results using these pre-images in data reconstruction and de-noising on toy examples as well as on real-world data.
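For Gaussian kernels k(x, y) = exp(-||x - y||^2 / c), approximate pre-images can be found by a simple fixed-point iteration in which the candidate point is repeatedly re-expressed as a kernel-weighted average of the training points. A minimal sketch, assuming the combination coefficients gamma (derived from the kernel PCA projection) have already been computed:

```python
import numpy as np

# Fixed-point pre-image iteration for a Gaussian kernel. gamma[i] are the
# coefficients expressing the projected feature-space point as
# sum_i gamma[i] phi(X[i]); computing them from the kernel PCA
# eigenvectors is assumed done. z0 is an initial guess (e.g. the noisy input).
def preimage(X, gamma, c, z0, n_iter=100, tol=1e-8):
    z = z0.copy()
    for _ in range(n_iter):
        w = gamma * np.exp(-np.sum((X - z) ** 2, axis=1) / c)
        z_new = (w @ X) / w.sum()          # kernel-weighted average of inputs
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z
```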


Viewing Classifier Systems as Model Free Learning in POMDPs

Neural Information Processing Systems

Classifier systems are now viewed as disappointing because of problems such as the rule strength vs. rule set performance problem and the credit assignment problem. In order to solve these problems, we have developed a hybrid classifier system: GLS (Generalization Learning System). In designing GLS, we view CSs as model-free learning in POMDPs and take a hybrid approach to finding the best generalization, given the total number of rules. GLS uses the policy improvement procedure by Jaakkola et al. to obtain a locally optimal stochastic policy when a set of rule conditions is given, and uses a GA to search for the best set of rule conditions. 1 INTRODUCTION Classifier systems (CSs) (Holland 1986) have been among the most widely used approaches in reinforcement learning.
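The paper's GLS details are not reproduced here, but the division of labor it describes, a GA proposing rule conditions while an inner procedure evaluates the induced policy, can be sketched generically. Below, conditions are ternary strings over {0, 1, #} as in classical classifier systems, and evaluate() is a hypothetical stand-in for scoring a rule set's (locally optimized) policy:

```python
import random

ALPHABET = "01#"  # '#' = don't care, as in classical classifier systems

def random_rule(length):
    return "".join(random.choice(ALPHABET) for _ in range(length))

def mutate(rule, rate=0.05):
    return "".join(random.choice(ALPHABET) if random.random() < rate else c
                   for c in rule)

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def ga_search(evaluate, n_rules=8, length=10, pop_size=20, generations=50):
    # Each individual is a set (list) of rule conditions.
    pop = [[random_rule(length) for _ in range(n_rules)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=evaluate, reverse=True)
        elite = pop[: pop_size // 2]               # keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = random.sample(elite, 2)
            children.append([mutate(crossover(r1, r2))
                             for r1, r2 in zip(p1, p2)])
        pop = elite + children
    return max(pop, key=evaluate)
```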


A Model for Associative Multiplication

Neural Information Processing Systems

Despite the fact that mental arithmetic is based on only a few hundred basic facts and some simple algorithms, humans have a difficult time mastering the subject, and even experienced individuals make mistakes. Associative multiplication, the process of doing multiplication by memory without the use of rules or algorithms, is especially problematic.


Fisher Scoring and a Mixture of Modes Approach for Approximate Inference and Learning in Nonlinear State Space Models

Neural Information Processing Systems

The difficulties lie in the Monte-Carlo E-step, which consists of sampling from the posterior distribution of the hidden variables given the observations. The new idea presented in this paper is to generate samples from a Gaussian approximation to the true posterior, from which it is easy to obtain independent samples. The parameters of the Gaussian approximation are derived either from the extended Kalman filter or from the Fisher scoring algorithm. In case the posterior density is multimodal, we propose to approximate the posterior by a sum of Gaussians (mixture of modes approach). We show that sampling from the approximate posterior densities obtained by the above algorithms leads to better models than using point estimates for the hidden states. In our experiment, the Fisher scoring algorithm obtained a better approximation of the posterior mode than the EKF. For a multimodal distribution, the mixture of modes approach gave superior results. 1 INTRODUCTION Nonlinear state space models (NSSM) are a general framework for representing nonlinear time series. In particular, any NARMAX model (nonlinear auto-regressive moving average model with external inputs) can be translated into an equivalent NSSM.
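Once the modes and local covariances have been obtained (from the EKF or Fisher scoring), the E-step sampling itself is straightforward. A minimal sketch of the mixture-of-modes sampler, assuming modes m_k, covariances S_k, and mixture weights pi_k are given:

```python
import numpy as np

def sample_mixture_of_modes(modes, covs, weights, n_samples, seed=0):
    # modes: list of mean vectors m_k; covs: list of covariance matrices S_k
    # (e.g. inverse curvature at each mode, assumed already computed);
    # weights: unnormalized mixture weights pi_k.
    rng = np.random.default_rng(seed)
    p = np.asarray(weights, dtype=float)
    p /= p.sum()
    ks = rng.choice(len(modes), size=n_samples, p=p)   # pick a mode per sample
    return np.stack([rng.multivariate_normal(modes[k], covs[k]) for k in ks])
```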


Learning a Continuous Hidden Variable Model for Binary Data

Neural Information Processing Systems

A directed generative model for binary data using a small number of hidden continuous units is investigated. The relationships between the correlations of the underlying continuous Gaussian variables and the binary output variables are utilized to learn the appropriate weights of the network. The advantages of this approach are illustrated on a translationally invariant binary distribution and on handwritten digit images. Introduction Principal Components Analysis (PCA) is a widely used statistical technique for representing data with a large number of variables [1]. It is based upon the assumption that although the data is embedded in a high-dimensional vector space, most of the variability in the data is captured by a much lower-dimensional manifold.
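The generative picture can be made concrete with a short sampling routine: a few continuous Gaussian hidden units drive correlated activations that are thresholded into binary outputs. The parameters below are illustrative, not the paper's fitted model:

```python
import numpy as np

def sample_binary(W, b, n_samples, noise=1.0, seed=0):
    # W: (n_hidden, n_visible) weights, b: (n_visible,) biases; illustrative.
    rng = np.random.default_rng(seed)
    n_hidden = W.shape[0]
    h = rng.standard_normal((n_samples, n_hidden))    # continuous Gaussian causes
    u = h @ W + b + noise * rng.standard_normal((n_samples, W.shape[1]))
    return (u > 0).astype(int)                        # thresholded binary outputs
```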


Robust, Efficient, Globally-Optimized Reinforcement Learning with the Parti-Game Algorithm

Neural Information Processing Systems

The former represents the number of cells that have to be traveled through to get to the goal cell, and the latter represents the belief that there is no reliable way of getting from that cell to the goal. Cells with a cost of infinity are called losing cells, while the others are called winning cells.
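As an illustration of how such costs could be assigned, a breadth-first pass backwards from the goal labels reachable cells with their shortest path length and leaves unreachable (losing) cells at infinity. This is a generic sketch, not parti-game's actual update, which revises costs as the agent discovers new transition outcomes:

```python
from collections import deque
import math

def cell_costs(neighbors, goal):
    # neighbors: dict mapping each cell to the cells adjacent to it
    # (transitions assumed symmetric for this sketch).
    cost = {cell: math.inf for cell in neighbors}   # infinity = losing cell
    cost[goal] = 0
    queue = deque([goal])
    while queue:
        cell = queue.popleft()
        for nb in neighbors[cell]:
            if cost[nb] == math.inf:                # not yet reached
                cost[nb] = cost[cell] + 1           # one more cell to traverse
                queue.append(nb)
    return cost
```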


Source Separation as a By-Product of Regularization

Neural Information Processing Systems

This paper reveals a previously ignored connection between two important fields: regularization and independent component analysis (ICA). We show that at least one representative of a broad class of algorithms (regularizers that reduce network complexity) extracts independent features as a byproduct. This algorithm is Flat Minimum Search (FMS), a recent general method for finding low-complexity networks with high generalization capability. FMS works by minimizing both training error and required weight precision. According to our theoretical analysis, the hidden layer of an FMS-trained autoassociator attempts to code each input by a sparse code with as few simple features as possible. In experiments the method extracts optimal codes for difficult versions of the "noisy bars" benchmark problem by separating the underlying sources, whereas ICA and PCA fail.
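Schematically, the FMS objective couples the autoassociator's reconstruction error with a complexity term over the weights. The penalty in the sketch below (a smooth log-magnitude term) is a simplified stand-in for FMS's actual flatness measure, which also depends on unit activations; see the paper for the real complexity term:

```python
import numpy as np

def fms_style_loss(w_hidden, w_out, X, lam=0.01):
    # Autoassociator: reconstruct X from a tanh hidden code.
    H = np.tanh(X @ w_hidden)
    X_hat = H @ w_out
    train_err = np.mean((X - X_hat) ** 2)
    # Stand-in complexity term: pushes weights toward zero / low precision.
    w_all = np.concatenate([w_hidden.ravel(), w_out.ravel()])
    complexity = np.sum(np.log1p(w_all ** 2))
    return train_err + lam * complexity
```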


Regularizing AdaBoost

Neural Information Processing Systems

We will also introduce a regularization strategy (analogous to weight decay) into boosting. This strategy uses slack variables to achieve a soft margin (section 4). Numerical experiments in section 5 show the validity of our regularization approach, and finally a brief conclusion is given. 2 AdaBoost Algorithm Let {h_t(x) : t = 1, ..., T} be an ensemble of T hypotheses defined on an input vector x ...
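To make the notation concrete, here is a minimal sketch of plain AdaBoost for labels in {-1, +1}; fit_weak is a hypothetical stand-in for training a weak learner on the weighted sample, and the soft-margin slack-variable regularization of section 4 is not included:

```python
import numpy as np

def adaboost(X, y, fit_weak, T):
    # y in {-1, +1}; fit_weak(X, y, d) returns a callable hypothesis h(X).
    n = len(y)
    d = np.full(n, 1.0 / n)                     # sample weights
    hypotheses, coeffs = [], []
    for _ in range(T):
        h = fit_weak(X, y, d)
        pred = h(X)
        eps = np.sum(d * (pred != y))           # weighted training error
        c = 0.5 * np.log((1.0 - eps) / eps)     # hypothesis weight
        d *= np.exp(-c * y * pred)              # up-weight misclassified points
        d /= d.sum()
        hypotheses.append(h)
        coeffs.append(c)
    def ensemble(Xq):
        return np.sign(sum(c * h(Xq) for c, h in zip(coeffs, hypotheses)))
    return ensemble
```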