Factorial Hidden Markov Models

Neural Information Processing Systems

Due to the simplicity and efficiency of its parameter estimation algorithm, the hidden Markov model (HMM) has emerged as one of the basic statistical tools for modeling discrete time series, finding widespread application in the areas of speech recognition (Rabiner and Juang, 1986) and computational molecular biology (Baldi et al., 1994). An HMM is essentially a mixture model, encoding information about the history of a time series in the value of a single multinomial variable (the hidden state). This multinomial assumption allows an efficient parameter estimation algorithm to be derived (the Baum-Welch algorithm). However, it also severely limits the representational capacity of HMMs.
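
To make the representational argument concrete, the short sketch below (my own illustration, not code from the paper) compares parameter counts: a single multinomial hidden state with N values needs an N x N transition matrix, whereas a factorial state composed of M independent chains with K values each covers N = K^M joint states using only M separate K x K matrices.

# Illustrative parameter-count comparison (not code from the paper).

def hmm_transition_params(n_states: int) -> int:
    """Free parameters of one N x N stochastic transition matrix."""
    return n_states * (n_states - 1)

def factorial_hmm_transition_params(n_chains: int, k_states: int) -> int:
    """Transition parameters when the state factors into n_chains independent chains."""
    return n_chains * k_states * (k_states - 1)

if __name__ == "__main__":
    M, K = 3, 10                                      # 3 chains of 10 states each
    joint = K ** M                                    # equivalent single-chain state space
    print(joint)                                      # 1000 joint states
    print(hmm_transition_params(joint))               # 999000 parameters for one flat HMM
    print(factorial_hmm_transition_params(M, K))      # 270 parameters for the factored state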


Learning Sparse Perceptrons

Neural Information Processing Systems

We introduce a new algorithm designed to learn sparse perceptrons over input representations which include high-order features. Our algorithm, which is based on a hypothesis-boosting method, is able to PAC-learn a relatively natural class of target concepts. Moreover, the algorithm appears to work well in practice: on a set of three problem domains, the algorithm produces classifiers that utilize small numbers of features yet exhibit good generalization performance. Perhaps most importantly, our algorithm generates concept descriptions that are easy for humans to understand. This matters because in many applications, such as those that may involve scientific discovery, it is crucial to be able to explain predictions.
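
The excerpt does not describe the learning procedure in detail; the following sketch only illustrates the general flavour of boosting single-feature weak hypotheses into a sparse weighted vote, and every name and detail in it is an illustrative assumption rather than the paper's algorithm.

import numpy as np

def boost_sparse_perceptron(X, y, n_rounds=10):
    """Illustrative sketch: AdaBoost over single-feature (+/-1) hypotheses.

    X is an (n_examples, n_features) matrix of +/-1 features, y a vector of
    +/-1 labels.  Each round picks the one feature (possibly negated) with
    the lowest weighted error, so the final weighted vote touches at most
    n_rounds features -- i.e. it is a sparse perceptron.
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)           # example weights
    weights = np.zeros(d)             # perceptron weights (mostly zero)
    for _ in range(n_rounds):
        errs = np.array([(w * (np.sign(s * X[:, j]) != y)).sum()
                         for j in range(d) for s in (+1, -1)]).reshape(d, 2)
        j, k = np.unravel_index(errs.argmin(), errs.shape)
        s, err = (+1, -1)[k], max(errs[j, k], 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)
        weights[j] += s * alpha                       # add the chosen feature to the vote
        w *= np.exp(-alpha * y * np.sign(s * X[:, j]))
        w /= w.sum()                                  # re-normalize example weights
    return weights                    # predict with sign(X @ weights)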


Human Reading and the Curse of Dimensionality

Neural Information Processing Systems

Whereas optical character recognition (OCR) systems learn to classify single characters, people learn to classify long character strings in parallel, within a single fixation. This difference is surprising because high dimensionality is associated with poor classification learning. This paper suggests that the human reading system avoids these problems because the number of to-be-classified images is reduced by consistent and optimal eye fixation positions, and by character sequence regularities. An interesting difference exists between human reading and optical character recognition (OCR) systems. The input/output dimensionality of character classification in human reading is much greater than that for OCR systems (see Figure 1). OCR systems classify one character at a time, while the human reading system classifies as many as 8-13 characters per eye fixation (Rayner, 1979), and within a fixation, character category and sequence information is extracted in parallel (Blanchard, McConkie, Zola, and Wolverton, 1984; Reicher, 1969).
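
A back-of-the-envelope calculation (mine, not the paper's) makes the output-dimensionality gap concrete: a single-character classifier distinguishes on the order of 26 categories, while labelling a 10-character string in one shot would in the worst case mean distinguishing 26^10 possible strings.

# Rough illustration of the output-dimensionality gap (illustrative numbers only).
ALPHABET = 26
single_char_classes = ALPHABET            # 26 categories for one character
string_classes = ALPHABET ** 10           # 141,167,095,653,376 possible 10-letter strings
print(single_char_classes, string_classes)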



Stable Dynamic Parameter Adaption

Neural Information Processing Systems

A stability criterion for dynamic parameter adaptation is given. In the case of the learning rate of backpropagation, a class of stable algorithms is presented and studied, including a convergence proof.
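
The abstract itself gives no formulas, so the fragment below is only a generic example of dynamic learning-rate adaptation for gradient descent (multiplicative increase and decrease driven by gradient agreement); it is not the stability criterion or the algorithm class analysed in the paper.

import numpy as np

def adaptive_rate_gd(grad, x0, eta=0.01, up=1.1, down=0.5, steps=100):
    """Generic dynamic learning-rate adaptation (illustrative only).

    The step size grows while successive gradients point in a consistent
    direction and shrinks when they disagree -- a common heuristic; a
    stability criterion of the kind the paper derives would bound how far
    such adaptation is allowed to push the rate.
    """
    x = np.asarray(x0, dtype=float)
    g_prev = None
    for _ in range(steps):
        g = np.array(grad(x), dtype=float)      # copy to avoid aliasing with x
        if g_prev is not None:
            eta *= up if np.dot(g, g_prev) > 0 else down
        x -= eta * g
        g_prev = g
    return x

# Example: minimize f(x) = ||x||^2, whose gradient is 2x; the result is near zero.
print(adaptive_rate_gd(lambda x: 2 * x, x0=[3.0, -2.0]))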


Memory-based Stochastic Optimization

Neural Information Processing Systems

In this paper we introduce new algorithms for optimizing noisy plants in which each experiment is very expensive. The algorithms build a global nonlinear model of the expected output while simultaneously performing Bayesian linear regression analysis of locally weighted polynomial models. The local model answers queries about confidence, noise, gradients and Hessians, and uses them to make automated decisions similar to those made by a practitioner of Response Surface Methodology. The global and local models are combined naturally as a locally weighted regression. We examine the question of whether the global model can really help optimization, and we extend it to the case of time-varying functions. We compare the new algorithms with a highly tuned higher-order stochastic optimization algorithm on randomly-generated functions and a simulated manufacturing task. We note significant improvements in total regret, time to converge, and final solution quality.

1 INTRODUCTION

In a stochastic optimization problem, noisy samples are taken from a plant. A sample consists of a chosen control u (a vector of real numbers) and a noisy observed response y.
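
As a rough sketch of the local-modelling idea (the kernel, bandwidth, and test function below are my own illustrative choices, not the paper's), a locally weighted linear fit can answer a query for the predicted response and its gradient near a candidate control u:

import numpy as np

def locally_weighted_query(U, y, u_query, bandwidth=0.5):
    """Illustrative locally weighted linear regression query (not the paper's code).

    U: (n, d) matrix of past controls, y: (n,) noisy observed responses.
    Points near u_query receive high weight through a Gaussian kernel; a
    weighted least-squares linear fit then yields a local prediction and a
    gradient estimate -- the kind of query a memory-based optimizer can use
    when choosing the next expensive experiment.
    """
    U, y, u_query = np.asarray(U, float), np.asarray(y, float), np.asarray(u_query, float)
    d2 = ((U - u_query) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2 * bandwidth ** 2))                 # kernel weights
    A = np.hstack([np.ones((len(U), 1)), U - u_query])     # local linear basis
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(sw[:, None] * A, sw * y, rcond=None)
    return coef[0], coef[1:]                               # prediction, gradient estimate

# Example: noisy samples of y = -(u - 1)^2; at u = 0 expect prediction ~ -1, gradient ~ +2.
rng = np.random.default_rng(0)
U = rng.uniform(-2.0, 3.0, size=(40, 1))
y = -(U[:, 0] - 1.0) ** 2 + 0.1 * rng.standard_normal(40)
print(locally_weighted_query(U, y, np.array([0.0])))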


Learning with ensembles: How overfitting can be useful

Neural Information Processing Systems

We study the characteristics of learning with ensembles. Solving exactly the simple model of an ensemble of linear students, we find surprisingly rich behaviour. For learning in large ensembles, it is advantageous to use under-regularized students, which actually over-fit the training data. Globally optimal performance can be obtained by choosing the training set sizes of the students appropriately. For smaller ensembles, optimization of the ensemble weights can yield significant improvements in ensemble generalization performance, in particular if the individual students are subject to noise in the training process. Choosing students with a wide range of regularization parameters makes this improvement robust against changes in the unknown level of noise in the training data.

1 INTRODUCTION

An ensemble is a collection of a (finite) number of neural networks or other types of predictors that are trained for the same task.
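
A toy construction of the setting (illustrative only, not the paper's analytical model): several weakly regularized linear "students" are each fit on a different subset of the data and their predictions averaged; by convexity the averaged ensemble can never have higher test error than the average individual student.

import numpy as np

def fit_ridge(X, y, lam):
    """Ridge-regression 'student'; a small lam means weak regularization."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def ensemble_predict(students, X):
    """Equally weighted average of all students' predictions."""
    return np.mean([X @ w for w in students], axis=0)

# Toy experiment: each under-regularized student over-fits its own subset,
# yet the averaged ensemble generalizes at least as well as a typical student.
rng = np.random.default_rng(1)
w_true = rng.standard_normal(20)
X, Xtest = rng.standard_normal((200, 20)), rng.standard_normal((1000, 20))
y = X @ w_true + 0.5 * rng.standard_normal(200)
ytest = Xtest @ w_true

students = [fit_ridge(X[idx], y[idx], lam=1e-3)
            for idx in np.array_split(rng.permutation(200), 5)]
mse = lambda w: np.mean((Xtest @ w - ytest) ** 2)
print(np.mean([mse(w) for w in students]))                         # average single student
print(np.mean((ensemble_predict(students, Xtest) - ytest) ** 2))   # ensemble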


Rapid Quality Estimation of Neural Network Input Representations

Neural Information Processing Systems

Artificial neural networks (ANNs) are, however, usually costly to train, preventing one from trying many different representations. In this paper, we address this problem by introducing and evaluating three new measures for quickly estimating ANN input representation quality. Two of these, called DBleaves and Min(leaves), consistently outperform Rendell and Ragavan's (1993) blurring measure in accurately ranking different input representations for ANN learning on three difficult, real-world datasets.
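
The excerpt does not define the leaf-count measures, so purely as an illustration of the general idea of cheaply scoring a representation before committing to expensive ANN training, one could use the size of a quickly grown decision tree as a proxy; the function below is my sketch, not the paper's measures.

from sklearn.tree import DecisionTreeClassifier

def leaf_count_score(X, y):
    """Illustrative proxy for representation quality (not the paper's measure).

    Grow a cheap decision tree on the candidate input representation and count
    its leaves; representations that admit a compact tree are ranked higher.
    """
    tree = DecisionTreeClassifier(random_state=0).fit(X, y)
    return tree.get_n_leaves()   # fewer leaves -> simpler concept -> better rank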


Quadratic-Type Lyapunov Functions for Competitive Neural Networks with Different Time-Scales

Neural Information Processing Systems

The dynamics of complex neural networks modelling the self-organization process in cortical maps must include the aspects of long- and short-term memory. The behaviour of the network is thus characterized by an equation of neural activity as a fast phenomenon and an equation of synaptic modification as the slow part of the neural system. We present a quadratic-type Lyapunov function for the flow of a competitive neural system with fast and slow dynamic variables. We also show the consequences of the stability analysis on the neural net parameters.

1 INTRODUCTION

This paper investigates a special class of laterally inhibited neural networks. In particular, we have examined the dynamics of a restricted class of laterally inhibited neural networks from a rigorous analytic standpoint.
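
In generic form (the specific functions and matrices below are placeholders, not the paper's equations), the fast/slow structure and a quadratic-type Lyapunov candidate can be written as

\varepsilon \, \dot{x} = f(x, m)          \qquad \text{(fast: neural activity, short-term memory)}
\dot{m} = g(x, m)                         \qquad \text{(slow: synaptic modification, long-term memory)}
V(x, m) = x^{\top} P x + m^{\top} Q m,    \qquad P, Q \succ 0,

where stability requires \dot{V} < 0 along trajectories, and this requirement is what constrains the admissible network parameters.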


Statistical Theory of Overtraining - Is Cross-Validation Asymptotically Effective?

Neural Information Processing Systems

A statistical theory for overtraining is proposed. The analysis treats realizable stochastic neural networks, trained with Kullback-Leibler loss, in the asymptotic case. It is shown that the asymptotic gain in the generalization error is small if we perform early stopping, even if we have access to the optimal stopping time. Considering cross-validation stopping, we answer the question of what ratio the examples should be divided into training and testing sets in order to obtain optimum performance. In the non-asymptotic region, cross-validated early stopping always decreases the generalization error. Our large-scale simulations, performed on a CM5, are in good agreement with our analytical findings.
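
A minimal sketch of cross-validated early stopping (generic and illustrative; the split ratio r is the quantity whose asymptotically optimal value the theory addresses):

import numpy as np

def early_stopping_fit(X, y, r=0.8, lr=0.05, max_epochs=500, patience=20):
    """Generic cross-validated early stopping for a linear model (illustrative only).

    A fraction r of the examples is used for gradient training and the rest
    for validation; training stops once the validation loss has not improved
    for `patience` epochs, and the best-so-far weights are returned.
    """
    n = len(y)
    n_train = int(r * n)
    Xt, yt, Xv, yv = X[:n_train], y[:n_train], X[n_train:], y[n_train:]
    w = np.zeros(X.shape[1])
    best_w, best_loss, since_best = w.copy(), np.inf, 0
    for _ in range(max_epochs):
        w -= lr * Xt.T @ (Xt @ w - yt) / n_train      # gradient step on training MSE
        val_loss = np.mean((Xv @ w - yv) ** 2)        # monitor held-out loss
        if val_loss < best_loss:
            best_w, best_loss, since_best = w.copy(), val_loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best_w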