Functional Models of Selective Attention and Context Dependency

Neural Information Processing Systems

Scope: This workshop reviewed and classified the various models which have emerged from the general concept of selective attention and context dependency, and sought to identify their commonalities. It was concluded that the motivation and mechanism of these functional models are "efficiency" and "factoring", respectively. The workshop focused on computational models of selective attention and context dependency within the realm of neural networks. We treated only "functional" models; computational models of biological neural systems, and symbolic or rule-based systems, were omitted from the discussion. Presentations: Thomas H. Hildebrandt presented the results of his recent survey of the literature on functional models of selective attention and context dependency.


How to Describe Neuronal Activity: Spikes, Rates, or Assemblies?

Neural Information Processing Systems

What is the 'correct' theoretical description of neuronal activity? The analysis of the dynamics of a globally connected network of spiking neurons (the Spike Response Model) shows that a description by mean firing rates is possible only if active neurons fire incoherently. If firing occurs coherently or with spatiotemporal correlations, the spike structure of the neural code becomes relevant. Alternatively, neurons can be gathered into local or distributed ensembles or 'assemblies'. A description based on the mean ensemble activity is, in principle, possible, but the interaction between different assemblies becomes highly nonlinear. A description with spikes should therefore be preferred.
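
To make the contrast concrete, below is a minimal, self-contained sketch of a population of simplified spiking neurons with uniform all-to-all coupling. The kernel shape, threshold, drive, and coupling strength are assumptions for illustration and are not the Spike Response Model as formulated in the paper; comparing the mean rate with the variability of the population count hints at when a rate description is adequate.

```python
# Toy population of simplified spiking neurons with uniform all-to-all coupling.
# Kernel shape, threshold, drive, and coupling strength are assumptions for
# illustration; this is not the Spike Response Model as formulated in the paper.
import numpy as np

rng = np.random.default_rng(0)

N, T, dt = 100, 200, 1.0            # neurons, time steps, step size (arbitrary units)
tau = 10.0                          # membrane decay constant (assumed)
w = 0.05 / N                        # uniform coupling weight (assumed)
theta = 1.0                         # firing threshold (assumed)

u = rng.uniform(0.0, theta, N)      # membrane potentials, randomly initialized
spikes = np.zeros((T, N), dtype=bool)

for t in range(T):
    fired = u >= theta
    spikes[t] = fired
    u[fired] = 0.0                  # reset after a spike (crude refractoriness)
    # leaky integration of a constant drive plus input from this step's spikes
    u += dt * (-u / tau + 0.15 + w * fired.sum())

pop_count = spikes.sum(axis=1)                        # active neurons per time step
rate = spikes.mean()                                  # mean firing rate per neuron/step
fano = pop_count.var() / max(pop_count.mean(), 1e-9)  # >1 hints at coherent firing
print(f"mean rate ~ {rate:.3f}, population Fano factor ~ {fano:.2f}")
```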


Generalization Error and the Expected Network Complexity

Neural Information Processing Systems

E represents the expectation over the random number K of hidden units (1 ≤ K ≤ n). This relationship makes it possible to characterize explicitly how a regularization term affects the bias/variance of networks.
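
As a notation sketch (an assumption about the intended reading, not a formula quoted from the paper), the expectation over the random number of hidden units can be written as:

```latex
% Generic definition of the expectation over the random number K of hidden units,
% 1 <= K <= n; the distribution of K is whatever the selection procedure induces.
\[
  \mathbb{E}\big[f(K)\big] \;=\; \sum_{k=1}^{n} \Pr(K = k)\, f(k),
  \qquad 1 \le K \le n .
\]
```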


Complexity Issues in Neural Computation and Learning

Neural Information Processing Systems

The general goal of this workshop was to bring together researchers working toward developing a theoretical framework for the analysis and design of neural networks. The technical focus of the workshop was to address recent developments in three primary areas: 1) computational complexity issues in neural networks, 2) complexity issues in learning, and 3) convergence and numerical properties of learning algorithms. Such studies, in turn, have generated considerable research interest. A similar development can be observed in the area of learning as well: techniques primarily developed in the classical theory of learning are being applied to understand the generalization and learning characteristics of neural networks.


Fast Non-Linear Dimension Reduction

Neural Information Processing Systems

Dimension reduction provides compact representations for storage, transmission, and classification. Dimension reduction algorithms operate by identifying and eliminating statistical redundancies in the data. The optimal linear technique for dimension reduction is principal component analysis (PCA).
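
As a reminder of how the linear baseline works, here is a minimal PCA sketch; the data, its dimensionality, and the number of retained components are made-up values for illustration, not anything from the paper.

```python
# Minimal PCA dimension-reduction sketch; the data, its dimensionality, and the number
# of retained components are made-up values for illustration.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))             # 500 samples, 20-dimensional data
X = X - X.mean(axis=0)                     # center before extracting components

# Principal directions are the leading right singular vectors of the centered data.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
Z = X @ Vt[:k].T                           # compact k-dimensional representation
X_hat = Z @ Vt[:k]                         # best linear reconstruction from Z

explained = (S[:k] ** 2).sum() / (S ** 2).sum()
print(f"variance explained by {k} components: {explained:.2%}")
```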


Figure of Merit Training for Detection and Spotting

Neural Information Processing Systems

Spotting tasks require detection of target patterns from a background of richly varied non-target inputs. The performance measure of interest for these tasks, called the figure of merit (FOM), is the detection rate for target patterns when the false alarm rate is in an acceptable range. A new approach to training spotters is presented which computes the FOM gradient for each input pattern and then directly maximizes the FOM using backpropagation. This eliminates the need for thresholds during training. It also uses network resources to model Bayesian a posteriori probability functions accurately only for patterns which have a significant effect on the detection accuracy over the false alarm rate of interest. FOM training increased detection accuracy by 5 percentage points for a hybrid radial basis function (RBF)-hidden Markov model (HMM) wordspotter on the credit-card speech corpus.
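
For intuition, the sketch below evaluates an FOM-style score: the detection rate averaged over a band of acceptable false-alarm rates. The band, the toy scores, and the threshold rule are assumptions; the paper's contribution, computing a gradient of this quantity and maximizing it with backpropagation, is not reproduced here.

```python
# Sketch of an FOM-style evaluation: detection rate averaged over a band of acceptable
# false-alarm rates. The band, scores, and threshold rule are illustrative assumptions;
# the paper additionally derives a gradient of the FOM for backpropagation training.
import numpy as np

def fom(target_scores, nontarget_scores, fa_rates=np.linspace(0.01, 0.10, 10)):
    """Mean detection rate over the given false-alarm operating points."""
    nontarget_sorted = np.sort(nontarget_scores)[::-1]        # highest scores first
    detections = []
    for fa in fa_rates:
        # threshold that passes roughly a fraction `fa` of the non-target patterns
        idx = max(int(fa * len(nontarget_sorted)) - 1, 0)
        thresh = nontarget_sorted[idx]
        detections.append(float((target_scores > thresh).mean()))
    return float(np.mean(detections))

rng = np.random.default_rng(0)
targets = rng.normal(1.0, 1.0, 200)        # hypothetical spotter scores on targets
nontargets = rng.normal(0.0, 1.0, 2000)    # hypothetical scores on background inputs
print(f"FOM over 1-10% false-alarm range: {fom(targets, nontargets):.3f}")
```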


Optimal Stopping and Effective Machine Complexity in Learning

Neural Information Processing Systems

We study the problem of when to stop learning a class of feedforward networks - networks with linear output neurons and fixed input weights - when they are trained with a gradient descent algorithm on a finite number of examples. Under general regularity conditions, it is shown that there are in general three distinct phases in the generalization performance in the learning process, and in particular, the network has better generalization performance when learning is stopped at a certain time before the global minimum of the empirical error is reached. A notion of effective size of a machine is defined and used to explain the tradeoff between the complexity of the machine and the training error in the learning process. The study leads naturally to a network size selection criterion, which turns out to be a generalization of Akaike's Information Criterion for the learning process. It is shown that stopping learning before the global minimum of the empirical error has the effect of network size selection.

1 INTRODUCTION The primary goal of learning in neural nets is to find a network that gives valid generalization. In achieving this goal, a central issue is the tradeoff between the training error and network complexity. This usually reduces to a problem of network size selection, which has drawn much research effort in recent years. Various principles, theories, and intuitions, including Occam's razor and statistical model selection criteria such as Akaike's Information Criterion (AIC) [1] and many others [5, 1, 10, 3, 11], all quantitatively support the following PAC prescription: between two machines which have the same empirical error, the machine with the smaller VC-dimension generalizes better. However, it is noted that these methods or criteria do not necessarily lead to optimal (or nearly optimal) generalization performance.
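
A generic early-stopping loop (not the paper's analysis) illustrates the idea of halting gradient descent before the empirical-error minimum is reached; the model, data, learning rate, and patience value below are toy assumptions.

```python
# Generic early-stopping sketch (not the paper's analysis): run gradient descent on a
# toy linear model and keep the weights with the best held-out error, stopping once
# validation error has not improved for `patience` steps.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.5 * rng.normal(size=200)            # noisy targets
X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]

w = np.zeros(10)
lr, patience = 0.01, 20
best_err, best_w, since_best = np.inf, w.copy(), 0

for step in range(5000):
    grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)      # gradient of training MSE
    w -= lr * grad
    val_err = np.mean((X_va @ w - y_va) ** 2)
    if val_err < best_err:
        best_err, best_w, since_best = val_err, w.copy(), 0
    else:
        since_best += 1
        if since_best >= patience:                     # stop before the empirical minimum
            break

print(f"best validation MSE {best_err:.3f} (search ended at step {step})")
```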


Catastrophic Interference in Connectionist Networks: Can It Be Predicted, Can It Be Prevented?

Neural Information Processing Systems

Catastrophic forgetting occurs when connectionist networks learn new information and, by so doing, forget all previously learned information. This workshop focused primarily on the causes of catastrophic interference, the techniques that have been developed to reduce it, the effect of these techniques on the networks' ability to generalize, and the degree to which prediction of catastrophic forgetting is possible. The speakers included Robert French and Phil Hetherington (Psychology Department, McGill University, het@blaise.psych.mcgill.ca). French indicated that catastrophic forgetting is at its worst when high representation overlap at the hidden layer combines with significant teacher-output error.


Structured Machine Learning for 'Soft' Classification with Smoothing Spline ANOVA and Stacked Tuning, Testing and Evaluation

Neural Information Processing Systems

We describe the use of smoothing spline analysis of variance (SS-ANOVA) in the penalized log likelihood context, for learning (estimating) the probability p of a '1' outcome, given a training set with attribute vectors and outcomes.
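
As a simplified stand-in, the sketch below fits a penalized log-likelihood model with a plain quadratic penalty via gradient ascent; it does not implement smoothing spline ANOVA, and the data, penalty weight, and step sizes are illustrative assumptions.

```python
# Simplified stand-in for penalized log-likelihood estimation of p('1' | x): logistic
# regression with a quadratic penalty fitted by gradient ascent. This is not an
# SS-ANOVA model; data, penalty weight, and step sizes are illustrative assumptions.
import numpy as np

def fit_penalized_logistic(X, y, lam=1.0, lr=0.1, steps=2000):
    """Maximize log-likelihood(w) - lam * ||w||^2 by gradient ascent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))         # estimated probability of a '1'
        grad = X.T @ (y - p) - 2.0 * lam * w     # penalized log-likelihood gradient
        w += lr * grad / len(y)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300) > 0).astype(float)
w_hat = fit_penalized_logistic(X, y)
print(np.round(w_hat, 2))
```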


Learning Temporal Dependencies in Connectionist Speech Recognition

Neural Information Processing Systems

In this paper, we discuss the nature of the time dependence currently employed in our systems using recurrent networks (RNs) and feed-forward multi-layer perceptrons (MLPs). In particular, we introduce local recurrences into an MLP to produce an enhanced input representation. This takes the form of an adaptive gamma filter and incorporates an automatic approach for learning temporal dependencies. We have experimented on a speaker-independent phone recognition task using the TIMIT database. Results using the gamma-filtered input representation have shown improvement over the baseline MLP system. Improvements have also been obtained through merging the baseline and gamma filter models.
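
The following sketch builds a gamma-memory input representation using the standard gamma filter recursion; the fixed mu, filter order, and test signal are assumptions, and the adaptive learning of mu described in the paper is omitted.

```python
# Gamma-memory input representation built with the standard gamma filter recursion.
# The fixed mu, filter order, and test signal are assumptions; the adaptive learning
# of mu used in the paper is omitted here.
import numpy as np

def gamma_memory(x, order=3, mu=0.5):
    """Return an (len(x), order+1) array of gamma-filtered taps of signal x."""
    T = len(x)
    taps = np.zeros((T, order + 1))
    taps[:, 0] = x                               # tap 0 is the raw input
    for n in range(1, T):
        for k in range(1, order + 1):
            # leaky cascade: each tap is a low-passed, delayed copy of the previous tap
            taps[n, k] = (1.0 - mu) * taps[n - 1, k] + mu * taps[n - 1, k - 1]
    return taps

signal = np.sin(np.linspace(0, 6 * np.pi, 100))  # toy 1-D signal
features = gamma_memory(signal, order=4, mu=0.3)
print(features.shape)                            # (100, 5): enriched input for an MLP
```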