Generalized Learning Vector Quantization
We propose a new learning method, "Generalized Learning Vector Quantization (GLVQ)," in which reference vectors are updated by the steepest descent method so as to minimize a cost function. The cost function is chosen so that the resulting learning rule satisfies the convergence condition. We prove that Kohonen's rule, as used in LVQ, does not satisfy the convergence condition and thus degrades recognition ability. Experimental results for printed Chinese character recognition show that GLVQ outperforms LVQ in recognition ability.
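As a concrete illustration, the sketch below performs one steepest-descent step on a commonly used form of the GLVQ cost, the relative distance mu = (d1 - d2)/(d1 + d2) passed through a sigmoid. The squared Euclidean distance, the sigmoid loss, and the learning rate are illustrative assumptions, not details taken from this abstract.

```python
# Minimal GLVQ update sketch, assuming squared Euclidean distance and a
# sigmoid loss f(mu) = 1 / (1 + exp(-mu)); alpha is an arbitrary choice.
import numpy as np

def glvq_update(x, label, prototypes, proto_labels, alpha=0.05):
    """One steepest-descent step on the GLVQ cost for a single sample."""
    d = np.sum((prototypes - x) ** 2, axis=1)         # squared distances
    same = proto_labels == label
    i = np.argmin(np.where(same, d, np.inf))          # nearest correct prototype
    j = np.argmin(np.where(~same, d, np.inf))         # nearest incorrect prototype
    d1, d2 = d[i], d[j]
    mu = (d1 - d2) / (d1 + d2)                        # relative distance in [-1, 1]
    f_prime = np.exp(-mu) / (1.0 + np.exp(-mu)) ** 2  # sigmoid derivative
    scale = f_prime / (d1 + d2) ** 2                  # constant factors folded into alpha
    prototypes[i] += alpha * scale * d2 * (x - prototypes[i])  # attract correct prototype
    prototypes[j] -= alpha * scale * d1 * (x - prototypes[j])  # repel incorrect prototype
    return prototypes
```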
Recursive Estimation of Dynamic Modular RBF Networks
Kadirkamanathan, Visakan, Kadirkamanathan, Maha
In this paper, recursive estimation algorithms for dynamic modular networks are developed. The models are based on Gaussian RBF networks, and the gating network is considered in two stages: in the first, it is simply a time-varying scalar, and in the second, it is based on the state, as in the mixture-of-local-experts scheme. The resulting algorithm uses Kalman filter estimation for both the model parameters and the gating probabilities. Both 'hard' and 'soft' competition-based estimation schemes are developed: in the former, only the most probable network is adapted, while in the latter all networks are adapted by appropriate weighting of the data.
1 INTRODUCTION
The problem of learning multiple modes in a complex nonlinear system is increasingly being studied by various researchers [2, 3, 4, 5, 6]. The mixture of local experts [5, 6] and the conditional mixture density network [3] have been developed to model the various modes of a system. The development has mainly been on model estimation from a given set of block data, with the model likelihood dependent on the input to the networks.
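The hard/soft distinction can be made concrete as follows. This is only a sketch: module responsibilities are posteriors under an assumed Gaussian error model, and a plain gradient step stands in for the Kalman filter estimation the paper actually uses; all names and parameter values are illustrative.

```python
# Hedged sketch of 'hard' vs 'soft' competition among modular networks.
import numpy as np

def responsibilities(y, preds, priors, noise_var=0.1):
    """Posterior probability that each module generated observation y."""
    lik = np.exp(-0.5 * (y - preds) ** 2 / noise_var) * priors
    return lik / lik.sum()

def adapt(modules, phi, y, priors, mode="soft", lr=0.1):
    """modules: list of weight vectors; phi: shared RBF feature vector."""
    preds = np.array([w @ phi for w in modules])
    r = responsibilities(y, preds, priors)
    if mode == "hard":                       # adapt only the most probable module
        k = np.argmax(r)
        modules[k] += lr * (y - preds[k]) * phi
    else:                                    # weight every module's update by r
        for k, w in enumerate(modules):
            w += lr * r[k] * (y - preds[k]) * phi
    return r
```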
Dynamics of Attention as Near Saddle-Node Bifurcation Behavior
Nakahara, Hiroyuki, Doya, Kenji
Most studies of attention have focused on the selection process for incoming sensory cues (Posner et al., 1980; Koch et al., 1985; Desimone et al., 1995). Emphasis was placed on the phenomenon that the same sensory stimuli can give rise to different percepts. However, the selection of sensory input is not itself the final goal of attention. We consider attention as a means for goal-directed behavior and the survival of the animal. In this view, the dynamical properties of attention are crucial: while attention has to be maintained long enough to enable a robust response to sensory input, it also has to be shifted quickly to a novel cue that is potentially important. Long-term maintenance and quick transition are thus critical requirements for attention dynamics.
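The standard saddle-node normal form dx/dt = a + x^2 illustrates how a single system can meet both requirements; the simulation below is only an illustration of that textbook form, with arbitrary parameter values, not the paper's model.

```python
# Saddle-node normal form: for a < 0 the state settles at the stable fixed
# point x = -sqrt(-a) (sustained attention); nudging a above zero destroys
# the fixed point and the state escapes rapidly (quick attentional shift).
import numpy as np

def simulate(a, x0=-1.0, dt=0.01, steps=2000):
    x, traj = x0, []
    for _ in range(steps):
        x += dt * (a + x * x)     # Euler step on dx/dt = a + x**2
        traj.append(x)
        if x > 10:                # escape once the fixed point has vanished
            break
    return np.array(traj)

held = simulate(a=-0.25)          # converges to x = -0.5 and stays there
shifted = simulate(a=+0.05)       # no fixed point: slow drift, then fast escape
```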
Silicon Models for Auditory Scene Analysis
Lazzaro, John, Wawrzynek, John
We are developing special-purpose, low-power analog-to-digital converters for speech and music applications that feature analog circuit models of biological audition to process the audio signal before conversion. This paper describes our most recent converter design and a working system that uses several copies of the chip to compute multiple representations of sound from an analog input. This multi-representation system demonstrates the plausibility of inexpensively implementing an auditory scene analysis approach to sound processing.
1. INTRODUCTION
The visual system computes multiple representations of the retinal image, such as motion, orientation, and stereopsis, as an early step in scene analysis. Likewise, the auditory brainstem computes secondary representations of sound, emphasizing properties such as binaural disparity, periodicity, and temporal onsets. Recent research in auditory scene analysis involves using computational models of these auditory brainstem representations in engineering applications. Computation is a major limitation in auditory scene analysis research: the complete auditory processing system described in (Brown and Cooke, 1994) runs at approximately 4000 times real time under UNIX on a Sun SPARCstation 1. Standard approaches to hardware acceleration for signal processing algorithms could ease this computational burden in a research environment; a variety of parallel, fixed-point hardware products would work well on these algorithms.
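For readers unfamiliar with these representations, the following is a purely software illustration (not the analog circuitry) of two of the brainstem-style quantities named above: a periodicity estimate via short-window autocorrelation and a temporal-onset trace via a half-wave-rectified envelope derivative. Window lengths are arbitrary assumptions.

```python
# Software illustration of periodicity and onset representations of sound.
import numpy as np

def periodicity(frame):
    """Normalized autocorrelation of one analysis frame (peak = period)."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    return ac / (ac[0] + 1e-12)

def onsets(signal, smooth=32):
    """Half-wave-rectified derivative of a smoothed energy envelope."""
    env = np.convolve(signal ** 2, np.ones(smooth) / smooth, mode="same")
    return np.maximum(np.diff(env, prepend=env[0]), 0.0)
```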
Generating Accurate and Diverse Members of a Neural-Network Ensemble
Opitz, David W., Shavlik, Jude W.
In particular, combining separately trained neural networks (commonly referred to as a neural-network ensemble) has been demonstrated to be particularly successful (Alpaydin, 1993; Drucker et al., 1994; Hansen and Salamon, 1990; Hashem et al., 1994; Krogh and Vedelsby, 1995; Maclin and Shavlik, 1995; Perrone, 1992). Both theoretical (Hansen and Salamon, 1990; Krogh and Vedelsby, 1995) and empirical (Hashem et al., 1994; Maclin and Shavlik, 1995) work has shown that a good ensemble is one whose individual networks are both accurate and make their errors on different parts of the input space; however, most previous work has either focused on combining the outputs of multiple trained networks or only indirectly addressed how we should generate a good set of networks.
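The accuracy-plus-diversity claim can be checked numerically with the decomposition of Krogh and Vedelsby (1995): for a weighted ensemble of regressors, ensemble error equals average member error minus average ambiguity, so ensembles improve exactly when members are accurate and disagree. The data and weights below are made up for the check.

```python
# Numerical check of the ambiguity decomposition E = E_bar - A_bar.
import numpy as np

rng = np.random.default_rng(0)
target = rng.normal(size=50)                             # true values
members = target + rng.normal(scale=0.5, size=(5, 50))   # 5 noisy predictors
w = np.full(5, 0.2)                                      # uniform combination weights

ensemble = w @ members                                   # weighted ensemble prediction
E = np.mean((ensemble - target) ** 2)                    # ensemble error
E_bar = np.mean(w @ (members - target) ** 2)             # average member error
A_bar = np.mean(w @ (members - ensemble) ** 2)           # average ambiguity (diversity)
assert np.isclose(E, E_bar - A_bar)                      # the decomposition holds exactly
```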
The Capacity of a Bump
Recently, several researchers have reported encouraging experimental results when using Gaussian or bump-like activation functions in multilayer perceptrons. Networks of this type usually require fewer hidden layers and units and often learn much faster than typical sigmoidal networks. To explain these results we consider a hyper-ridge network, which is a simple perceptron with no hidden units and a ridge activation function. If we are interested in partitioning p points in d dimensions into two classes, then in the limit as d approaches infinity the capacity of a hyper-ridge and a perceptron is identical.
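The contrast between the two decision rules can be sketched as follows; the band parameterization |w.x - c| < r for the ridge unit is an illustrative assumption, not necessarily the paper's exact definition.

```python
# A perceptron thresholds the projection w.x with one linear boundary;
# a hyper-ridge unit fires when the projection falls inside a band (slab).
import numpy as np

def perceptron(x, w, theta):
    return np.dot(w, x) > theta          # one linear decision boundary

def hyper_ridge(x, w, c, r):
    return abs(np.dot(w, x) - c) < r     # a slab: two parallel boundaries
```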
Boosting Decision Trees
Drucker, Harris, Cortes, Corinna
We introduce a constructive, incremental learning system for regression problems that models data by means of locally linear experts. In contrast to other approaches, the experts are trained independently and do not compete for data during learning. Only when a prediction for a query is required do the experts cooperate by blending their individual predictions. Each expert is trained by minimizing a penalized local cross-validation error using second-order methods. In this way, an expert is able to find a local distance metric by adjusting the size and shape of the receptive field in which its predictions are valid, and also to detect relevant input features by adjusting its bias on the importance of individual input dimensions. We derive asymptotic results for our method. In a variety of simulations the properties of the algorithm are demonstrated with respect to interference, learning speed, prediction accuracy, feature detection, and task-oriented incremental learning.
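The query-time cooperation described above can be sketched as a receptive-field-weighted blend of locally linear predictions. The expert parameters below are placeholders; the fitting procedure (penalized local cross validation with second-order methods) is omitted.

```python
# Blend independently trained locally linear experts at a query point.
import numpy as np

def predict(x, centers, metrics, betas):
    """centers: (K, d) receptive-field centers
    metrics:  (K, d, d) positive-definite distance metrics per expert
    betas:    (K, d+1) linear coefficients [offset, slopes] per expert
    """
    preds, weights = [], []
    for c, M, b in zip(centers, metrics, betas):
        diff = x - c
        weights.append(np.exp(-0.5 * diff @ M @ diff))  # receptive-field activation
        preds.append(b[0] + b[1:] @ diff)               # local linear model
    weights = np.array(weights)
    return weights @ np.array(preds) / (weights.sum() + 1e-12)
```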