Asia
Statistical Theory of Overtraining - Is Cross-Validation Asymptotically Effective?
Amari, Shun-ichi, Murata, Noboru, Müller, Klaus-Robert, Finke, Michael, Yang, Howard Hua
A statistical theory for overtraining is proposed. The analysis treats realizable stochastic neural networks, trained with Kullback Leibler loss in the asymptotic case. It is shown that the asymptotic gain in the generalization error is small if we perform early stopping, evenif we have access to the optimal stopping time. Considering cross-validation stopping we answer the question: In what ratio the examples should be divided into training and testing sets in order toobtain the optimum performance. In the non-asymptotic region cross-validated early stopping always decreases the generalization error.Our large scale simulations done on a CM5 are in nice agreement with our analytical findings.
Constructive Algorithms for Hierarchical Mixtures of Experts
Waterhouse, Steve R., Robinson, Anthony J.
By applying a likelihood splitting criteria to each expert in the HME we "grow" the tree adaptively during training. Secondly,by considering only the most probable path through the tree we may "prune" branches away, either temporarily, or permanently ifthey become redundant. We demonstrate results for the growing and path pruning algorithms which show significant speed ups and more efficient use of parameters over the standard fixed structure in discriminating between two interlocking spirals and classifying 8-bit parity patterns. INTRODUCTION The HME (Jordan & Jacobs 1994) is a tree structured network whose terminal nodes are simple function approximators in the case of regression or classifiers in the case of classification. The outputs of the terminal nodes or experts are recursively combined upwards towards the root node, to form the overall output of the network, by "gates" which are situated at the non-terminal nodes.
A Dynamical Systems Approach for a Learnable Autonomous Robot
This paper discusses how a robot can learn goal-directed navigation tasksusing local sensory inputs. The emphasis is that such learning tasks could be formulated as an embedding problem of dynamical systems: desired trajectories in a task space should be embedded into an adequate sensory-based internal state space so that an unique mapping from the internal state space to the motor command could be established. The paper shows that a recurrent neural network suffices in self-organizing such an adequate internal state space from the temporal sensory input.
Selective Attention for Handwritten Digit Recognition
Completely parallel object recognition is NPcomplete. Achieving a recognizer with feasible complexity requires a compromise between paralleland sequential processing where a system selectively focuses on parts of a given image, one after another. Successive fixations are generated to sample the image and these samples are processed and abstracted to generate a temporal context in which results are integrated over time. A computational model based on a partially recurrent feedforward network is proposed and made credible bytesting on the real-world problem of recognition of handwritten digitswith encouraging results.
A New Learning Algorithm for Blind Signal Separation
Amari, Shun-ichi, Cichocki, Andrzej, Yang, Howard Hua
A new online learning algorithm which minimizes a statistical dependency amongoutputs is derived for blind separation of mixed signals. The dependency is measured by the average mutual information (MI)of the outputs. The source signals and the mixing matrix are unknown except for the number of the sources. The Gram-Charlier expansion instead of the Edgeworth expansion is used in evaluating the MI. The natural gradient approach is used to minimize the MI. A novel activation function is proposed for the online learning algorithm which has an equivariant property and is easily implemented on a neural network like model. The validity of the new learning algorithm are verified by computer simulations.
Neuron-MOS Temporal Winner Search Hardware for Fully-Parallel Data Processing
Shibata, Tadashi, Nakai, Tsutomu, Morimoto, Tatsuo, Kaihara, Ryu, Yamashita, Takeo, Ohmi, Tadahiro
Search for the largest (or the smallest) among a number of input data, Le., the winner-take-all (WTA) action, is an essential part of intelligent data processing such as data retrieval in associative memories [3], vector quantization circuits [4], Kohonen's self-organizing maps [5] etc. In addition to the maximum or minimum search, data sorting also plays an essential role in a number of signal processing such as median filtering in image processing, evolutionary algorithms in optimizing problems [6] and so forth.
A Unified Learning Scheme: Bayesian-Kullback Ying-Yang Machine
A Bayesian-Kullback learning scheme, called Ying-Yang Machine, is proposed based on the two complement but equivalent Bayesian representations for joint density and their Kullback divergence. Not only the scheme unifies existing major supervised and unsupervised learnings,including the classical maximum likelihood or least square learning, the maximum information preservation, the EM & em algorithm and information geometry, the recent popular Helmholtz machine, as well as other learning methods with new variants and new results; but also the scheme provides a number of new learning models. 1 INTRODUCTION Many different learning models have been developed in the literature. We may come to an age of searching a unified scheme for them. With a unified scheme, we may understand deeply the existing models and their relationships, which may cause cross-fertilization on them to obtain new results and variants; We may also be guided to develop new learning models, after we get better understanding on which cases we have already studied or missed, which deserve to be further explored. Recently, a Baysian-Kullback scheme, called the YING-YANG Machine, has been proposed as such an effort(Xu, 1995a). It bases on the Kullback divergence and two complement but equivalent Baysian representations for the joint distribution of the input space and the representation space, instead of merely using Kullback divergence formatching un-structuralized joint densities in information geometry type learnings (Amari, 1995a&b; Byrne, 1992; Csiszar, 1975).
Dynamics of Attention as Near Saddle-Node Bifurcation Behavior
Nakahara, Hiroyuki, Doya, Kenji
Most studies of attention have focused on the selection process of incoming sensory cues (Posner et al., 1980; Koch et al., 1985; Desimone et al., 1995). Emphasis was placed on the phenomena of causing different percepts for the same sensory stimuli. However, the selection of sensory input itself is not the final goal of attention. We consider attention as a means for goal-directed behavior and survival of the animal. In this view, dynamical properties of attention are crucial. While attention has to be maintained long enough to enable robust response to sensory input, it also has to be shifted quickly to a novel cue that is potentially important. Long-term maintenance and quick transition are critical requirements for attention dynamics.