Asia
Statistical Theory of Overtraining - Is Cross-Validation Asymptotically Effective?
Amari, Shun-ichi, Murata, Noboru, Müller, Klaus-Robert, Finke, Michael, Yang, Howard Hua
A statistical theory for overtraining is proposed. The analysis treats realizable stochastic neural networks, trained with Kullback-Leibler loss in the asymptotic case. It is shown that the asymptotic gain in the generalization error is small if we perform early stopping, even if we have access to the optimal stopping time. Considering cross-validation stopping, we answer the question: in what ratio should the examples be divided into training and testing sets in order to obtain the optimum performance? In the non-asymptotic region, cross-validated early stopping always decreases the generalization error. Our large-scale simulations done on a CM5 are in nice agreement with our analytical findings.
Plasticity of Center-Surround Opponent Receptive Fields in Real and Artificial Neural Systems of Vision
Yasui, S., Furukawa, T., Yamada, M., Saito, T.
The center-surround opponent receptive field (CSRF) mechanism represents one such example. Here, analogous CSRFs are shown to be formed in an artificial neural network which learns to localize contours (edges) of the luminance difference. Furthermore, when the input pattern is corrupted by background noise, the CSRFs of the hidden units become shallower and broader as the signal-to-noise ratio (SNR) decreases. The same kind of SNR-dependent plasticity is present in the CSRF of real visual neurons: in bipolar cells of the carp retina, as is shown here experimentally, as well as in large monopolar cells of the fly compound eye, as was described by others. Also, analogous SNR-dependent plasticity is shown to be present in the biphasic flash responses (BPFR) of these artificial and biological visual systems. Thus, the spatial (CSRF) and temporal (BPFR) filtering properties with which a wide variety of creatures see the world appear to be optimized for detectability of changes in space and time. 1 INTRODUCTION A number of learning algorithms have been developed to make synthetic neural machines trainable to function in certain optimal ways. If the brain and nervous systems that we see in nature are the best answers of the evolutionary process, then one might be able to find some common 'software' in real and artificial neural systems. This possibility is examined in this paper, with respect to a basic visual
The Geometry of Eye Rotations and Listing's Law
Handzel, Amir A., Flash, Tamar
Various parameterizations of rotations are related through a unifying mathematical treatment, and transformations between coordinate systems are computed using the Campbell-Baker-Hausdorff formula. Next, we describe Listing's law by means of the Lie algebra so(3). This enables us to demonstrate a direct connection to Donders' law, by showing that eye orientations are restricted to the quotient space SO(3)/SO(2). The latter is equivalent to the sphere S^2, which is exactly the space of gaze directions. Our analysis provides a mathematical framework for studying the oculomotor system and could also be extended to investigate the geometry of multi-joint arm movements.
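As an illustration of the so(3) description above, the following is a minimal sketch, not the authors' formulation. It assumes a primary gaze direction along the x-axis (the name `PRIMARY` and all function names are hypothetical) and uses Rodrigues' formula as the exponential map from a rotation vector in so(3) to a rotation matrix in SO(3). The rotation vector that carries the primary direction to a given gaze direction is chosen perpendicular to the primary direction, so the eye orientation is fixed by a point on the sphere of gaze directions alone.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def exp_so3(w):
    """Rodrigues' formula: rotation matrix for rotation vector w (axis * angle in radians)."""
    theta = math.sqrt(dot(w, w))
    identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    if theta < 1e-12:
        return identity
    x, y, z = (comp / theta for comp in w)
    K = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]  # skew-symmetric generator of the unit axis
    K2 = [[sum(K[i][k] * K[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[identity[i][j] + s * K[i][j] + c * K2[i][j] for j in range(3)] for i in range(3)]

def rotate(R, v):
    return [dot(row, v) for row in R]

PRIMARY = [1.0, 0.0, 0.0]  # assumed primary gaze direction (illustrative choice)

def listing_rotation_vector(gaze):
    """Rotation vector perpendicular to PRIMARY that carries PRIMARY to the unit vector gaze."""
    axis = cross(PRIMARY, gaze)
    sin_angle = math.sqrt(dot(axis, axis))
    if sin_angle < 1e-12:
        return [0.0, 0.0, 0.0]  # gaze equals PRIMARY; the antipodal case is not handled
    angle = math.atan2(sin_angle, dot(PRIMARY, gaze))
    return [angle * comp / sin_angle for comp in axis]
```

For a unit gaze vector such as `[2/3, 1/3, 2/3]`, `rotate(exp_so3(listing_rotation_vector(gaze)), PRIMARY)` recovers the gaze direction, and the rotation vector has no component along `PRIMARY`.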
Dynamics of Attention as Near Saddle-Node Bifurcation Behavior
Nakahara, Hiroyuki, Doya, Kenji
Most studies of attention have focused on the selection process of incoming sensory cues (Posner et al., 1980; Koch et al., 1985; Desimone et al., 1995). Emphasis was placed on the phenomena of causing different percepts for the same sensory stimuli. However, the selection of sensory input itself is not the final goal of attention. We consider attention as a means for goal-directed behavior and survival of the animal. In this view, dynamical properties of attention are crucial. While attention has to be maintained long enough to enable robust response to sensory input, it also has to be shifted quickly to a novel cue that is potentially important. Long-term maintenance and quick transition are critical requirements for attention dynamics.
Temporal Difference Learning in Continuous Time and Space
Elucidation of the relationship between TD learning and dynamic programming (DP) has provided good theoretical insights (Barto et al., 1995). However, conventional TD algorithms were based on discrete-time, discrete-state formulations. In applying these algorithms to control problems, time, space and action had to be appropriately discretized using a priori knowledge or by trial and error. Furthermore, when a TD algorithm is used for neurobiological modeling, discrete-time operation is often very unnatural. There have been several attempts to extend TD-like algorithms to continuous cases. Bradtke et al. (1994) showed convergence results for DP-based algorithms for a discrete-time, continuous-state linear system with a quadratic cost. Bradtke and Duff (1995) derived TD-like algorithms for continuous-time, discrete-state systems (semi-Markov decision problems). Baird (1993) proposed the "advantage updating" algorithm by modifying Q-learning so that it works with arbitrarily small time steps.
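For reference, the conventional discrete-time, discrete-state formulation mentioned above can be sketched as tabular TD(0). This is a minimal illustration, not the continuous-time algorithm of the paper. The deterministic chain task, the reward of 1 on reaching the terminal state, the function name `td0_chain`, and the values of `gamma` and `alpha` are all assumptions made for the example.

```python
def td0_chain(n_states=4, gamma=0.9, alpha=0.5, episodes=200):
    """Tabular TD(0) on a deterministic chain 0 -> 1 -> ... -> terminal.

    The agent always moves one state to the right and receives reward 1.0
    on the transition into the terminal state, 0.0 otherwise.
    """
    V = [0.0] * (n_states + 1)  # last entry is the terminal state, kept at 0
    for _ in range(episodes):
        s = 0
        while s < n_states:
            s_next = s + 1
            reward = 1.0 if s_next == n_states else 0.0
            td_error = reward + gamma * V[s_next] - V[s]  # temporal difference error
            V[s] += alpha * td_error
            s = s_next
    return V[:n_states]
```

On this chain the estimates approach `gamma ** (n_states - 1 - s)` for each state `s`. The fixed step between successive states is exactly the discretization that the continuous-time treatment aims to avoid.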
Improving Policies without Measuring Merits
Dayan, Peter, Singh, Satinder P.
Performing policy iteration in dynamic programming should only require knowledge of relative rather than absolute measures of the utility of actions (Werbos, 1991) - what Baird (1993) calls the advantages of actions at states. Nevertheless, most existing methods in dynamic programming (including Baird's) compute some form of absolute utility function. For smooth problems, advantages satisfy two differential consistency conditions (including the requirement that they be free of curl), and we show that enforcing these can lead to appropriate policy improvement solely in terms of advantages.
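The point that policy improvement needs only relative utilities can be illustrated with a small discrete sketch. This is a simplified analogue, not the differential conditions of the paper: it assumes a finite table of action utilities `Q` (hypothetical data) and defines the advantage of an action as its utility minus that of the best action in the same state.

```python
def advantages(Q):
    """Advantage of each action relative to the best action in its state."""
    best = {}
    for (x, a), q in Q.items():
        best[x] = max(best.get(x, float("-inf")), q)
    return {(x, a): q - best[x] for (x, a), q in Q.items()}

def greedy(table):
    """Greedy policy: for each state, the action with the largest table entry."""
    policy = {}
    for (x, a), value in table.items():
        if x not in policy or value > table[(x, policy[x])]:
            policy[x] = a
    return policy
```

Adding any state-dependent offset to `Q` leaves `advantages(Q)` and the greedy policy unchanged, which is the sense in which the absolute utility level is not needed to choose actions.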
Stable Fitted Reinforcement Learning
We describe the reinforcement learning problem, motivate algorithms which seek an approximation to the Q function, and present new convergence results for two such algorithms. 1 INTRODUCTION AND BACKGROUND Imagine an agent acting in some environment. At time t, the environment is in some state x_t chosen from a finite set of states. The agent perceives x_t, and is allowed to choose an action a_t from some finite set of actions. Meanwhile, the agent experiences a real-valued cost c_t, chosen from a distribution which also depends only on x_t and a_t and which has finite mean and variance. Such an environment is called a Markov decision process, or MDP.
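To make the Q function concrete, here is a minimal sketch of exact Q iteration on a finite MDP with costs. It is not one of the fitted algorithms analysed in the paper. It assumes a discount factor `gamma`, replaces the random cost by its mean, and the two-state example MDP (`COSTS`, `TRANSITIONS`) is invented for illustration.

```python
def q_iteration(costs, transitions, gamma=0.9, sweeps=500):
    """Iterate Q(x, a) = c(x, a) + gamma * sum_x' P(x' | x, a) * min_a' Q(x', a').

    costs:       {(state, action): expected one-step cost}
    transitions: {(state, action): [(probability, next_state), ...]}
    """
    Q = {key: 0.0 for key in costs}
    states = {x for x, _ in costs}
    for _ in range(sweeps):
        # Cost-to-go of each state under the currently best (lowest-cost) action.
        V = {x: min(q for (y, _), q in Q.items() if y == x) for x in states}
        Q = {(x, a): costs[(x, a)]
             + gamma * sum(p * V[nx] for p, nx in transitions[(x, a)])
             for (x, a) in costs}
    return Q

# Hypothetical two-state MDP: staying in state 1 is free, everything else costs something.
COSTS = {(0, "stay"): 1.0, (0, "move"): 2.0, (1, "stay"): 0.0, (1, "move"): 2.0}
TRANSITIONS = {(0, "stay"): [(1.0, 0)], (0, "move"): [(1.0, 1)],
               (1, "stay"): [(1.0, 1)], (1, "move"): [(1.0, 0)]}
```

Calling `q_iteration(COSTS, TRANSITIONS)` yields a table in which moving from state 0 to state 1 and then staying there is the lowest-cost behaviour. With a large state set the table would be replaced by a function approximator, which is where the convergence questions addressed by the paper arise.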