Industry
Minimax and Hamiltonian Dynamics of Excitatory-Inhibitory Networks
Seung, H. Sebastian, Richardson, Tom J., Lagarias, J. C., Hopfield, John J.
A Lyapunov function for excitatory-inhibitory networks is constructed. The construction assumes symmetric interactions within excitatory and inhibitory populations of neurons, and antisymmetric interactions between populations. The Lyapunov function yields sufficient conditions for the global asymptotic stability of fixed points. If these conditions are violated, limit cycles may be stable. The relations of the Lyapunov function to optimization theory and classical mechanics are revealed by minimax and dissipative Hamiltonian forms of the network dynamics. The dynamics of a neural network with symmetric interactions provably converges to fixed points under very general assumptions [1, 2].
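For intuition only, the symmetry assumptions can be illustrated in the linear case (an assumption; the abstract concerns general network dynamics). If excitatory activities x descend and inhibitory activities y ascend a quadratic saddle function L(x, y) = -1/2 x^T A x + 1/2 y^T B y + x^T C y, the resulting flow dx/dt = A x - C y, dy/dt = C^T x + B y has symmetric couplings A and B within each population and an antisymmetric coupling block between them. When A and B are negative definite, the symmetric part of the Jacobian is negative definite, so every eigenvalue has a negative real part and the fixed point is globally asymptotically stable. The function name `ei_jacobian` and the random construction below are hypothetical.

```python
import numpy as np


def ei_jacobian(n_e, n_i, seed=0):
    """Hypothetical linear E-I network obtained as gradient descent (x) /
    ascent (y) on L(x, y) = -1/2 x^T A x + 1/2 y^T B y + x^T C y."""
    rng = np.random.default_rng(seed)
    m_e = rng.standard_normal((n_e, n_e))
    m_i = rng.standard_normal((n_i, n_i))
    # Symmetric, negative-definite interactions within each population
    a = -(m_e @ m_e.T + 0.1 * np.eye(n_e))
    b = -(m_i @ m_i.T + 0.1 * np.eye(n_i))
    # Between populations: I -> E coupling is -C, E -> I coupling is C^T
    c = rng.standard_normal((n_e, n_i))
    # dx/dt = -dL/dx = A x - C y ;  dy/dt = +dL/dy = C^T x + B y
    return np.block([[a, -c], [c.T, b]])


j = ei_jacobian(n_e=4, n_i=3)
sym_part = 0.5 * (j + j.T)  # off-diagonal blocks cancel: diag(A, B)
print("max eigenvalue of symmetric part:", np.linalg.eigvalsh(sym_part).max())
print("max real part of Jacobian eigenvalues:", np.linalg.eigvals(j).real.max())
```

The antisymmetric cross-coupling drops out of the symmetric part entirely, which is why only the within-population blocks enter this stability check.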
Hybrid Reinforcement Learning and Its Application to Biped Robot Control
Yamada, Satoshi, Watanabe, Akira, Nakashima, Michio
Advanced Technology R&D Center, Mitsubishi Electric Corporation, Amagasaki, Hyogo 661-0001, Japan. Abstract: A learning system composed of linear control modules, reinforcement learning modules, and selection modules (a hybrid reinforcement learning system) is proposed for fast learning of real-world control problems. The selection modules choose one appropriate control module depending on the state. The system learned to control the robot on a sloped floor more quickly than conventional reinforcement learning because it did not need to learn control on a flat floor, where the linear control module alone can control the robot. When it was trained with two-step learning (in the first step, the selection module was trained using a procedure in which only the linear controller controlled the robot), it learned the control even more quickly. The average number of trials required (about 50) is small enough for the learning system to be applicable to real robot control. 1 Introduction: Reinforcement learning can solve general control problems because it learns behavior through trial-and-error interactions with a dynamic environment.
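As a rough illustration of the selection idea only (not the authors' implementation), a selection module can be sketched as a per-state table of value estimates over control modules, updated from reward, which treats module selection as a contextual bandit. The state labels "flat" and "slope", the toy reward, and the `SelectionModule` class are hypothetical.

```python
import random


class SelectionModule:
    """Hypothetical selection module: one value estimate per
    (discretized state, control module) pair, updated from reward."""

    def __init__(self, modules, epsilon=0.1, lr=0.1, seed=0):
        self.modules = list(modules)
        self.epsilon = epsilon
        self.lr = lr
        self.values = {}  # (state, module) -> running reward estimate
        self.rng = random.Random(seed)

    def choose(self, state):
        # Epsilon-greedy: usually pick the best-valued module for this state
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.modules)
        return max(self.modules, key=lambda m: self.values.get((state, m), 0.0))

    def update(self, state, module, reward):
        # Move the chosen module's estimate toward the observed reward
        old = self.values.get((state, module), 0.0)
        self.values[(state, module)] = old + self.lr * (reward - old)


def toy_reward(state, module):
    # Toy assumption: the linear controller suffices on flat ground,
    # while the slope needs the learned (RL) module
    return 1.0 if (state, module) in {("flat", "linear"), ("slope", "rl")} else 0.0


selector = SelectionModule(["linear", "rl"])
for t in range(2000):
    state = "flat" if t % 2 == 0 else "slope"
    module = selector.choose(state)
    selector.update(state, module, toy_reward(state, module))

selector.epsilon = 0.0  # act greedily after training
print(selector.choose("flat"), selector.choose("slope"))
```

Because flat-floor states are already handled by the linear module, only the slope states require learning, which matches the intuition the abstract gives for the speed-up.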
Automated Aircraft Recovery via Reinforcement Learning: Initial Experiments
Monaco, Jeffrey F., Ward, David G., Barto, Andrew G.
An emerging use of reinforcement learning (RL) is to approximate optimal policies for large-scale control problems through extensive simulated control experience. Described here are initial experiments directed toward the development of an automated recovery system (ARS) for high-agility aircraft. An ARS is an outer-loop flight control system designed to bring the aircraft from a range of initial states to straight, level, and non-inverted flight in minimum time while satisfying constraints such as keeping altitude and accelerations within acceptable limits. Here we describe the problem and present initial results involving only single-axis (pitch) recoveries. Through extensive simulated control experience using a medium-fidelity simulation of an F-16, the RL system approximated an optimal policy for longitudinal-stick inputs that produces near-minimum-time transitions to straight and level flight, both in unconstrained cases and while meeting a pilot-station acceleration constraint.
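The abstract does not give the RL formulation. As a hedged toy illustration of how a constant per-step penalty turns "minimum time" into an RL objective, the sketch below runs tabular Q-learning on a hypothetical discretized pitch error with three stick actions. The pitch grid, dynamics, and parameters are invented for illustration, and the paper's F-16 simulation and constraints are not reproduced.

```python
import random

PITCH_STATES = range(-5, 6)  # hypothetical discretized pitch error (0 = level)
ACTIONS = (-1, 0, 1)         # hypothetical stick input: nose down, hold, nose up


def step(pitch, action):
    """Toy dynamics: the stick input nudges pitch by one grid cell."""
    nxt = max(-5, min(5, pitch + action))
    # Reward of -1 per step, so maximizing return means minimizing time to level
    return nxt, -1.0, nxt == 0


def q_learning(episodes=5000, alpha=0.5, gamma=1.0, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in PITCH_STATES for a in ACTIONS}
    starts = [s for s in PITCH_STATES if s != 0]
    for _ in range(episodes):
        s = rng.choice(starts)
        for _ in range(50):  # cap episode length
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            s2, r, done = step(s, a)
            # Terminal transitions have no future value
            target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
            if done:
                break
    return q


def greedy_steps(q, pitch, max_steps=20):
    """Count steps the learned greedy policy needs to reach level (pitch 0)."""
    steps = 0
    while pitch != 0 and steps < max_steps:
        pitch, _, _ = step(pitch, max(ACTIONS, key=lambda act: q[(pitch, act)]))
        steps += 1
    return steps


q = q_learning()
print({p: greedy_steps(q, p) for p in PITCH_STATES})
```

In this toy the minimum-time policy is known (always move toward level), so the learned greedy policy can be checked directly; a constraint such as an acceleration limit could be added as an extra penalty term, which is again an assumption rather than the paper's formulation.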
An Improved Policy Iteration Algorithm for Partially Observable MDPs
A new policy iteration algorithm for partially observable Markov decision processes is presented that is simpler and more efficient than an earlier policy iteration algorithm of Sondik (1971, 1978). The key simplification is the representation of a policy as a finite-state controller. This representation makes policy evaluation straightforward. The paper's contribution is to show that the dynamic-programming update used in the policy improvement step can be interpreted as the transformation of a finite-state controller into an improved finite-state controller. The new algorithm consistently outperforms value iteration as an approach to solving infinite-horizon problems.
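Why a finite-state controller makes policy evaluation straightforward can be seen in a short sketch: each controller node n fixes an action and an observation-driven successor, so the values V(n, s) satisfy a linear system V(n, s) = R(s, a(n)) + gamma * sum over s', o of T(s' | s, a(n)) O(o | s', a(n)) V(succ(n, o), s'), and the value of a belief b at node n is sum over s of b(s) V(n, s). The array layout, the function name `evaluate_fsc`, and the tiny two-state example are assumptions made for illustration.

```python
import numpy as np


def evaluate_fsc(T, O, R, action, succ, gamma):
    """Value V[n, s] of running a finite-state controller from node n in state s.

    T[a, s, s2]: transition probs; O[a, s2, o]: observation probs;
    R[s, a]: expected reward; action[n]: node's action; succ[n][o]: next node.
    """
    n_nodes, n_states, n_obs = len(action), R.shape[0], O.shape[2]
    size = n_nodes * n_states
    A = np.eye(size)
    b = np.zeros(size)
    for n in range(n_nodes):
        a = action[n]
        for s in range(n_states):
            i = n * n_states + s
            b[i] = R[s, a]
            for s2 in range(n_states):
                for o in range(n_obs):
                    j = succ[n][o] * n_states + s2
                    A[i, j] -= gamma * T[a, s, s2] * O[a, s2, o]
    # One linear solve evaluates the whole controller
    return np.linalg.solve(A, b).reshape(n_nodes, n_states)


# Hypothetical two-state, two-action, two-observation POMDP
T = np.array([[[0.9, 0.1], [0.1, 0.9]],   # action 0: state mostly persists
              [[0.5, 0.5], [0.5, 0.5]]])  # action 1: state resets to uniform
O = np.array([[[0.8, 0.2], [0.2, 0.8]],   # action 0: noisy view of the state
              [[0.5, 0.5], [0.5, 0.5]]])  # action 1: uninformative
R = np.array([[1.0, 0.0],
              [-1.0, 0.0]])               # R[s, a]

succ = [[0, 1], [0, 0]]  # node 0 moves to node 1 on observation 1
v = evaluate_fsc(T, O, R, action=[0, 1], succ=succ, gamma=0.95)
print(v)
```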
Modelling Seasonality and Trends in Daily Rainfall Data
Peter M Williams, School of Cognitive and Computing Sciences, University of Sussex, Falmer, Brighton BN1 9QH, UK. Email: peterw@cogs.susx.ac.uk. Abstract: This paper presents a new approach to the problem of modelling daily rainfall using neural networks. We first model the conditional distributions of rainfall amounts in such a way that the model itself determines the order of the process and the time-dependent shape and scale of the conditional distributions. After integrating over particular weather patterns, we are able to extract seasonal variations and long-term trends. 1 Introduction: Analysis of rainfall data is important for many agricultural, ecological and engineering activities. The design of irrigation and drainage systems, for instance, needs to take account not only of mean expected rainfall but also of rainfall volatility. Estimates of crop yields also depend on the distribution of rainfall during the growing season, as well as on the overall amount.
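The abstract does not name the distribution family. One common choice for daily rainfall, assumed here, mixes a point mass at zero (a dry day) with a gamma density for wet-day amounts, so that a network would output the wet-day probability together with the time-dependent shape and scale. The sketch shows only the per-day negative log-likelihood such a model could be trained on, plus a sin/cos encoding of the day of year as one possible seasonal input; the function names are hypothetical.

```python
import math


def seasonal_features(day_of_year, period=365.25):
    """Encode day of year on the unit circle so 31 Dec and 1 Jan are close."""
    angle = 2.0 * math.pi * day_of_year / period
    return math.sin(angle), math.cos(angle)


def rain_nll(amount, p_wet, shape, scale):
    """Negative log-likelihood of one day's rainfall under a mixed
    Bernoulli-gamma model: dry with prob 1 - p_wet, else gamma(shape, scale)."""
    if amount <= 0.0:
        return -math.log(1.0 - p_wet)
    # log of x^(k-1) exp(-x/theta) / (Gamma(k) theta^k)
    log_gamma_pdf = ((shape - 1.0) * math.log(amount)
                     - amount / scale
                     - shape * math.log(scale)
                     - math.lgamma(shape))
    return -(math.log(p_wet) + log_gamma_pdf)


print(seasonal_features(180))
print(rain_nll(5.0, p_wet=0.4, shape=0.8, scale=6.0))
```

Summing this loss over days, with `p_wet`, `shape` and `scale` produced by a network from recent rainfall and seasonal inputs, gives a maximum-likelihood training objective under the stated assumption.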
A Solution for Missing Data in Recurrent Neural Networks with an Application to Blood Glucose Prediction
Tresp, Volker, Briegel, Thomas
Siemens AG, Corporate Technology, Otto-Hahn-Ring 6, 81730 München, Germany. Abstract: We consider neural network models for stochastic nonlinear dynamical systems where measurements of the variable of interest are only available at irregular intervals, i.e., most realizations are missing. Difficulties arise because the solutions for prediction and maximum likelihood learning with missing data lead to complex integrals, which even for simple cases cannot be solved analytically. In this paper we propose a specific combination of a nonlinear recurrent neural predictive model and a linear error model, which leads to tractable prediction and maximum likelihood adaptation rules. In particular, the recurrent neural network can be trained using the real-time recurrent learning rule, and the linear error model can be trained by an EM adaptation rule implemented using forward-backward Kalman filter equations. The model is applied to predict the glucose/insulin metabolism of a diabetic patient whose blood glucose measurements are only available a few times a day at irregular intervals.
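The reason a linear error model keeps things tractable is that a Kalman filter handles a missing measurement by simply skipping the measurement update for that time step. Below is a minimal scalar forward-filter sketch, assuming an AR(1) error term added to a given network prediction; the coefficient and noise values are hypothetical, and the backward (smoothing) pass and EM updates described in the abstract are not shown.

```python
def filter_with_gaps(predictions, measurements, a=0.9, q=0.1, r=0.05):
    """Scalar Kalman filter for an AR(1) error e_t = a*e_{t-1} + w_t added to a
    model prediction f_t; measurement z_t = f_t + e_t + v_t, or None if missing.
    q, r: process and measurement noise variances (hypothetical values)."""
    e_mean, e_var = 0.0, 1.0
    estimates, variances = [], []
    for f, z in zip(predictions, measurements):
        # Time update: propagate the error state and its variance
        e_mean = a * e_mean
        e_var = a * a * e_var + q
        if z is not None:
            # Measurement update only when a value was actually observed
            gain = e_var / (e_var + r)
            e_mean = e_mean + gain * (z - f - e_mean)
            e_var = (1.0 - gain) * e_var
        estimates.append(f + e_mean)
        variances.append(e_var)
    return estimates, variances


preds = [5.0] * 6                          # stand-in for network predictions
meas = [6.0, None, None, None, None, None]  # one measurement, then a gap
est, var = filter_with_gaps(preds, meas)
print(est, var)
```

During the gap the estimated error decays toward zero and its variance grows, so predictions revert to the network output with increasing uncertainty until the next measurement arrives.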
Experiences with Bayesian Learning in a Real World Application
Sykacek, Peter, Dorffner, Georg, Rappelsberger, Peter, Zeitlhofer, Josef
Sleep staging is usually based on rules defined by Rechtschaffen and Kales (see [8]). The Rechtschaffen and Kales rules define four sleep stages (stages one to four) as well as rapid eye movement (REM) sleep and wakefulness. In [1], J. Bentrup and S. Ray report that every year nearly one million US citizens consult their physicians about their sleep. Since sleep staging is a tedious task (one all-night recording takes about 3 hours to score manually, on average), much effort has been spent on designing automatic sleep stagers. Sleep staging is a classification problem that has been addressed with classical statistical techniques or with techniques from the field of artificial intelligence (AI). Among classical techniques, the k-nearest-neighbor method in particular was used. In [1], J. Bentrup and S. Ray report that the classical technique outperformed their AI approaches. Among techniques from the field of AI, researchers used inductive learning to build tree-based classifiers (e.g.
Bach in a Box - Real-Time Harmony
Spangler, Randall R., Goodman, Rodney M., Hawkins, Jim
The learning and inferencing algorithms presented here speak an extended form of the classical figured bass representation common in Bach's time. Paired with a melody, figured bass provides enough information to reconstruct the harmonic content of a piece of music. Figured bass has several characteristics that make it well suited to rule learning. It is a symbolic format that uses a relatively small alphabet of symbols. It is also hierarchical: it specifies first the chord function to be played at the current note/timestep, then the scale step to be played by the bass voice, then additional information as needed to specify the alto and tenor scale steps. This allows our algorithm to fire sets of rules sequentially, first determining the chord function that should be associated with a new melody note and then using that chord function as an input attribute to subsequent rulebases that determine the bass, alto, and tenor scale steps. In this way we can build up the final chord from simpler pieces, each governed by a specialized rulebase.
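The sequential firing of rulebases described above can be sketched as a pipeline in which each stage sees the melody note plus every decision made before it. Only the ordering (chord function, then bass, then alto and tenor) comes from the text; the rules themselves are toy placeholders, not rules learned by the authors' system, and all names are hypothetical.

```python
def chord_rules(ctx):
    # Toy rule: melody steps 1, 3, 5 take I; 2 and 7 take V; otherwise IV
    step = ctx["melody_step"]
    if step in (1, 3, 5):
        return "I"
    if step in (2, 7):
        return "V"
    return "IV"


ROOTS = {"I": 1, "IV": 4, "V": 5}


def bass_rules(ctx):
    # Toy rule: root position, so the bass plays the chord root
    return ROOTS[ctx["chord"]]


def alto_rules(ctx):
    # Toy rule: third of the chord (scale steps wrap around at 7)
    return (ctx["bass"] + 2 - 1) % 7 + 1


def tenor_rules(ctx):
    # Toy rule: fifth of the chord
    return (ctx["bass"] + 4 - 1) % 7 + 1


PIPELINE = [("chord", chord_rules), ("bass", bass_rules),
            ("alto", alto_rules), ("tenor", tenor_rules)]


def harmonize(melody_step, pipeline=PIPELINE):
    ctx = {"melody_step": melody_step}
    for name, rulebase in pipeline:
        # Each rulebase's output becomes an input attribute for later stages
        ctx[name] = rulebase(ctx)
    return ctx


print(harmonize(5))
```

Keeping each stage small and specialized mirrors the point of the hierarchy: a later rulebase only has to choose among options consistent with the chord function already fixed.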