Goto

Collaborating Authors

 Reinforcement Learning


Hippocampal Model of Rat Spatial Abilities Using Temporal Difference Learning

Neural Information Processing Systems

Peter Dayan E25-210, MIT Cambridge, MA 02139 We provide a model of the standard watermaze task, and of a more challenging task involving novel platform locations, in which rats exhibit one-trial learning after a few days of training. The model uses hippocampal place cells to support reinforcement learning, and also, in an integrated manner, to build and use allocentric coordinates. 1 INTRODUCTION


Reinforcement Learning for Call Admission Control and Routing in Integrated Service Networks

Neural Information Processing Systems

Peter Dayan E25-210, MIT Cambridge, MA 02139 We provide a model of the standard watermaze task, and of a more challenging task involving novel platform locations, in which rats exhibit one-trial learning after a few days of training. The model uses hippocampal place cells to support reinforcement learning, and also, in an integrated manner, to build and use allocentric coordinates. 1 INTRODUCTION


Hybrid Reinforcement Learning and Its Application to Biped Robot Control

Neural Information Processing Systems

Advanced Technology R&D Center Mitsubishi Electric Corporation Amagasaki, Hyogo 661-0001, Japan Abstract A learning system composed of linear control modules, reinforcement learningmodules and selection modules (a hybrid reinforcement learning system) is proposed for the fast learning of real-world control problems. The selection modules choose one appropriate control module dependent on the state. It learned the control on a sloped floor more quickly than the usual reinforcement learningbecause it did not need to learn the control on a flat floor, where the linear control module can control the robot. When it was trained by a 2-step learning (during the first learning step, the selection module was trained by a training procedure controlled onlyby the linear controller), it learned the control more quickly. The average number of trials (about 50) is so small that the learning system is applicable to real robot control. 1 Introduction Reinforcement learning has the ability to solve general control problems because it learns behavior through trial-and-error interactions with a dynamic environment.



Multi-time Models for Temporally Abstract Planning

Neural Information Processing Systems

The Natural abstract actions are to move from room to room. 1 Reinforcement Learning (MDP) Framework In reinforcement learning, a learning agent interacts with an environment at some discrete, lowest-level time scale t 0,1,2, ... On each time step, the agent perceives the state of the environment, St, and on that basis chooses a primitive action, at.


Reinforcement Learning with Hierarchies of Machines

Neural Information Processing Systems

We present a new approach to reinforcement learning in which the policies consideredby the learning process are constrained by hierarchies of partially specified machines. This allows for the use of prior knowledge to reduce the search space and provides a framework in which knowledge can be transferred across problems and in which component solutions can be recombined to solve larger and more complicated problems. Our approach can be seen as providing a link between reinforcement learning and"behavior-based" or "teleo-reactive" approaches to control. We present provably convergent algorithms for problem-solving and learning withhierarchical machines and demonstrate their effectiveness on a problem with several thousand states.



Reinforcement Learning for Continuous Stochastic Control Problems

Neural Information Processing Systems

Here we sudy the continuous time, continuous state-spacestochastic case, which covers a wide variety of control problems including target, viability, optimization problems (see [FS93], [KP95])}or which a formalism is the following.


Automated Aircraft Recovery via Reinforcement Learning: Initial Experiments

Neural Information Processing Systems

An emerging use of reinforcement learning (RL) is to approximate optimal policies for large-scale control problems through extensive simulated control experience. Described here are initial experiments directed toward the development of an automated recovery system (ARS)for high-agility aircraft. An ARS is an outer-loop flight control system designed to bring the aircraft from a range of initial states to straight, level, and non-inverted flight in minimum time while satisfying constraints such as maintaining altitude and accelerations within acceptable limits. Here we describe the problem and present initial results involving only single-axis (pitch) recoveries. Through extensive simulated control experience using a medium-fidelity simulation of an F-16, the RL system approximated an optimal policy for longitudinal-stick inputs to produce near-minimum-time transitions to straight and level flight in unconstrained cases, as well as while meeting a pilot-station acceleration constraint. 2 AIRCRAFT MODEL


Nonparametric Model-Based Reinforcement Learning

Neural Information Processing Systems

This paper describes some of the interactions of model learning algorithms and planning algorithms we have found in exploring model-based reinforcement learning. The paper focuses on how local trajectoryoptimizers can be used effectively with learned nonparametric models.We find that trajectory planners that are fully consistent with the learned model often have difficulty finding reasonable plansin the early stages of learning. Trajectory planners that balance obeying the learned model with minimizing cost (or maximizing reward) often do better, even if the plan is not fully consistent with the learned model. 1 INTRODUCTION We are exploring the use of nonparametric models in robot learning (Atkeson et al., 1997b; Atkeson and Schaal, 1997). This paper describes the interaction of model learning algorithms and planning algorithms, focusing on how local trajectory optimization canbe used effectively with nonparametric models in reinforcement learning. We find that trajectory optimizers that are fully consistent with the learned model often have difficulty finding reasonable plans in the early stages of learning. The message of this paper is that a planner should not be entirely consistent with the learned model during model-based reinforcement learning.