AITopics

Peter Dayan, E25-210, MIT, Cambridge, MA 02139. We provide a model of the standard watermaze task, and of a more challenging task involving novel platform locations, in which rats exhibit one-trial learning after a few days of training. The model uses hippocampal place cells to support reinforcement learning, and also, in an integrated manner, to build and use allocentric coordinates.

machine learning, platform, reinforcement learning, (14 more...)

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.24)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Marbach, Peter, Mihatsch, Oliver, Schulte, Miriam, Tsitsiklis, John N.

Reinforcement Learning for Call Admission Control and Routing in Integrated Service Networks

call admission control, machine learning, reinforcement learning, (12 more...)

Country:

Europe (0.46)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.35)

Industry: Telecommunications (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Yamada, Satoshi, Watanabe, Akira, Nakashima, Michio

Hybrid Reinforcement Learning and Its Application to Biped Robot Control

Advanced Technology R&D Center, Mitsubishi Electric Corporation, Amagasaki, Hyogo 661-0001, Japan. Abstract: A learning system composed of linear control modules, reinforcement learning modules and selection modules (a hybrid reinforcement learning system) is proposed for the fast learning of real-world control problems. The selection modules choose one appropriate control module dependent on the state. It learned the control on a sloped floor more quickly than the usual reinforcement learning because it did not need to learn the control on a flat floor, where the linear control module can control the robot. When it was trained by a 2-step learning (during the first learning step, the selection module was trained by a training procedure controlled only by the linear controller), it learned the control more quickly. The average number of trials (about 50) is so small that the learning system is applicable to real robot control. 1 Introduction. Reinforcement learning has the ability to solve general control problems because it learns behavior through trial-and-error interactions with a dynamic environment.

machine learning, reinforcement, reinforcement learning, (14 more...)

Country: Asia > Japan (0.24)

Industry: Automobiles & Trucks > Manufacturer (0.54)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

The Asymptotic Convergence-Rate of Q-learning

Szepesvári, Csaba

R = Pmin/Pmax is the ratio of the minimum and maximum state-action occupation frequencies.
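As a rough illustration of the ratio described above, the sketch below estimates it from a logged trajectory. It assumes that empirical visit frequencies stand in for the true state-action occupation frequencies, and that only visited pairs are counted (an unvisited pair would make the minimum zero). The function name `occupation_ratio` and the toy trajectory are hypothetical and do not come from the paper.

```python
from collections import Counter


def occupation_ratio(visits):
    """Return min/max of the empirical state-action visit frequencies.

    `visits` is a sequence of (state, action) pairs. Only pairs that
    appear at least once are counted (an assumption of this sketch).
    """
    counts = Counter(visits)
    total = sum(counts.values())
    freqs = [c / total for c in counts.values()]
    return min(freqs) / max(freqs)


# Hypothetical trajectory: ("s0", "a0") is visited twice, the others once.
trajectory = [("s0", "a0"), ("s0", "a1"), ("s0", "a0"), ("s1", "a0")]
ratio = occupation_ratio(trajectory)
```

A ratio near 1 means the pairs are visited about equally often, while a ratio near 0 means some pair is visited far less often than the most frequent one.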

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Country: North America > United States > California > San Francisco County > San Francisco (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Precup, Doina, Sutton, Richard S.

Multi-time Models for Temporally Abstract Planning

The natural abstract actions are to move from room to room. 1 Reinforcement Learning (MDP) Framework. In reinforcement learning, a learning agent interacts with an environment at some discrete, lowest-level time scale t = 0, 1, 2, ... On each time step, the agent perceives the state of the environment, s_t, and on that basis chooses a primitive action, a_t.
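The interaction loop described in this excerpt can be sketched in a few lines. This is only an illustration under stated assumptions: the five-state corridor, its reward, and the names `step` and `run_episode` are hypothetical and are not taken from the paper, which discusses a multi-room gridworld.

```python
def step(state, action):
    """Hypothetical corridor with states 0..4; reward 1.0 on reaching state 4."""
    next_state = max(0, min(4, state + action))
    reward = 1.0 if next_state == 4 else 0.0
    return next_state, reward


def run_episode(policy, start=0, max_steps=100):
    """Run the loop: perceive the state, choose a primitive action, observe the result."""
    state, total_reward, t = start, 0.0, 0
    while state != 4 and t < max_steps:
        action = policy(state)               # action chosen from the current state
        state, reward = step(state, action)  # environment returns next state and reward
        total_reward += reward
        t += 1
    return state, total_reward, t


# A fixed policy that always moves one cell to the right.
final_state, total_reward, steps = run_episode(lambda s: 1)
```

Each pass through the loop corresponds to one tick of the lowest-level time scale. A temporally abstract action, such as moving from room to room, would span many such ticks.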

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Country: North America > United States > Massachusetts > Hampshire County > Amherst (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

Parr, Ronald, Russell, Stuart J.

Reinforcement Learning with Hierarchies of Machines

We present a new approach to reinforcement learning in which the policies considered by the learning process are constrained by hierarchies of partially specified machines. This allows for the use of prior knowledge to reduce the search space and provides a framework in which knowledge can be transferred across problems and in which component solutions can be recombined to solve larger and more complicated problems. Our approach can be seen as providing a link between reinforcement learning and "behavior-based" or "teleo-reactive" approaches to control. We present provably convergent algorithms for problem-solving and learning with hierarchical machines and demonstrate their effectiveness on a problem with several thousand states.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country: North America > United States > California (0.28)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Adaptive Choice of Grid and Time in Reinforcement Learning

Pareigis, Stephan

We consider a deterministic system with continuous state and time and an infinite-horizon discounted cost functional.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Country: Europe > Germany (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)

Munos, Rémi, Bourgine, Paul

Reinforcement Learning for Continuous Stochastic Control Problems

Here we study the continuous-time, continuous state-space stochastic case, which covers a wide variety of control problems including target, viability, and optimization problems (see [FS93], [KP95]), for which a formalism is the following.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Country: Europe (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Monaco, Jeffrey F., Ward, David G., Barto, Andrew G.

Automated Aircraft Recovery via Reinforcement Learning: Initial Experiments

An emerging use of reinforcement learning (RL) is to approximate optimal policies for large-scale control problems through extensive simulated control experience. Described here are initial experiments directed toward the development of an automated recovery system (ARS) for high-agility aircraft. An ARS is an outer-loop flight control system designed to bring the aircraft from a range of initial states to straight, level, and non-inverted flight in minimum time while satisfying constraints such as maintaining altitude and accelerations within acceptable limits. Here we describe the problem and present initial results involving only single-axis (pitch) recoveries. Through extensive simulated control experience using a medium-fidelity simulation of an F-16, the RL system approximated an optimal policy for longitudinal-stick inputs to produce near-minimum-time transitions to straight and level flight in unconstrained cases, as well as while meeting a pilot-station acceleration constraint.

aircraft, machine learning, reinforcement learning, (14 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Industry: Transportation > Air (0.38)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Nonparametric Model-Based Reinforcement Learning

Atkeson, Christopher G.

This paper describes some of the interactions of model learning algorithms and planning algorithms we have found in exploring model-based reinforcement learning. The paper focuses on how local trajectory optimizers can be used effectively with learned nonparametric models. We find that trajectory planners that are fully consistent with the learned model often have difficulty finding reasonable plans in the early stages of learning. Trajectory planners that balance obeying the learned model with minimizing cost (or maximizing reward) often do better, even if the plan is not fully consistent with the learned model. 1 INTRODUCTION. We are exploring the use of nonparametric models in robot learning (Atkeson et al., 1997b; Atkeson and Schaal, 1997). This paper describes the interaction of model learning algorithms and planning algorithms, focusing on how local trajectory optimization can be used effectively with nonparametric models in reinforcement learning. We find that trajectory optimizers that are fully consistent with the learned model often have difficulty finding reasonable plans in the early stages of learning. The message of this paper is that a planner should not be entirely consistent with the learned model during model-based reinforcement learning.

machine learning, reinforcement learning, trajectory, (16 more...)

Country: North America > United States (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.72)