Reinforcement Learning
Nonparametric Model-Based Reinforcement Learning
This paper describes some of the interactions of model learning algorithms and planning algorithms that we have found in exploring model-based reinforcement learning. The paper focuses on how local trajectory optimizers can be used effectively with learned nonparametric models. We find that trajectory planners that are fully consistent with the learned model often have difficulty finding reasonable plans in the early stages of learning. Trajectory planners that balance obeying the learned model with minimizing cost (or maximizing reward) often do better, even if the plan is not fully consistent with the learned model.
1 INTRODUCTION
We are exploring the use of nonparametric models in robot learning (Atkeson et al., 1997b; Atkeson and Schaal, 1997). This paper describes the interaction of model learning algorithms and planning algorithms, focusing on how local trajectory optimization can be used effectively with nonparametric models in reinforcement learning. We find that trajectory optimizers that are fully consistent with the learned model often have difficulty finding reasonable plans in the early stages of learning. The message of this paper is that a planner should not be entirely consistent with the learned model during model-based reinforcement learning.
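A minimal sketch of the "not fully consistent" idea described above, not the authors' implementation: the learned dynamics appear as a soft penalty weighted by lam rather than a hard constraint, so a poor early model cannot prevent the optimizer from finding a low-cost plan. All names (f_model, cost, lam, T) and the toy dynamics are illustrative assumptions.

    import numpy as np
    from scipy.optimize import minimize

    T, n_x, n_u = 10, 2, 1          # horizon, state dim, action dim
    lam = 1.0                        # consistency weight: larger = trust the learned model more
    x0 = np.zeros(n_x)               # fixed initial state

    def f_model(x, u):
        # stand-in for a learned nonparametric (e.g. locally weighted) model
        return x + 0.1 * np.concatenate([x[1:], u])

    def cost(x, u):
        # stand-in one-step cost
        return np.sum(x**2) + 0.01 * np.sum(u**2)

    def unpack(z):
        xs = z[:T * n_x].reshape(T, n_x)
        us = z[T * n_x:].reshape(T, n_u)
        return xs, us

    def objective(z):
        xs, us = unpack(z)
        J, x_prev = 0.0, x0
        for t in range(T):
            J += cost(xs[t], us[t])
            # soft consistency penalty instead of a hard dynamics constraint
            J += lam * np.sum((xs[t] - f_model(x_prev, us[t]))**2)
            x_prev = xs[t]
        return J

    z0 = np.zeros(T * (n_x + n_u))
    plan = minimize(objective, z0, method="L-BFGS-B")
    xs_opt, us_opt = unpack(plan.x)

Shrinking lam recovers a planner that largely ignores the model; growing it recovers a fully model-consistent trajectory optimizer.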
Enhancing Q-Learning for Optimal Asset Allocation
This paper enhances the Q-learning algorithm for optimal asset allocation proposed in (Neuneier, 1996 [6]). The new formulation simplifies the approach by using only one value function for many assets and allows model-free policy iteration. After testing the new algorithm on real data, the possibility of risk management within the framework of Markov decision problems is analyzed. The proposed method allows the construction of a multi-period portfolio management system which takes into account transaction costs, the risk preferences of the investor, and several constraints on the allocation.
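The following is a toy sketch of the general mechanism named above (Q-learning with transaction costs in the reward), not Neuneier's formulation; the state discretization, allocation grid, and cost rate are placeholder assumptions.

    import numpy as np

    n_states, n_allocations = 20, 5          # discretized market state, allocation grid
    alpha, gamma, cost_rate = 0.1, 0.95, 0.002
    Q = np.zeros((n_states, n_allocations, n_allocations))  # (market state, current alloc, new alloc)

    def q_update(Q, s, a_cur, a_new, asset_return, s_next):
        # reward: return earned by the new allocation, minus proportional transaction costs
        alloc = a_new / (n_allocations - 1)               # fraction held in the risky asset
        turnover = abs(a_new - a_cur) / (n_allocations - 1)
        r = alloc * asset_return - cost_rate * turnover
        td_target = r + gamma * np.max(Q[s_next, a_new])  # best next re-allocation
        Q[s, a_cur, a_new] += alpha * (td_target - Q[s, a_cur, a_new])
        return r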
Automated Aircraft Recovery via Reinforcement Learning: Initial Experiments
Monaco, Jeffrey F., Ward, David G., Barto, Andrew G.
An emerging use of reinforcement learning (RL) is to approximate optimal policies for large-scale control problems through extensive simulated control experience. Described here are initial experiments directed toward the development of an automated recovery system (ARS) for high-agility aircraft. An ARS is an outer-loop flight control system designed to bring the aircraft from a range of initial states to straight, level, and non-inverted flight in minimum time while satisfying constraints such as maintaining altitude and accelerations within acceptable limits. Here we describe the problem and present initial results involving only single-axis (pitch) recoveries. Through extensive simulated control experience using a medium-fidelity simulation of an F-16, the RL system approximated an optimal policy for longitudinal-stick inputs to produce near-minimum-time transitions to straight and level flight in unconstrained cases, as well as while meeting a pilot-station acceleration constraint.
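A hedged sketch of how a minimum-time objective with an acceleration constraint can be expressed as an RL reward signal: a constant per-step penalty favors fast recoveries, and exceeding an acceleration limit is penalized. The thresholds, weights, and function name are illustrative assumptions, not values from the paper.

    def recovery_reward(pitch, pitch_rate, accel_z,
                        accel_limit=4.0, level_tol=0.02, penalty=10.0):
        recovered = abs(pitch) < level_tol and abs(pitch_rate) < level_tol
        r = -1.0                                  # constant time penalty each step
        if abs(accel_z) > accel_limit:            # pilot-station acceleration constraint
            r -= penalty * (abs(accel_z) - accel_limit)
        return r, recovered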
Hybrid Reinforcement Learning and Its Application to Biped Robot Control
Yamada, Satoshi, Watanabe, Akira, Nakashima, Michio
Advanced Technology R&D Center, Mitsubishi Electric Corporation, Amagasaki, Hyogo 661-0001, Japan
A learning system composed of linear control modules, reinforcement learning modules and selection modules (a hybrid reinforcement learning system) is proposed for the fast learning of real-world control problems. The selection modules choose one appropriate control module dependent on the state. It learned the control on a sloped floor more quickly than the usual reinforcement learning because it did not need to learn the control on a flat floor, where the linear control module can control the robot. When it was trained by a two-step learning procedure (during the first learning step, the selection module was trained by a training procedure controlled only by the linear controller), it learned the control more quickly. The average number of trials (about 50) is so small that the learning system is applicable to real robot control.
1 Introduction
Reinforcement learning has the ability to solve general control problems because it learns behavior through trial-and-error interactions with a dynamic environment.
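An illustrative sketch of the hybrid idea, with all names and interfaces assumed: a selection module routes each state either to a fixed linear controller or to a reinforcement learning module, so the RL module only has to learn in the regions where linear control is insufficient.

    import numpy as np

    class HybridController:
        def __init__(self, K, rl_policy, select):
            self.K = K                  # gain matrix of the linear control module
            self.rl_policy = rl_policy  # learned policy, e.g. from a Q-learning module
            self.select = select        # selection module: state -> "linear" or "rl"

        def action(self, x):
            if self.select(x) == "linear":
                return -self.K @ x      # linear state-feedback control
            return self.rl_policy(x)    # reinforcement learning module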
Hippocampal Model of Rat Spatial Abilities Using Temporal Difference Learning
Foster, David J., Morris, Richard G. M., Dayan, Peter
We provide a model of the standard watermaze task, and of a more challenging task involving novel platform locations, in which rats exhibit one-trial learning after a few days of training. The model uses hippocampal place cells to support reinforcement learning, and also, in an integrated manner, to build and use allocentric coordinates.
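A minimal sketch of the core mechanism named in the title and abstract: place-cell activities serve as radial-basis features of position, and a TD(0) critic learns a value function over those features. The centres, widths, and learning rates are placeholder assumptions, not the paper's parameters.

    import numpy as np

    centers = np.random.uniform(-1, 1, size=(100, 2))   # place-cell centres in the pool
    sigma, alpha, gamma = 0.2, 0.05, 0.95
    w = np.zeros(len(centers))                           # critic weights

    def place_cells(pos):
        d2 = np.sum((centers - pos)**2, axis=1)
        return np.exp(-d2 / (2 * sigma**2))              # place-cell activities at this position

    def td_update(pos, reward, next_pos, w):
        phi, phi_next = place_cells(pos), place_cells(next_pos)
        delta = reward + gamma * w @ phi_next - w @ phi   # temporal-difference error
        w += alpha * delta * phi
        return delta

The same TD error can drive an actor defined over the same place-cell features, which is how such models typically couple value learning to action selection.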
Adaptive Choice of Grid and Time in Reinforcement Learning
Consistency problems arise if the discretization needs to be refined, e.g. for more accuracy, application of multi-grid iteration, or better starting values for the iteration of the approximate optimal value function. In [7] it was shown that, for diffusion-dominated problems, a state-to-time discretization ratio k/h of Chγ
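A sketch under assumptions (not the paper's algorithm): when the state grid is refined, the time step k is re-chosen from the new grid size h so that the state-to-time discretization ratio k/h follows a prescribed rule ratio(h); the example rule with placeholder constants mirrors a power-law dependence on h.

    def refine(h, ratio):
        h_new = h / 2.0                  # refine the state discretization
        k_new = ratio(h_new) * h_new     # choose k so that k/h = ratio(h)
        return h_new, k_new

    # example: k/h proportional to a power of h (constants are placeholders)
    h, k = 0.1, 0.05
    h, k = refine(h, lambda h: 1.0 * h**0.5)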
Reinforcement Learning for Call Admission Control and Routing in Integrated Service Networks
Marbach, Peter, Mihatsch, Oliver, Schulte, Miriam, Tsitsiklis, John N.