AITopics

Country:

North America > United States > California > San Francisco County > San Francisco (0.30)
Europe > Hungary > Budapest > Budapest (0.24)
Europe > Hungary > Csongrád-Csanád County > Szeged (0.05)
(4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

The Asymptotic Convergence-Rate of Q-learning

Szepesvári, Csaba

Q-Iearning is a popular reinforcement learning (RL) algorithm whose convergence is well demonstrated in the literature (Jaakkola et al., 1994; Tsitsiklis, 1994; Littman and Szepesvari, 1996; Szepesvari and Littman, 1996). Our aim in this paper is to provide an upper bound for the convergence rate of (lookup-table based) Q-Iearning algorithms. Although, this upper bound is not strict, computer experiments (to be presented elsewhere) and the form of the lemma underlying the proof indicate that the obtained upper bound can be made strict by a slightly more complicated definition for R. Our results extend to learning on aggregated states (see (Singh et al., 1995» and other related algorithms which admit a certain form of asynchronous stochastic approximation (see (Szepesv iri and Littman, 1996». Present address: Associative Computing, Inc., Budapest, Konkoly Thege M. u. 29-33, HUNGARY-1121 The Asymptotic Convergence-Rate of Q-leaming

algorithm, asymptotic convergence-rate, convergence rate, (13 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.30)
Europe > Hungary > Budapest > Budapest (0.24)
Europe > Hungary > Csongrád-Csanád County > Szeged (0.05)
(4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Monaco, Jeffrey F., Ward, David G., Barto, Andrew G.

Automated Aircraft Recovery via Reinforcement Learning: Initial Experiments

An emerging use of reinforcement learning (RL) is to approximate optimal policies for large-scale control problems through extensive simulated control experience. Described here are initial experiments directed toward the development of an automated recovery system (ARS) for high-agility aircraft. An ARS is an outer-loop flight control system designed to bring the aircraft from a range of initial states to straight, level, and non-inverted flight in minimum time while satisfying constraints such as maintaining altitude and accelerations within acceptable limits. Here we describe the problem and present initial results involving only single-axis (pitch) recoveries. Through extensive simulated control experience using a medium-fidelity simulation of an F-16, the RL system approximated an optimal policy for longitudinal-stick inputs to produce near-minimum-time transitions to straight and level flight in unconstrained cases, as well as while meeting a pilot-station acceleration constraint. 2 AIRCRAFT MODEL

aircraft, initial condition, rl system, (13 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Monaco (0.06)
(3 more...)

Industry:

Transportation > Air (0.38)
Aerospace & Defense (0.38)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Precup, Doina, Sutton, Richard S.

Multi-time Models for Temporally Abstract Planning

Planning and learning at multiple levels of temporal abstraction is a key problem for artificial intelligence. In this paper we summarize an approach to this problem based on the mathematical framework of Markov decision processes and reinforcement learning. Current model-based reinforcement learning is based on one-step models that cannot represent commonsense higher-level actions, such as going to lunch, grasping an object, or flying to Denver. This paper generalizes prior work on temporally abstract models [Sutton, 1995] and extends it from the prediction setting to include actions, control, and planning. We introduce a more general form of temporally abstract model, the multi-time model, and establish its suitability for planning and learning by virtue of its relationship to the Bellman equations. This paper summarizes the theoretical framework of multi-time models and illustrates their potential advantages in a grid world planning task.

abstract action, primitive action, sutton, (12 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > California > San Mateo County > San Mateo (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)

Parr, Ronald, Russell, Stuart J.

Reinforcement Learning with Hierarchies of Machines

We present a new approach to reinforcement learning in which the policies considered by the learning process are constrained by hierarchies of partially specified machines. This allows for the use of prior knowledge to reduce the search space and provides a framework in which knowledge can be transferred across problems and in which component solutions can be recombined to solve larger and more complicated problems. Our approach can be seen as providing a link between reinforcement learning and "behavior-based" or "teleo-reactive" approaches to control. We present provably convergent algorithms for problem-solving and learning with hierarchical machines and demonstrate their effectiveness on a problem with several thousand states.

choice point, optimal policy, reinforcement learning, (13 more...)

Country:

North America > United States > Rhode Island > Providence County > Providence (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(9 more...)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Adaptive Choice of Grid and Time in Reinforcement Learning

Pareigis, Stephan

Consistency problems arise if the discretization needs to be refined, e.g. for more accuracy, application of multi-grid iteration or better starting values for the iteration of the approximate optimal value function. In [7] it was shown, that for diffusion dominated problems, a state to time discretization ratio k/ h of Ch'r, I

discretization, optimal value function, refinement, (16 more...)

Country: Europe > Germany > Schleswig-Holstein > Kiel (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.44)

Munos, Rémi, Bourgine, Paul

Reinforcement Learning for Continuous Stochastic Control Problems

Here we sudy the continuous time, continuous state-space stochastic case, which covers a wide variety of control problems including target, viability, optimization problems (see [FS93], [KP95])}or which a formalism is the following.

algorithm, equation, reinforcement learning, (11 more...)

Country:

Europe > France (0.05)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Nonparametric Model-Based Reinforcement Learning

Atkeson, Christopher G.

This paper describes some of the interactions of model learning algorithms and planning algorithms we have found in exploring model-based reinforcement learning. The paper focuses on how local trajectory optimizers can be used effectively with learned nonparametric models. We find that trajectory planners that are fully consistent with the learned model often have difficulty finding reasonable plans in the early stages of learning. Trajectory planners that balance obeying the learned model with minimizing cost (or maximizing reward) often do better, even if the plan is not fully consistent with the learned model.

dynamic programming, trajectory, value function, (13 more...)

Country:

North America > United States > California > San Mateo County > San Mateo (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.75)

Enhancing Q-Learning for Optimal Asset Allocation

Neuneier, Ralph

This paper enhances the Q-Iearning algorithm for optimal asset allocation proposed in (Neuneier, 1996 [6]). The new formulation simplifies the approach by using only one value-function for many assets and allows model-free policy-iteration. After testing the new algorithm on real data, the possibility of risk management within the framework of Markov decision problems is analyzed. The proposed methods allows the construction of a multi-period portfolio management system which takes into account transaction costs, the risk preferences of the investor, and several constraints on the allocation. 1 Introduction

allocation, investor, transaction cost, (16 more...)

Country:

North America > United States > Massachusetts (0.04)
Europe > Germany > North Rhine-Westphalia > Upper Bavaria > Munich (0.04)
Asia > China > Hong Kong (0.04)

Industry: Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Marbach, Peter, Mihatsch, Oliver, Schulte, Miriam, Tsitsiklis, John N.

Reinforcement Learning for Call Admission Control and Routing in Integrated Service Networks

We provide a model of the standard watermaze task, and of a more challenging task involving novel platform locations, in which rats exhibit one-trial learning after a few days of training. The model uses hippocampal place cells to support reinforcement learning, and also, in an integrated manner, to build and use allocentric coordinates. 1 INTRODUCTION

call admission control, learning, service type, (11 more...)