AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

A Monte Carlo AIXI Approximation

Veness, Joel, Ng, Kee Siong, Hutter, Marcus, Uther, William, Silver, David

arXiv.org Artificial Intelligence, Dec-26-2010

This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. Our approach is based on a direct approximation of AIXI, a Bayesian optimality notion for general reinforcement learning agents. Previously, it has been unclear whether the theory of AIXI could motivate the design of practical algorithms. We answer this hitherto open question in the affirmative, by providing the first computationally feasible approximation to the AIXI agent. To develop our approximation, we introduce a new Monte-Carlo Tree Search algorithm along with an agent-specific extension to the Context Tree Weighting algorithm. Empirically, we present a set of encouraging results on a variety of stochastic and partially observable domains. We conclude by proposing a number of directions for future research.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

0909.0801

Country:

Oceania > Australia > New South Wales (0.04)
North America > United States > Massachusetts (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
(2 more...)

Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view

Scherrer, Bruno

arXiv.org Artificial Intelligence, Nov-19-2010

We investigate projection methods for evaluating a linear approximation of the value function of a policy in a Markov Decision Process context. We consider two popular approaches, the one-step Temporal Difference fix-point computation (TD(0)) and the Bellman Residual (BR) minimization. We describe examples where each method outperforms the other. We highlight a simple relation between the objective functions they minimize, and show that while BR enjoys a performance guarantee, TD(0) does not in general. We then propose a unified view in terms of oblique projections of the Bellman equation, which substantially simplifies and extends the characterization of (Schoknecht, 2002) and the recent analysis of (Yu & Bertsekas, 2008). Finally, we describe some simulations that suggest that although the TD(0) solution is usually slightly better than the BR solution, its inherent numerical instability makes it very bad in some cases, and thus worse on average.
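To make the two estimators in this abstract concrete, here is a minimal NumPy sketch, not the paper's code. It assumes a known transition matrix `P`, reward vector `r`, feature matrix `phi`, and state weighting `d`; all names, and the 3-state chain itself, are hypothetical illustrations. The TD(0) fix point solves the projected linear system, while BR minimization is a weighted least-squares problem on the Bellman residual.

```python
import numpy as np

def td0_fixed_point(phi, P, r, gamma, d):
    """Solve Phi^T D (Phi - gamma P Phi) w = Phi^T D r for w."""
    D = np.diag(d)
    A = phi.T @ D @ (phi - gamma * P @ phi)
    b = phi.T @ D @ r
    return np.linalg.solve(A, b)

def bellman_residual_min(phi, P, r, gamma, d):
    """Minimize the d-weighted norm of Phi w - (r + gamma P Phi w)."""
    psi = phi - gamma * P @ phi
    sqrt_d = np.sqrt(d)
    # Weighted least squares: scale rows by sqrt(d), then solve.
    w, *_ = np.linalg.lstsq(sqrt_d[:, None] * psi, sqrt_d * r, rcond=None)
    return w

def weighted_bellman_residual(w, phi, P, r, gamma, d):
    """d-weighted norm of the Bellman residual of the estimate Phi w."""
    residual = phi @ w - (r + gamma * P @ phi @ w)
    return float(np.sqrt(np.sum(d * residual ** 2)))

# Hypothetical 3-state chain (assumption, for illustration only).
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
r = np.array([0.0, 0.0, 1.0])
gamma = 0.9
d = np.full(3, 1.0 / 3.0)               # uniform weighting (stationary here)
phi = np.array([[1.0], [2.0], [3.0]])   # a single, deliberately poor feature

w_td = td0_fixed_point(phi, P, r, gamma, d)
w_br = bellman_residual_min(phi, P, r, gamma, d)
print("TD(0) weights:", w_td)
print("BR weights:   ", w_br)
print("Bellman residual (TD):", weighted_bellman_residual(w_td, phi, P, r, gamma, d))
print("Bellman residual (BR):", weighted_bellman_residual(w_br, phi, P, r, gamma, d))
```

With a full (tabular) feature matrix both estimators recover the exact value function; they differ only when the features cannot represent it, as with the single feature used here.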

machine learning, reinforcement learning, space dim, (14 more...)

arXiv.org Artificial Intelligence

1011.4362

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Reinforcement Learning Based on Active Learning Method

Sagha, Hesam, Shouraki, Saeed Bagheri, Khasteh, Hosein, Kiaei, Ali Akbar

arXiv.org Artificial Intelligence, Nov-7-2010

In this paper, a new reinforcement learning approach is proposed which is based on a powerful concept named Active Learning Method (ALM) in modeling. ALM expresses any multi-input-single-output system as a fuzzy combination of some single-input-single-output systems. The proposed method is an actor-critic system similar to the Generalized Approximate Reasoning based Intelligent Control (GARIC) structure, which adapts the ALM by delayed reinforcement signals. Our system uses Temporal Difference (TD) learning to model the behavior of useful actions of a control system. The goodness of an action is modeled on the Reward-Penalty-Plane. IDS planes will be updated according to this plane. It is shown that the system can learn with a predefined fuzzy system or without it (through random actions).

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1011.166

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.05)
Europe > Spain (0.04)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Treating Epilepsy by Reinforcement Learning Via Manifold-Based Simulation

Bush, Keith (University of Arkansas at Little Rock) | Pineau, Joelle (School of Computer Science, McGill University)

AAAI Conferences, Nov-5-2010

The ability to take intelligent actions in real-world domains is a goal of great interest in the machine learning community. Unfortunately, the real world is filled with systems that can be partially observed but cannot, as yet, be described by first-principle models. Moreover, the traditional paradigm of direct interaction with the environment used in reinforcement learning (RL) is often prohibitively expensive in practice. An alternative approach that simultaneously solves both of these problems is to gain experience in simulation; the simulation in this approach is a computational model derived from observations. Advances in sensory and information technology are simplifying the acquisition and distribution of real-world datasets to computational scientists; thus, the barrier to linking intelligent control with real-world domains is becoming one of identifying high-quality state-space and transition functions directly from observations. From a dynamical systems perspective, this barrier is analogous to the problem of finding high-quality manifold embeddings, and a rich literature of theory and practice exists to address it. The contribution of this work is two-fold. First, we describe an approach for learning optimal control strategies directly from observations using manifold embeddings as the intermediate state representation. Second, we demonstrate how control strategies constructed in this way can answer important scientific questions. As a concrete example, we use our approach to guide experimental decisions in neurostimulation treatments of epilepsy.

machine learning, reinforcement learning, stimulation, (18 more...)

AAAI Conferences

2010 AAAI Fall Symposium Series

Country:

North America > Canada > Quebec > Montreal (0.15)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)

Genre: Research Report (0.72)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Epilepsy (0.76)
Health & Medicine > Therapeutic Area > Genetic Disease (0.76)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Kalman Temporal Differences

Geist, M., Pietquin, O.

Journal of Artificial Intelligence Research, Oct-29-2010

Because reinforcement learning suffers from a lack of scalability, online value (and Q-) function approximation has received increasing interest over the last decade. This contribution introduces a novel approximation scheme, namely the Kalman Temporal Differences (KTD) framework, that exhibits the following features: sample-efficiency, non-linear approximation, non-stationarity handling and uncertainty management. A first KTD-based algorithm is provided for deterministic Markov Decision Processes (MDP), which produces biased estimates in the case of stochastic transitions. Then the eXtended KTD framework (XKTD), solving stochastic MDP, is described. Convergence is analyzed for special cases for both deterministic and stochastic transitions. Related algorithms are tested on classical benchmarks. They compare favorably to the state of the art while exhibiting the announced features.

algorithm, equation, value function, (13 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.3077

AI Access Foundation

10675

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > New York > New York County > New York City (0.04)
(13 more...)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Optimism in Reinforcement Learning and Kullback-Leibler Divergence

Filippi, Sarah, Cappé, Olivier, Garivier, Aurélien

arXiv.org Machine Learning, Oct-13-2010

We consider model-based reinforcement learning in finite Markov Decision Processes (MDPs), focussing on so-called optimistic strategies. In MDPs, optimism can be implemented by carrying out extended value iterations under a constraint of consistency with the estimated model transition probabilities. The UCRL2 algorithm by Auer, Jaksch and Ortner (2009), which follows this strategy, has recently been shown to guarantee near-optimal regret bounds. In this paper, we strongly argue in favor of using the Kullback-Leibler (KL) divergence for this purpose. By studying the linear maximization problem under KL constraints, we provide an efficient algorithm, termed KL-UCRL, for solving KL-optimistic extended value iteration. Using recent deviation bounds on the KL divergence, we prove that KL-UCRL provides the same guarantees as UCRL2 in terms of regret. However, numerical experiments on classical benchmarks show a significantly improved behavior, particularly when the MDP has reduced connectivity. To support this observation, we provide elements of comparison between the two algorithms based on geometric considerations.
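As a small illustration of the consistency constraint this abstract refers to, the sketch below computes the KL divergence between an estimated transition vector and a candidate one, and checks whether the candidate lies within a KL ball of radius `epsilon`. It is not the KL-UCRL algorithm itself: the function names, the threshold, and the example vectors are assumptions made for illustration, and the maximization over the ball is left out.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same states."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0                      # terms with p_i = 0 contribute 0
    if np.any(q[support] == 0):
        return float("inf")              # q excludes an outcome p allows
    return float(np.sum(p[support] * np.log(p[support] / q[support])))

def is_kl_plausible(p_hat, q, epsilon):
    """True if candidate q is within KL distance epsilon of the estimate p_hat."""
    return kl_divergence(p_hat, q) <= epsilon

# Hypothetical estimated next-state distribution and two candidates.
p_hat = [0.5, 0.5, 0.0]
near = [0.45, 0.45, 0.10]
far = [0.05, 0.05, 0.90]

print(is_kl_plausible(p_hat, near, epsilon=0.2))
print(is_kl_plausible(p_hat, far, epsilon=0.2))
```

Note that a candidate may place mass on a state the estimate has never observed (third entry above) and still remain plausible, whereas a candidate that rules out an observed transition has infinite divergence.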

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

doi: 10.1109/ALLERTON.2010.5706896

1004.5229

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.72)

An Automated Model-Based Adaptive Architecture in Modern Games

Tan, Chek Tien (DigiPen Institute of Technology, Singapore) | Cheng, Ho-lun (National University of Singapore)

AAAI Conferences, Oct-10-2010

This paper proposes an automatic model-based approach that enables adaptive decision making in modern virtual games. It builds upon the Integrated MDP and POMDP Learning AgeNT (IMPLANT) architecture, which has been shown to provide plausible adaptive decision making in modern games. However, IMPLANT suffers from highly time-consuming manual model specification. By incorporating an automated prioritized-sweeping-based model builder for the MDP, as well as using the Tactical Agent Personality for the POMDP, the work in this paper aims to resolve this problem. Empirical proof of concept is shown based on an implementation in a modern game scenario, whereby the enhanced IMPLANT agent is shown to exhibit superior adaptation performance over the old IMPLANT agent whilst eliminating manual model specifications and at the same time still maintaining plausible speeds.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

AAAI Conferences

Sixth Artificial Intelligence and Interactive Digital Entertainment Conference

Country:

Asia > Singapore (0.05)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.70)

Industry:

Leisure & Entertainment > Games > Computer Games (0.96)
Leisure & Entertainment > Games > Chess (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)

Crowd Simulation Via Multi-Agent Reinforcement Learning

Torrey, Lisa (St. Lawrence University)

AAAI Conferences, Oct-10-2010

Artificial intelligence is frequently used to control virtual characters in movies and games. When these characters appear in crowds, controlling them is called crowd simulation. In this paper, I suggest that crowd simulation could be accomplished by multi-agent reinforcement learning, a method by which groups of agents can learn to act autonomously in their environment. I present a case study that explores the challenges and benefits of this type of approach and encourages the development of learning techniques for AI in entertainment media.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

AAAI Conferences

Sixth Artificial Intelligence and Interactive Digital Entertainment Conference

Country:

North America > United States > Wisconsin (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Industry: Education (0.30)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Learning Companion Behaviors Using Reinforcement Learning in Games

Sharifi, AmirAli (University of Alberta) | Zhao, Richard (University of Alberta) | Szafron, Duane A. (University of Alberta)

AAAI Conferences, Oct-10-2010

Our goal is to enable Non Player Characters (NPC) in computer games to exhibit natural behaviors. The quality of behaviors affects the game experience especially in story-based games, which rely on player-NPC interactions. We used Reinforcement Learning to enable NPC companions to develop preferences for actions. We implemented our RL technique in BioWare Corp.’s Neverwinter Nights. Our experiments evaluate an NPC companion’s behaviors regarding traps. Our method enables NPCs to rapidly learn reasonable behaviors and adapt to changes in the game.

machine learning, npc, reinforcement learning, (16 more...)

AAAI Conferences

Sixth Artificial Intelligence and Interactive Digital Entertainment Conference

Country:

North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Information Technology (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Approximate Inference and Stochastic Optimal Control

Rawlik, Konrad, Toussaint, Marc, Vijayakumar, Sethu

arXiv.org Machine Learning, Sep-20-2010

We propose a novel reformulation of the stochastic optimal control problem as an approximate inference problem, demonstrating that such an interpretation leads to new practical methods for the original problem. In particular, we characterise a novel class of iterative solutions to the stochastic optimal control problem based on a natural relaxation of the exact dual formulation. These theoretical insights are applied to the Reinforcement Learning problem, where they lead to new model-free, off-policy methods for discrete and continuous problems.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Machine Learning

1009.3958

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (1.00)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
