AITopics

Knowledge transfer has been suggested as a useful approach for solving large Markov Decision Processes. The main idea is to compute a decision-making policy in one environment and use it in a different environment, provided the two are "close enough". In this paper, we use bisimulation-style metrics (Ferns et al., 2004) to guide knowledge transfer. We propose algorithms that decide what actions to transfer from the policy computed on a small MDP task to a large task, given the bisimulation distance between states in the two tasks. We demonstrate the inherent "pessimism" of bisimulation metrics and present variants of this metric aimed to overcome this pessimism, leading to improved action transfer. We also show that using this approach for transferring temporally extended actions (Sutton et al., 1999) is more successful than using it exclusively with primitive actions. We present theoretical guarantees on the quality of the transferred policy, as well as promising empirical results.

artificial intelligence, machine learning, reinforcement learning

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country: North America > Canada > Quebec > Montreal (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Integrating Sample-Based Planning and Model-Based Reinforcement Learning

Walsh, Thomas J. (Rutgers University) | Goschin, Sergiu (Rutgers University) | Littman, Michael L. (Rutgers University)

Recent advancements in model-based reinforcement learning have shown that the dynamics of many structured domains (e.g. DBNs) can be learned with tractable sample complexity, despite their exponentially large state spaces. Unfortunately, these algorithms all require access to a planner that computes a near optimal policy, and while many traditional MDP algorithms make this guarantee, their computation time grows with the number of states. We show how to replace these over-matched planners with a class of sample-based planners — whose computation time is independent of the number of states — without sacrificing the sample-efficiency guarantees of the overall learning algorithms. To do so, we define sufficient criteria for a sample-based planner to be used in such a learning system and analyze two popular sample-based approaches from the literature. We also introduce our own sample-based planner, which combines the strategies from these algorithms and still meets the criteria for integration into our learning system. In doing so, we define the first complete RL solution for compactly represented (exponentially sized) state spaces with efficiently learnable dynamics that is both sample efficient and whose computation time does not grow rapidly with the number of states.
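To illustrate what a planner with state-count-independent cost can look like, here is a minimal sketch of sparse sampling, one well-known sample-based planner. It is not claimed to be the planner introduced in the paper above, nor necessarily one of the two it analyzes. The function names, the `model(state, action, rng)` interface, and the toy `chain_model` are assumptions made for this example.

```python
import random

def sparse_sampling_q(model, state, actions, depth, width, gamma, rng):
    """Estimate Q(state, a) for every action by recursive sampling.

    `model` is an assumed generative model: model(state, action, rng)
    returns (next_state, reward). The work done is (len(actions) * width)
    ** depth model calls, which does not depend on the number of states.
    """
    if depth == 0:
        return {a: 0.0 for a in actions}
    q = {}
    for a in actions:
        total = 0.0
        for _ in range(width):
            next_state, reward = model(state, a, rng)
            next_q = sparse_sampling_q(
                model, next_state, actions, depth - 1, width, gamma, rng
            )
            total += reward + gamma * max(next_q.values())
        q[a] = total / width
    return q

def chain_model(state, action, rng):
    """Hypothetical toy model: 'right' moves one step and pays reward 1."""
    if action == "right":
        return state + 1, 1.0
    return state, 0.0

if __name__ == "__main__":
    rng = random.Random(0)
    q = sparse_sampling_q(chain_model, 0, ["right", "stay"], 2, 2, 0.5, rng)
    print(max(q, key=q.get), q)
```

The planner only ever touches states it samples, so the chain could be arbitrarily long without changing the cost of a single planning call; the price is exponential growth in the lookahead depth.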

artificial intelligence, machine learning, reinforcement learning

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country:

North America > United States > New York (0.04)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.94)

Veness, Joel (University of New South Wales and NICTA) | Ng, Kee Siong (Medicare Australia and Australian National University) | Hutter, Marcus (Australian National University and NICTA) | Silver, David (University College London)

Reinforcement Learning via AIXI Approximation

This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. This approach is based on a direct approximation of AIXI, a Bayesian optimality notion for general reinforcement learning agents. Previously, it has been unclear whether the theory of AIXI could motivate the design of practical algorithms. We answer this hitherto open question in the affirmative, by providing the first computationally feasible approximation to the AIXI agent. To develop our approximation, we introduce a Monte Carlo Tree Search algorithm along with an agent-specific extension of the Context Tree Weighting algorithm. Empirically, we present a set of encouraging results on a number of stochastic, unknown, and partially observable domains.

artificial intelligence, machine learning, reinforcement learning

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country: Oceania > Australia (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Reinforcement Learning Via Practice and Critique Advice

Judah, Kshitij (Oregon State University) | Roy, Saikat (Oregon State University) | Fern, Alan (Oregon State University) | Dietterich, Thomas G. (Oregon State University)

We consider the problem of incorporating end-user advice into reinforcement learning (RL). In our setting, the learner alternates between practicing, where learning is based on actual world experience, and end-user critique sessions where advice is gathered. During each critique session the end-user is allowed to analyze a trajectory of the current policy and then label an arbitrary subset of the available actions as good or bad. Our main contribution is an approach for integrating all of the information gathered during practice and critiques in order to effectively optimize a parametric policy. The approach optimizes a loss function that linearly combines losses measured against the world experience and the critique data. We evaluate our approach using a prototype system for teaching tactical battle behavior in a real-time strategy game engine. Results are given for a significant evaluation involving ten end-users showing the promise of this approach and also highlighting challenges involved in inserting end-users into the RL loop.
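The linear combination of losses described above can be sketched as follows. This is only an illustration under stated assumptions: the softmax policy, the particular critique loss, the weighting parameter `lam`, and all function names are hypothetical and are not taken from the paper.

```python
import math

def softmax_policy(prefs):
    """Turn a dict of action -> preference score into action probabilities."""
    m = max(prefs.values())
    exps = {a: math.exp(v - m) for a, v in prefs.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}

def critique_loss(probs, good, bad):
    """Assumed critique loss for one critiqued state.

    Penalizes probability mass on actions labeled bad and, when any
    actions were labeled good, the mass that is missing from them.
    """
    loss = sum(probs[a] for a in bad)
    if good:
        loss += 1.0 - sum(probs[a] for a in good)
    return loss

def combined_loss(experience_loss, critique_losses, lam=0.5):
    """Linear combination: lam * experience loss + (1 - lam) * mean critique loss."""
    if critique_losses:
        mean_critique = sum(critique_losses) / len(critique_losses)
    else:
        mean_critique = 0.0
    return lam * experience_loss + (1.0 - lam) * mean_critique

if __name__ == "__main__":
    probs = softmax_policy({"attack": 0.0, "retreat": 0.0})
    c = critique_loss(probs, good={"attack"}, bad={"retreat"})
    print(combined_loss(2.0, [c], lam=0.25))
```

Here `experience_loss` stands in for whatever loss is measured against world experience; the weight `lam` controls how much the learner trusts practice relative to end-user labels.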

artificial intelligence, machine learning, reinforcement learning

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country: North America > United States > Oregon > Benton County > Corvallis (0.04)

Genre: Research Report > New Finding (0.94)

Industry: Leisure & Entertainment > Games > Computer Games (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Erez, Tom (Washington University in St. Louis)

Local Optimization for Simulation of Natural Motion

AAAI Conferences Jul-12-2010

I intend to use RL to bring the two together, and generate motion from the proposed first principles in realistic biomechanical models, and compare the results to the behavior of living creatures. This is a nontrivial problem: biomechanical models are continuous, high-dimensional and nonlinear, and the optimality criteria considered in the literature are non-quadratic. In order to address these profound challenges, I propose three basic principles ...

The Reinforcement Learning (RL) agent interacts with a dynamical system whose states capture all the relevant information about the current configuration of the agent and its environment. By specifying a sequence of actions, the agent alters the state transitions of this dynamical system. The optimality criterion is formalized by a reward function defined over state-action pairs, and the agent's goal is to maximize the cumulative reward.

artificial intelligence, machine learning, reinforcement learning

Fifteenth AAAI/SIGART Doctoral Consortium

Country:

Asia > Japan > Honshū > Chūbu > Nagano Prefecture > Nagano (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)

Industry: Health & Medicine (0.32)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)

Doshi-Velez, Finale (Massachusetts Institute of Technology)

Nonparametric Bayesian Approaches for Reinforcement Learning in Partially Observable Domains

AAAI Conferences Jul-12-2010

The objective of my doctoral research is to bring together two fields, partially-observable reinforcement learning (PORL) and non-parametric Bayesian statistics (NPB), to address issues of statistical modeling and decision-making in complex, real-world domains.

machine learning, reinforcement, reinforcement learning

Fifteenth AAAI/SIGART Doctoral Consortium

Country:

North America > United States > Massachusetts (0.05)
Asia > Middle East > Jordan (0.05)

Industry: Health & Medicine (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Loscalzo, Steven (Air Force Research Laboratory Information Directorate) | Wright, Robert (Air Force Research Laboratory Information Directorate)

Automatic Methods for Continuous State Space Abstraction

AAAI Conferences Jul-8-2010

Reinforcement learning algorithms are often tasked with learning an optimal control policy in a continuous state space. Since it is infeasible to learn the optimal action to take for every possible observation in a continuous state space, useful abstractions of the space must be constructed and subsequently learned on. Abstraction techniques that generalize the space into very few abstract states must take care to avoid creating an abstraction that prevents learning the optimal policy. Many commonly used abstractions, such as CMAC, can take considerable effort to tune to ensure a learnable abstraction is created. In this work we propose three methods that derive state abstractions automatically, in part by making use of the dimensionality reduction capability of the RL-SANE algorithm. We show that abstractions derived from these automatic methods can allow a learning algorithm to converge to the optimal policy faster than with a fixed abstraction. Additionally, these techniques are able to break the space into very few abstract states, further facilitating rapid learning.

abstract state, abstraction, algorithm

Workshops at the Twenty-Fourth AAAI Conference on Artificial Intelligence

Country: North America > United States (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Lin, Stephen (Air Force Research Laboratory, Information Directorate) | Wright, Robert (Air Force Research Laboratory, Information Directorate)

Evolutionary Tile Coding: An Automated State Abstraction Algorithm for Reinforcement Learning

AAAI Conferences Jul-8-2010

Reinforcement learning (RL) algorithms have the ability to learn optimal policies for control problems by exploring a domain's state space. Unfortunately, for most problems the size of the state space is too great for RL technologies to fully explore in order to find good policies. State abstraction is one way of reducing the size and complexity of a domain's state space in order to enable RL. In this paper we introduce a new approach for automatically deriving state abstractions called Evolutionary Tile Coding that uses a genetic algorithm for deriving effective tile codings. We provide an empirical analysis of the new algorithm comparing it to another adaptive tile coding method as well as fixed tile coding. Our results show that our approach is able to automatically derive effective state abstractions for two RL benchmark problems. Additionally, we present an intriguing result that shows the classical mountain car problem's state space can be reduced to just two states and still preserve the discovery of an optimal policy.
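For readers unfamiliar with the fixed tile coding baseline mentioned above, the following is a minimal sketch of a uniform-offset tile coder. It is not the Evolutionary Tile Coding algorithm from the paper; the function name, parameters, and offset scheme are assumptions chosen for illustration.

```python
import math

def tile_indices(state, lows, highs, tiles_per_dim, num_tilings):
    """Return one active tile index per tiling for a continuous state.

    Each tiling is a grid over the state space, shifted by a fraction
    t / num_tilings of a tile width, so nearby states share some tiles.
    """
    side = tiles_per_dim + 1  # one extra tile per dimension covers the shift
    tiles_per_tiling = side ** len(state)
    active = []
    for t in range(num_tilings):
        shift = t / num_tilings  # offset in units of one tile width
        flat = 0
        for i, (x, lo, hi) in enumerate(zip(state, lows, highs)):
            width = (hi - lo) / tiles_per_dim
            coord = math.floor((x - lo) / width + shift)
            coord = min(max(coord, 0), side - 1)  # clip out-of-range states
            flat += coord * side ** i
        active.append(t * tiles_per_tiling + flat)
    return active

if __name__ == "__main__":
    # Two tilings of a 2x2 grid over the unit square.
    print(tile_indices((0.3, 0.8), [0.0, 0.0], [1.0, 1.0], 2, 2))
```

A value function is then approximated as the sum of one learned weight per active tile. Choosing `tiles_per_dim` and `num_tilings` by hand is the tuning burden that automated abstraction methods aim to remove.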

artificial intelligence, machine learning, reinforcement learning

Workshops at the Twenty-Fourth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Texas > Travis County > Austin (0.04)

Genre: Research Report > New Finding (0.86)

Industry:

Transportation > Passenger (0.35)
Transportation > Ground > Road (0.35)
Automobiles & Trucks (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)