AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Automatic Abstraction in Reinforcement Learning Using Ant System Algorithm

Ghafoorian, Mohsen (Sharif University of Technology) | Taghizadeh, Nasrin (Sharif University of Technology) | Beigy, Hamid (Sharif University of Technology)

AAAI Conferences | Mar-21-2013

Developing autonomous systems that can act in various environments and interactively perform their assigned tasks is highly desirable. Such systems could be applied in fields such as medicine, robot controllers and social life. Reinforcement learning is an attractive area of machine learning that addresses these concerns. At large scale, the learning performance of an agent can be improved by using hierarchical reinforcement learning techniques and temporally extended actions. A higher level of abstraction helps the learning agent approach lifelong learning goals. In this paper, a new method is presented for discovering subgoal states and constructing useful skills. The method uses the Ant System optimization algorithm to identify bottleneck edges, which act like bridges between different connected areas of the problem space. Using the discovered subgoals, the agent creates temporal abstractions, which enable it to explore more effectively. Experimental results show that the proposed method can significantly improve the learning performance of the agent.

ant system algorithm, automatic abstraction, reinforcement learning

AAAI Conferences

2013 AAAI Spring Symposium Series

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.80)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.53)

Autonomous Selection of Inter-Task Mappings in Transfer Learning (extended abstract)

Fachantidis, Anestis (Aristotle University of Thessaloniki) | Partalas, Ioannis (Université Joseph Fourier) | Taylor, Matthew E. (Washington State University) | Vlahavas, Ioannis (Aristotle University of Thessaloniki)

AAAI Conferences | Mar-21-2013

When transferring knowledge between reinforcement learning agents with different state representations or actions, past knowledge must be efficiently mapped so that it assists learning. The majority of existing approaches use pre-defined mappings given by a domain expert. To overcome this limitation and allow autonomous transfer learning, this paper introduces a method for weighting and using multiple inter-task mappings, named COMBREL. Experimental results show that the use of multiple inter-task mappings, accompanied by a selection mechanism, can significantly boost the performance of transfer learning, relative to learning without transfer and relative to using a single hand-picked mapping.

artificial intelligence, machine learning, reinforcement learning, (3 more...)

AAAI Conferences

2013 AAAI Spring Symposium Series

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)

Integrating Visual Learning and Hierarchical Planning for Autonomy in Human-Robot Collaboration

Sridharan, Mohan (Texas Tech University)

AAAI Conferences | Mar-21-2013

Mobile robots deployed in real-world domains frequently find it difficult to process all sensor inputs, or to operate without human input and domain knowledge. At the same time, complex domains make it difficult to provide robots all relevant domain knowledge in advance, and humans are unlikely to have the time and expertise to provide elaborate and accurate feedback. This paper presents an integrated framework that creates novel opportunities for addressing these learning, adaptation and collaboration challenges associated with human-robot collaboration. The framework consists of hierarchical planning, bootstrap learning and online reinforcement learning algorithms that inform and guide each other. As a result, robots are able to make best use of sensor inputs, soliciting high-level feedback from non-expert humans when such feedback is necessary and available. All algorithms are evaluated in simulation and on wheeled robots in dynamic indoor domains.

artificial intelligence, machine learning, reinforcement learning, (4 more...)

AAAI Conferences

2013 AAAI Spring Symposium Series

Technology:

Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.60)
Information Technology > Artificial Intelligence > Robots > Locomotion (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)

Monte-Carlo utility estimates for Bayesian reinforcement learning

Dimitrakakis, Christos

arXiv.org Machine Learning | Mar-11-2013

Bayesian reinforcement learning [1], [2] is the decision-theoretic approach [3] to solving the reinforcement learning problem. Unfortunately, calculating posterior distributions can be computationally expensive. Moreover, the Bayes-optimal decision can be intractable [4], [5], [1], and even calculating an optimal solution in a restricted class can be difficult [6]. This paper proposes a set of algorithms that take actions by estimating bounds on the Bayes-optimal utility through sampling. They include a direct Monte-Carlo approach, as well as gradient-based approaches. We demonstrate the effectiveness of the proposed algorithms experimentally. A. Setting: In the reinforcement learning problem, an agent is acting in some unknown Markovian environment µ ∈ M, according to some policy π ∈ Π. The agent's policy is a procedure for selecting actions, with the action at time t being a
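
The idea of bounding the Bayes-optimal utility by sampling can be illustrated with a deliberately tiny sketch. It is not the paper's algorithm: it assumes a one-step problem (a bandit) whose arm means have independent Beta posteriors, and the function name `sample_bounds` and the posterior parameters are hypothetical. The lower bound is the value of the best single arm averaged over sampled environments; the upper bound averages the best value in each sampled environment separately.

```python
import random


def sample_bounds(posterior, n_samples, rng):
    """Monte-Carlo bounds on the Bayes-optimal utility of a one-step problem.

    posterior: list of (alpha, beta) Beta parameters, one per arm (assumed form).
    Returns (lower, upper, best_arm).
    """
    # Each sample is one candidate environment: a vector of arm means.
    samples = [[rng.betavariate(a, b) for a, b in posterior]
               for _ in range(n_samples)]
    n_arms = len(posterior)
    # Lower bound: commit to one arm and average its value over all samples.
    arm_means = [sum(s[k] for s in samples) / n_samples for k in range(n_arms)]
    lower = max(arm_means)
    # Upper bound: act optimally in every sampled environment, then average.
    upper = sum(max(s) for s in samples) / n_samples
    return lower, upper, arm_means.index(lower)


rng = random.Random(0)  # seeded only to make the run repeatable
lower, upper, best_arm = sample_bounds([(2, 1), (1, 1), (1, 3)], 500, rng)
print(lower, upper, best_arm)
```

In a full MDP the per-sample values would come from policy evaluation and planning in each sampled model rather than from a direct lookup of arm means.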

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

doi: 10.1109/CDC.2013.6761048

1303.2506

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.64)

Industry: Education > Focused Education > Special Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Deliberation Scheduling for Time-Critical Sequential Decision Making

Dean, Thomas L., Kaelbling, Leslie Pack, Kirman, Jak, Nicholson, Ann

arXiv.org Artificial Intelligence | Mar-6-2013

We describe a method for time-critical decision making involving sequential tasks and stochastic processes. The method employs several iterative refinement routines for solving different aspects of the decision making problem. This paper concentrates on the meta-level control problem of deliberation scheduling, allocating computational resources to these routines. We provide different models corresponding to optimization problems that capture the different circumstances and computational strategies for decision making under time constraints. We consider precursor models in which all decision making is performed prior to execution and recurrent models in which decision making is performed in parallel with execution, accounting for the states observed during execution and anticipating future states. We describe algorithms for precursor and recurrent models and provide the results of our empirical investigations to date.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1303.1491

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Industry: Government (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.41)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.35)

Multi-class Generalized Binary Search for Active Inverse Reinforcement Learning

Melo, Francisco, Lopes, Manuel

arXiv.org Artificial Intelligence | Jan-23-2013

This paper addresses the problem of learning a task from demonstration. We adopt the framework of inverse reinforcement learning, where tasks are represented in the form of a reward function. Our contribution is a novel active learning algorithm that enables the learning agent to query the expert for more informative demonstrations, thus leading to more sample-efficient learning. For this novel algorithm (Generalized Binary Search for Inverse Reinforcement Learning, or GBS-IRL), we provide a theoretical bound on sample complexity and illustrate its applicability on several different tasks. To our knowledge, GBS-IRL is the first active IRL algorithm with provable sample complexity bounds. We also discuss our method in light of other existing methods in the literature and its general applicability in multi-class classification problems. Finally, motivated by recent work on learning from demonstration in robots, we also discuss how different forms of human feedback can be integrated in a transparent manner in our learning framework.
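
As a rough illustration of the query-selection idea, the sketch below implements the classic binary-label form of generalized binary search over a finite hypothesis set. It is not GBS-IRL itself, which handles multi-class responses (actions) and reward-function hypotheses. The names `select_query`, `update_weights`, and `identify`, and the threshold hypotheses in the usage lines, are assumptions made for this example, and responses are assumed noise-free.

```python
def select_query(hypotheses, weights, queries):
    """Pick the query whose +1/-1 predictions are most evenly split by weight."""
    def imbalance(x):
        return abs(sum(w * h[x] for h, w in zip(hypotheses, weights)))
    return min(queries, key=imbalance)


def update_weights(hypotheses, weights, x, label):
    """Discard hypotheses that disagree with the observed label, then renormalize."""
    kept = [w if h[x] == label else 0.0 for h, w in zip(hypotheses, weights)]
    total = sum(kept)
    return [w / total for w in kept]


def identify(hypotheses, queries, oracle, max_queries=10):
    """Query the oracle until a single hypothesis holds all the weight."""
    weights = [1.0 / len(hypotheses)] * len(hypotheses)
    asked = []
    for _ in range(max_queries):
        if max(weights) > 1.0 - 1e-12:
            break
        x = select_query(hypotheses, weights, queries)
        asked.append(x)
        weights = update_weights(hypotheses, weights, x, oracle(x))
    return weights.index(max(weights)), asked


# Hypothetical example: threshold classifiers on the points 0..3.
points = list(range(4))
thresholds = [{x: (1 if x >= k else -1) for x in points} for k in range(5)]
best, asked = identify(thresholds, points, lambda x: thresholds[2][x])
print(best, asked)
```

Each query removes a large share of the remaining weight, which is the intuition behind the sample-efficiency argument; the paper's bounds apply to its own algorithm, not to this sketch.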

gb-irl, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

1301.5488

Genre: Research Report (0.82)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Efficient Sample Reuse in Policy Gradients with Parameter-based Exploration

Zhao, Tingting, Hachiya, Hirotaka, Tangkaratt, Voot, Morimoto, Jun, Sugiyama, Masashi

arXiv.org Machine Learning | Jan-16-2013

The policy gradient approach is a flexible and powerful reinforcement learning method particularly for problems with continuous actions such as robot control. A common challenge in this scenario is how to reduce the variance of policy gradient estimates for reliable policy updates. In this paper, we combine the following three ideas and give a highly effective policy gradient method: (a) the policy gradients with parameter based exploration, which is a recently proposed policy search method with low variance of gradient estimates, (b) an importance sampling technique, which allows us to reuse previously gathered data in a consistent way, and (c) an optimal baseline, which minimizes the variance of gradient estimates with their unbiasedness being maintained. For the proposed method, we give theoretical analysis of the variance of gradient estimates and show its usefulness through extensive experiments.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1301.3966

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Planning by Prioritized Sweeping with Small Backups

van Seijen, Harm, Sutton, Richard S.

arXiv.org Artificial Intelligence | Jan-10-2013

Efficient planning plays a crucial role in model-based reinforcement learning. Traditionally, the main planning operation is a full backup based on the current estimates of the successor states. Consequently, its computation time is proportional to the number of successor states. In this paper, we introduce a new planning backup that uses only the current value of a single successor state and has a computation time independent of the number of successor states. This new backup, which we call a small backup, opens the door to a new class of model-based reinforcement learning methods that exhibit much finer control over their planning process than traditional methods. We empirically demonstrate that this increased flexibility allows for more efficient planning by showing that an implementation of prioritized sweeping based on small backups achieves a substantial performance improvement over classical implementations.
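
The contrast between a full backup and a small backup can be sketched for the simplest case, state values under a fixed policy with a known model. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name `SmallBackupPlanner`, the dictionary-based model, and the state-reward convention are all choices made for this example. Each state remembers the successor value it last used, so one small backup touches a single successor yet keeps the value consistent with a full backup over the remembered values.

```python
def full_backup(s, V, P, R, gamma):
    """Classical backup: cost grows with the number of successors of s."""
    return R[s] + gamma * sum(p * V[s2] for s2, p in P[s].items())


class SmallBackupPlanner:
    def __init__(self, P, R, gamma):
        self.P, self.R, self.gamma = P, R, gamma
        # U[s][s2] is the value of s2 that s last used; start from zero.
        self.U = {s: {s2: 0.0 for s2 in P[s]} for s in P}
        # V[s] = R[s] is consistent with all remembered values being zero.
        self.V = {s: float(R[s]) for s in P}

    def small_backup(self, s, s2):
        """Update V[s] from one successor only; cost is constant."""
        v2 = self.V[s2]  # read first, so a self-loop (s2 == s) stays correct
        self.V[s] += self.gamma * self.P[s][s2] * (v2 - self.U[s][s2])
        self.U[s][s2] = v2


# Hypothetical 3-state model: P[s] maps successor -> probability.
P = {0: {1: 0.5, 2: 0.5}, 1: {2: 1.0}, 2: {}}
R = {0: 0.0, 1: 1.0, 2: 10.0}
planner = SmallBackupPlanner(P, R, gamma=0.9)
planner.small_backup(0, 1)
planner.small_backup(0, 2)
print(planner.V[0], full_backup(0, planner.V, P, R, 0.9))
```

After a small backup has been applied for every successor of a state, with the successor values unchanged in between, the state's value matches what a single full backup would give, which is why the cheaper operation can stand in for the expensive one inside prioritized sweeping.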

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1301.2343

Country: North America > Canada > Alberta (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

The Optimal Reward Baseline for Gradient-Based Reinforcement Learning

Weaver, Lex, Tao, Nigel

arXiv.org Artificial Intelligence | Jan-10-2013

There exist a number of reinforcement learning algorithms which learn by climbing the gradient of expected reward. Their long-run convergence has been proved, even in partially observable environments with non-deterministic actions, and without the need for a system model. However, the variance of the gradient estimator has been found to be a significant practical problem. Recent approaches have discounted future rewards, introducing a bias-variance trade-off into the gradient estimate. We incorporate a reward baseline into the learning system, and show that it affects variance without introducing further bias. In particular, as we approach the zero-bias, high-variance parameterization, the optimal (or variance minimizing) constant reward baseline is equal to the long-term average expected reward. Modified policy-gradient algorithms are presented, and a number of experiments demonstrate their improvement over previous work.
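
The claim that a constant baseline changes the variance of the gradient estimate but not its mean can be checked exactly in a toy case. The sketch below assumes a two-armed bandit with deterministic rewards and a one-parameter sigmoid policy, which is far simpler than the partially observable, discounted setting of the paper; the function name `gradient_stats` and the numbers are hypothetical. It enumerates both actions, so the mean and variance are exact rather than sampled.

```python
import math


def gradient_stats(theta, rewards, baseline):
    """Exact mean and variance of (r - b) * d/dtheta log pi(a) over both arms."""
    p1 = 1.0 / (1.0 + math.exp(-theta))  # probability of choosing arm 1
    probs = (1.0 - p1, p1)
    scores = (-p1, 1.0 - p1)             # d/dtheta log pi(a) for a = 0, 1
    grads = [(r - baseline) * s for r, s in zip(rewards, scores)]
    mean = sum(p * g for p, g in zip(probs, grads))
    second = sum(p * g * g for p, g in zip(probs, grads))
    return mean, second - mean * mean


theta, rewards = 0.3, (1.0, 3.0)
p1 = 1.0 / (1.0 + math.exp(-theta))
average_reward = (1.0 - p1) * rewards[0] + p1 * rewards[1]

print(gradient_stats(theta, rewards, baseline=0.0))
print(gradient_stats(theta, rewards, baseline=average_reward))
```

Both calls return the same mean, while the variance is much smaller with the average reward as the baseline. In this toy case the average reward is not guaranteed to be the exact variance minimizer; the paper's optimality result concerns its own setting.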

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

1301.2315

Country: North America > United States > Massachusetts (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.96)

Policy Evaluation with Variance Related Risk Criteria in Markov Decision Processes

Tamar, Aviv, Di Castro, Dotan, Mannor, Shie

arXiv.org Machine Learning | Jan-1-2013

In this paper we extend temporal difference policy evaluation algorithms to performance criteria that include the variance of the cumulative reward. Such criteria are useful for risk management, and are important in domains such as finance and process control. We propose both TD(0) and LSTD(lambda) variants with linear function approximation, prove their convergence, and demonstrate their utility in a 4-dimensional continuous state space problem.
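
The quantities such algorithms estimate can be made concrete with a small model-based sketch. It assumes an undiscounted episodic chain with a known transition model and per-state rewards, and it solves the first- and second-moment recursions of the cumulative reward by plain repeated sweeps. It is therefore neither sampled TD(0) nor LSTD(lambda), and it uses no function approximation. The function name `reward_moments` and the one-state example are hypothetical.

```python
def reward_moments(P, r, sweeps=200):
    """First and second moments of the cumulative reward from each state.

    P[x] maps non-terminal successors to probabilities; any missing
    probability mass is treated as a jump to a zero-reward terminal state.
    """
    J = {x: 0.0 for x in P}  # E[B | start in x]
    M = {x: 0.0 for x in P}  # E[B^2 | start in x]
    for _ in range(sweeps):
        next_J = {x: sum(p * J[y] for y, p in P[x].items()) for x in P}
        next_M = {x: sum(p * M[y] for y, p in P[x].items()) for x in P}
        # J(x) = r(x) + E[J(x')]
        # M(x) = r(x)^2 + 2 r(x) E[J(x')] + E[M(x')]
        J = {x: r[x] + next_J[x] for x in P}
        M = {x: r[x] ** 2 + 2.0 * r[x] * next_J[x] + next_M[x] for x in P}
    return J, M


# One state that pays 1 per step and terminates with probability 0.5.
J, M = reward_moments({"s": {"s": 0.5}}, {"s": 1.0})
variance = M["s"] - J["s"] ** 2
print(J["s"], M["s"], variance)
```

The cumulative reward here is the number of steps until termination, a geometric random variable with mean 2 and variance 2, which the recursions reproduce as J = 2, M = 6, and M - J^2 = 2.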

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1301.0104

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.51)
