 Reinforcement Learning


Automatic Abstraction in Reinforcement Learning Using Ant System Algorithm

AAAI Conferences

Developing autonomous systems that can act in various environments and interactively perform their assigned tasks is highly desirable. Such systems could be applied in fields such as medicine, robot control, and social life. Reinforcement learning is an attractive area of machine learning that addresses these concerns. In large-scale problems, the learning performance of an agent can be improved by using hierarchical reinforcement learning techniques and temporally extended actions. A higher level of abstraction helps the learning agent approach lifelong learning goals. In this paper, a new method is presented for discovering subgoal states and constructing useful skills. The method uses the Ant System optimization algorithm to identify bottleneck edges, which act like bridges between different connected areas of the problem space. Using the discovered subgoals, the agent creates temporal abstractions, which enable it to explore more effectively. Experimental results show that the proposed method can significantly improve the learning performance of the agent.


Autonomous Selection of Inter-Task Mappings in Transfer Learning (extended abstract)

AAAI Conferences

When transferring knowledge between reinforcement learning agents with different state representations or actions, past knowledge must be efficiently mapped so that it assists learning. The majority of existing approaches use pre-defined mappings given by a domain expert. To overcome this limitation and allow autonomous transfer learning, this paper introduces a method for weighting and using multiple inter-task mappings, named COMBREL. Experimental results show that the use of multiple inter-task mappings, accompanied by a selection mechanism, can significantly boost the performance of transfer learning, relative to learning without transfer and relative to using a single hand-picked mapping.


Integrating Visual Learning and Hierarchical Planning for Autonomy in Human-Robot Collaboration

AAAI Conferences

Mobile robots deployed in real-world domains frequently find it difficult to process all sensor inputs, or to operate without human input and domain knowledge. At the same time, complex domains make it difficult to provide robots with all relevant domain knowledge in advance, and humans are unlikely to have the time and expertise to provide elaborate and accurate feedback. This paper presents an integrated framework that creates novel opportunities for addressing the learning, adaptation and collaboration challenges associated with human-robot collaboration. The framework consists of hierarchical planning, bootstrap learning and online reinforcement learning algorithms that inform and guide each other. As a result, robots are able to make the best use of sensor inputs, soliciting high-level feedback from non-expert humans when such feedback is necessary and available. All algorithms are evaluated in simulation and on wheeled robots in dynamic indoor domains.


Monte-Carlo utility estimates for Bayesian reinforcement learning

arXiv.org Machine Learning

Bayesian reinforcement learning [1], [2] is the decision-theoretic approach [3] to solving the reinforcement learning problem. Unfortunately, calculating posterior distributions can be computationally expensive. Moreover, the Bayes-optimal decision can be intractable [4], [5], [1], and even calculating an optimal solution in a restricted class can be difficult [6]. This paper proposes a set of algorithms that take actions by estimating bounds on the Bayes-optimal utility through sampling. They include a direct Monte-Carlo approach, as well as gradient-based approaches. We demonstrate the effectiveness of the proposed algorithms experimentally. A. Setting In the reinforcement learning problem, an agent is acting in some unknown Markovian environment µ ∈ M, according to some policy π ∈ Π. The agent's policy is a procedure for selecting actions, with the action at time t being a


Deliberation Scheduling for Time-Critical Sequential Decision Making

arXiv.org Artificial Intelligence

We describe a method for time-critical decision making involving sequential tasks and stochastic processes. The method employs several iterative refinement routines for solving different aspects of the decision making problem. This paper concentrates on the meta-level control problem of deliberation scheduling, allocating computational resources to these routines. We provide different models corresponding to optimization problems that capture the different circumstances and computational strategies for decision making under time constraints. We consider precursor models in which all decision making is performed prior to execution and recurrent models in which decision making is performed in parallel with execution, accounting for the states observed during execution and anticipating future states. We describe algorithms for precursor and recurrent models and provide the results of our empirical investigations to date.


Multi-class Generalized Binary Search for Active Inverse Reinforcement Learning

arXiv.org Artificial Intelligence

This paper addresses the problem of learning a task from demonstration. We adopt the framework of inverse reinforcement learning, where tasks are represented in the form of a reward function. Our contribution is a novel active learning algorithm that enables the learning agent to query the expert for more informative demonstrations, thus leading to more sample-efficient learning. For this novel algorithm (Generalized Binary Search for Inverse Reinforcement Learning, or GBS-IRL), we provide a theoretical bound on sample complexity and illustrate its applicability on several different tasks. To our knowledge, GBS-IRL is the first active IRL algorithm with provable sample complexity bounds. We also discuss our method in light of other existing methods in the literature and its general applicability in multi-class classification problems. Finally, motivated by recent work on learning from demonstration in robots, we also discuss how different forms of human feedback can be integrated in a transparent manner in our learning framework.


Efficient Sample Reuse in Policy Gradients with Parameter-based Exploration

arXiv.org Machine Learning

The policy gradient approach is a flexible and powerful reinforcement learning method, particularly for problems with continuous actions such as robot control. A common challenge in this scenario is how to reduce the variance of policy gradient estimates for reliable policy updates. In this paper, we combine the following three ideas to obtain a highly effective policy gradient method: (a) policy gradients with parameter-based exploration, a recently proposed policy search method with low variance of gradient estimates, (b) an importance sampling technique, which allows us to reuse previously gathered data in a consistent way, and (c) an optimal baseline, which minimizes the variance of gradient estimates while maintaining their unbiasedness. For the proposed method, we give a theoretical analysis of the variance of gradient estimates and show its usefulness through extensive experiments.


Planning by Prioritized Sweeping with Small Backups

arXiv.org Artificial Intelligence

Efficient planning plays a crucial role in model-based reinforcement learning. Traditionally, the main planning operation is a full backup based on the current estimates of the successor states. Consequently, its computation time is proportional to the number of successor states. In this paper, we introduce a new planning backup that uses only the current value of a single successor state and has a computation time independent of the number of successor states. This new backup, which we call a small backup, opens the door to a new class of model-based reinforcement learning methods that exhibit much finer control over their planning process than traditional methods. We empirically demonstrate that this increased flexibility allows for more efficient planning by showing that an implementation of prioritized sweeping based on small backups achieves a substantial performance improvement over classical implementations.
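To make the contrast between the two backup types concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the authors' implementation: it assumes a tabular model with known transition probabilities, a single state-action pair, and a cached copy of the successor values last used in the estimate. The names `full_backup`, `SmallBackupQ`, `used` and `GAMMA` are hypothetical.

```python
GAMMA = 0.9  # assumed discount factor for this toy example


def full_backup(reward, probs, values, gamma=GAMMA):
    """Recompute Q from every successor state; cost grows with len(probs)."""
    return reward + gamma * sum(p * values[s] for s, p in probs.items())


class SmallBackupQ:
    """Keeps one Q estimate plus the successor values it was built from."""

    def __init__(self, reward, probs, values, gamma=GAMMA):
        self.probs = dict(probs)
        self.gamma = gamma
        # Remember which value of each successor is already folded into q.
        self.used = {s: values[s] for s in probs}
        self.q = full_backup(reward, probs, values, gamma)

    def small_backup(self, successor, new_value):
        """Fold in the new value of ONE successor; cost is constant."""
        delta = new_value - self.used[successor]
        self.q += self.gamma * self.probs[successor] * delta
        self.used[successor] = new_value
        return self.q


if __name__ == "__main__":
    probs = {"s1": 0.5, "s2": 0.3, "s3": 0.2}
    values = {"s1": 1.0, "s2": 2.0, "s3": 0.0}
    q = SmallBackupQ(1.0, probs, values)
    values["s2"] = 5.0  # only one successor value changed
    print(q.small_backup("s2", values["s2"]))
    print(full_backup(1.0, probs, values))
```

In this sketch the incremental update touches a single successor yet ends at the same estimate a full recomputation would give, at the price of storing the previously used successor values.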


The Optimal Reward Baseline for Gradient-Based Reinforcement Learning

arXiv.org Artificial Intelligence

There exist a number of reinforcement learning algorithms which learn by climbing the gradient of expected reward. Their long-run convergence has been proved, even in partially observable environments with non-deterministic actions, and without the need for a system model. However, the variance of the gradient estimator has been found to be a significant practical problem. Recent approaches have discounted future rewards, introducing a bias-variance trade-off into the gradient estimate. We incorporate a reward baseline into the learning system, and show that it affects variance without introducing further bias. In particular, as we approach the zero-bias, high-variance parameterization, the optimal (or variance minimizing) constant reward baseline is equal to the long-term average expected reward. Modified policy-gradient algorithms are presented, and a number of experiments demonstrate their improvement over previous work.
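The claim that a constant baseline changes variance but not bias can be illustrated in a simplified setting. The following is a sketch, not the paper's derivation: it assumes a single-step problem with a discrete action set, a differentiable policy $\pi_\theta(a)$, a reward $R(a)$ and a constant baseline $b$ (all notation here is hypothetical).

```latex
\begin{align*}
\mathbb{E}_{a \sim \pi_\theta}\!\left[\nabla_\theta \log \pi_\theta(a)\,\bigl(R(a) - b\bigr)\right]
  &= \mathbb{E}_{a \sim \pi_\theta}\!\left[\nabla_\theta \log \pi_\theta(a)\,R(a)\right]
     - b\,\mathbb{E}_{a \sim \pi_\theta}\!\left[\nabla_\theta \log \pi_\theta(a)\right], \\
\mathbb{E}_{a \sim \pi_\theta}\!\left[\nabla_\theta \log \pi_\theta(a)\right]
  &= \sum_a \pi_\theta(a)\,\nabla_\theta \log \pi_\theta(a)
   = \nabla_\theta \sum_a \pi_\theta(a)
   = \nabla_\theta 1 = 0 .
\end{align*}
```

Under these assumptions the baseline term vanishes in expectation, so $b$ can be chosen purely to reduce the variance of the estimate.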


Policy Evaluation with Variance Related Risk Criteria in Markov Decision Processes

arXiv.org Machine Learning

In this paper we extend temporal difference policy evaluation algorithms to performance criteria that include the variance of the cumulative reward. Such criteria are useful for risk management, and are important in domains such as finance and process control. We propose both TD(0) and LSTD(lambda) variants with linear function approximation, prove their convergence, and demonstrate their utility in a 4-dimensional continuous state space problem.
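As a hedged illustration of why the variance of the cumulative reward is compatible with temporal-difference evaluation (a sketch under simplifying assumptions, not necessarily the paper's formulation): assume an undiscounted episodic task with a deterministic state reward $r(x)$ and return $B$, and write $J(x) = \mathbb{E}[B \mid x_0 = x]$ and $M(x) = \mathbb{E}[B^2 \mid x_0 = x]$ (hypothetical notation). Since $B = r(x) + B'$, where $B'$ is the return from the successor state $x'$, both moments satisfy Bellman-like recursions:

```latex
\begin{align*}
J(x) &= r(x) + \mathbb{E}\!\left[J(x') \mid x\right], \\
M(x) &= r(x)^2 + 2\,r(x)\,\mathbb{E}\!\left[J(x') \mid x\right] + \mathbb{E}\!\left[M(x') \mid x\right], \\
\operatorname{Var}\!\left[B \mid x_0 = x\right] &= M(x) - J(x)^2 .
\end{align*}
```

Recursions of this form are the kind of fixed-point equations that a TD-style method can target, with the variance recovered from the two estimated moments.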