AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Reinforcement and Imitation Learning via Interactive No-Regret Learning

Ross, Stephane, Bagnell, J. Andrew

arXiv.org Machine Learning, Jun-23-2014

Recent work has demonstrated that problems (particularly imitation learning and structured prediction) where a learner's predictions influence the input distribution it is tested on can be naturally addressed by an interactive approach and analyzed using no-regret online learning. These approaches to imitation learning, however, neither require nor benefit from information about the cost of actions. We extend existing results in two directions: first, we develop an interactive imitation learning approach that leverages cost information; second, we extend the technique to address reinforcement learning. The results provide theoretical support to the commonly observed successes of online approximate policy iteration. Our approach suggests a broad new family of algorithms and provides a unifying view of existing techniques for imitation and reinforcement learning.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1406.5979

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Education (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)

The Route Not Taken: Driver-Centric Estimation of Electric Vehicle Range

Ondruska, Peter (University of Oxford) | Posner, Ingmar (University of Oxford)

AAAI Conferences, Jun-9-2014

This paper addresses the challenge of efficiently and accurately predicting an electric vehicle's attainable range. Specifically, our approach accounts for a driver's generalised route preferences to provide up-to-date, personalised information based on estimates of the energy required to reach every possible destination in a map. We frame this task in the context of sequential decision making and show that energy consumption in reaching a particular destination can be formulated as policy evaluation in a Markov Decision Process. In particular, we exploit the properties of the model adopted for predicting likely energy consumption to every possible destination in a realistically sized map in real-time. The policy to be evaluated is learned and, over time, refined using Inverse Reinforcement Learning to provide for a life-long adaptive system. Our approach is evaluated using a publicly available dataset providing real trajectory data of 50 individuals spanning approximately 10,000 miles of travel. We show that by accounting for driver specific route preferences our system significantly reduces the relative error in energy prediction compared to more common, driver-agnostic heuristics such as shortest-path or shortest-time routes.
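
The policy-evaluation framing in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the three-junction map, the transition probabilities, and the energy costs below are hypothetical, and the function name `evaluate_policy` is introduced only for illustration. It assumes a fixed driver policy already folded into the transition table and computes the expected energy needed to reach a goal junction by repeated Bellman backups.

```python
def evaluate_policy(transitions, goal, sweeps=50):
    """Expected energy to reach `goal` under a fixed (possibly stochastic) policy.

    transitions[s] is a list of (probability, next_state, energy_cost) tuples
    describing where the policy leads from state s. All values are hypothetical.
    """
    value = {s: 0.0 for s in transitions}
    for _ in range(sweeps):
        for s, outcomes in transitions.items():
            if s == goal:
                continue  # no further energy is needed once the goal is reached
            # Bellman backup for a fixed policy: expected cost plus expected cost-to-go
            value[s] = sum(p * (cost + value[nxt]) for p, nxt, cost in outcomes)
    return value


# Hypothetical map: from A the driver goes via B half the time, directly to C otherwise.
transitions = {
    "A": [(0.5, "B", 1.0), (0.5, "C", 3.0)],
    "B": [(1.0, "C", 1.0)],
    "C": [],
}
expected_energy = evaluate_policy(transitions, goal="C")
```

Here `expected_energy["A"]` combines both routes weighted by how likely this driver is to take them, which is the sense in which the estimate is driver-specific rather than based on a single shortest path.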

driver-centric estimation, electric vehicle range

Twenty-Fourth International Conference on Automated Planning and Scheduling

Industry:

Transportation > Ground > Road (1.00)
Transportation > Electric Vehicle (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Reinforcement Learning for Weakly-Coupled MDPs and an Application to Planetary Rover Control

Bernstein, Daniel S. (University of Massachusetts, Amherst) | Zilberstein, Shlomo (University of Massachusetts, Amherst)

AAAI Conferences, Jun-9-2014

Weakly-coupled Markov decision processes can be decomposed into subprocesses that interact only through a small set of bottleneck states. We study a hierarchical reinforcement learning algorithm designed to take advantage of this particular type of decomposability. To test our algorithm, we use a decision-making problem faced by autonomous planetary rovers. In this problem, a Mars rover must decide which activities to perform and when to traverse between science sites in order to make the best use of its limited resources. In our experiments, the hierarchical algorithm performs better than Q-learning in the early stages of learning, but unlike Q-learning it converges to a suboptimal policy. This suggests that it may be advantageous to use the hierarchical algorithm when training time is limited.
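
For readers unfamiliar with the Q-learning baseline mentioned in this abstract, the following is a minimal sketch of the standard tabular Q-learning update. It does not reproduce the paper's hierarchical algorithm or its rover model; the state and action names, the learning rate `alpha`, and the discount `gamma` are assumptions chosen for illustration.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.95):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    # Value of the greedy action in the next state
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical rover-style example: all Q-values start at zero.
q = defaultdict(float)
actions = ["sample_rock", "drive_to_next_site"]
new_value = q_learning_update(q, "site_1", "sample_rock", reward=1.0,
                              next_state="site_1", actions=actions)
```

With all estimates initialised to zero, the first update moves `Q("site_1", "sample_rock")` a fraction `alpha` of the way toward the observed reward.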

artificial intelligence, planetary rover control, reinforcement learning, (3 more...)

Sixth European Conference on Planning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Efficient Model Learning for Human-Robot Collaborative Tasks

Nikolaidis, Stefanos, Gu, Keren, Ramakrishnan, Ramya, Shah, Julie

arXiv.org Artificial Intelligence, May-24-2014

We present a framework for learning human user models from joint-action demonstrations that enables the robot to compute a robust policy for a collaborative task with a human. The learning takes place completely automatically, without any human intervention. First, we describe the clustering of demonstrated action sequences into different human types using an unsupervised learning algorithm. These demonstrated sequences are also used by the robot to learn a reward function that is representative of each type, by means of an inverse reinforcement learning algorithm. The learned model is then used as part of a Mixed Observability Markov Decision Process formulation, wherein the human type is a partially observable variable. With this framework, we can infer, either offline or online, the human type of a new user who was not included in the training set, and can compute a policy for the robot that is aligned with the preference of this new user and robust to deviations of the human actions from prior demonstrations. Finally, we validate the approach using data collected in human subject experiments, and conduct proof-of-concept demonstrations in which a person performs a collaborative task with a small industrial robot.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

doi: 10.1145/2696454.2696455

1405.6341

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre:

Workflow (0.48)
Research Report (0.40)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Structural Return Maximization for Reinforcement Learning

Joseph, Joshua, Velez, Javier, Roy, Nicholas

arXiv.org Machine Learning, May-11-2014

Reinforcement Learning (RL) (Sutton & Barto, 1998) is a framework for sequential decision making under uncertainty with the objective of finding a policy that maximizes the sum of rewards, or return, of an agent. A straightforward model-based approach to batch RL, where the algorithm learns a policy from a fixed set of data, is to fit a dynamics model by minimizing a form of prediction error (e.g., minimum squared error) and then compute the optimal policy with respect to the learned model (Bertsekas, 2000). As discussed in Baxter & Bartlett (2001) and Joseph et al. (2013), learning a model for RL by minimizing prediction error can result in a policy that performs arbitrarily poorly for unfavorably chosen model classes. To overcome this limitation, a second approach is to not use a model and directly learn the policy from a policy class that explicitly maximizes an estimate of return (Meuleau et al., 2000). With limited data, approaches that explicitly maximize estimated return are vulnerable to learning policies which perform poorly since the return cannot be confidently estimated.

machine learning, policy class, reinforcement learning, (14 more...)

1405.2606

Country: North America > United States (1.00)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Off-policy reinforcement learning for $H_\infty$ control design

Luo, Biao, Wu, Huai-Ning, Huang, Tingwen

arXiv.org Machine Learning, May-11-2014

The $H_\infty$ control design problem is considered for nonlinear systems with an unknown internal system model. It is known that the nonlinear $H_\infty$ control problem can be transformed into solving the so-called Hamilton-Jacobi-Isaacs (HJI) equation, a nonlinear partial differential equation that is generally impossible to solve analytically. Even worse, model-based approaches cannot be used to approximately solve the HJI equation when an accurate system model is unavailable or costly to obtain in practice. To overcome these difficulties, an off-policy reinforcement learning (RL) method is introduced to learn the solution of the HJI equation from real system data instead of a mathematical system model, and its convergence is proved. In the off-policy RL method, the system data can be generated with arbitrary policies rather than the policy being evaluated, which is extremely important and promising for practical systems. For implementation purposes, a neural network (NN) based actor-critic structure is employed, and a least-squares NN weight update algorithm is derived based on the method of weighted residuals. Finally, the developed NN-based off-policy RL method is tested on a linear F16 aircraft plant and further applied to a rotational/translational actuator system.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

doi: 10.1109/TCYB.2014.2319577

1311.6107

Country:

Asia > China (1.00)
North America > United States > Texas (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.83)

Cover Tree Bayesian Reinforcement Learning

Tziortziotis, Nikolaos, Dimitrakakis, Christos, Blekas, Konstantinos

arXiv.org Machine Learning, May-2-2014

This paper proposes an online tree-based Bayesian approach for reinforcement learning. For inference, we employ a generalised context tree model. This defines a distribution on multivariate Gaussian piecewise-linear models, which can be updated in closed form. The tree structure itself is constructed using the cover tree method, which remains efficient in high dimensional spaces. We combine the model with Thompson sampling and approximate dynamic programming to obtain effective exploration policies in unknown environments. The flexibility and computational simplicity of the model render it suitable for many reinforcement learning problems in continuous state spaces. We demonstrate this in an experimental comparison with a Gaussian process model, a linear model and simple least squares policy iteration.
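
Thompson sampling, which this abstract combines with a tree-based model, is easiest to see in a much simpler setting. The sketch below swaps in a two-armed Bernoulli bandit with Beta posteriors; it is not the cover tree model from the paper, and the success rates, round count, and seed are assumptions for illustration. The idea carries over: sample one plausible model from the posterior, act greedily with respect to that sample, then update the posterior with what was observed.

```python
import random


def thompson_bandit(true_rates, rounds=500, seed=0):
    """Run Beta-Bernoulli Thompson sampling and return how often each arm was pulled."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    successes = [1] * n_arms  # Beta(1, 1) uniform prior on each arm's success rate
    failures = [1] * n_arms
    pulls = [0] * n_arms
    for _ in range(rounds):
        # Draw one plausible success rate per arm from its current posterior
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(n_arms)]
        arm = samples.index(max(samples))  # act greedily on the sampled model
        reward = 1 if rng.random() < true_rates[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


# Hypothetical arms: the first pays off 90% of the time, the second 10%.
pulls = thompson_bandit([0.9, 0.1])
```

Exploration comes from posterior uncertainty alone: an arm with little data has a wide posterior and is occasionally sampled as the best, while a well-explored poor arm is rarely chosen.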

artificial intelligence, machine learning, reinforcement learning, (16 more...)

1305.1809

Country:

Europe (0.93)
North America > United States (0.68)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Convergence of a Q-learning Variant for Continuous States and Actions

Carden, S. W.

Journal of Artificial Intelligence Research, Apr-29-2014

This paper presents a reinforcement learning algorithm for solving infinite horizon Markov Decision Processes under the expected total discounted reward criterion when both the state and action spaces are continuous. This algorithm is based on Watkins' Q-learning, but uses Nadaraya-Watson kernel smoothing to generalize knowledge to unvisited states. As expected, continuity conditions must be imposed on the mean rewards and transition probabilities. Using results from kernel regression theory, this algorithm is proven capable of producing a Q-value function estimate that is uniformly within an arbitrary tolerance of the true Q-value function with probability one. The algorithm is then applied to an example problem to empirically show convergence as well.
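
The generalisation step described in this abstract relies on Nadaraya-Watson kernel smoothing. The sketch below shows that estimator on its own, in one dimension with a Gaussian kernel; it is not the paper's algorithm, and the bandwidth and sample values are assumptions for illustration. The estimate at an unvisited point is a weighted average of stored values, with weights that decay with distance.

```python
import math


def nadaraya_watson(query, points, values, bandwidth=0.5):
    """Kernel-weighted average of `values` observed at `points`, evaluated at `query`."""
    # Gaussian kernel: nearby observations receive larger weights
    weights = [math.exp(-0.5 * ((query - x) / bandwidth) ** 2) for x in points]
    total = sum(weights)
    if total == 0.0:
        # Query is too far from every observation for the kernel to register
        raise ValueError("no observation within kernel range of the query")
    return sum(w * v for w, v in zip(weights, values)) / total


# Hypothetical Q-value samples at two visited states; estimate halfway between them.
visited_states = [0.0, 1.0]
q_samples = [0.0, 2.0]
estimate = nadaraya_watson(0.5, visited_states, q_samples)
```

Because the query sits midway between the two visited states, both receive equal weight and the estimate is the plain average of the two samples. A smaller bandwidth makes the estimate follow the nearest sample more closely.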

artificial intelligence, machine learning, reinforcement learning, (17 more...)

doi: 10.1613/jair.4271

AI Access Foundation

10876

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York (0.04)
North America > United States > Ohio (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

A General Framework for Interacting Bayes-Optimally with Self-Interested Agents using Arbitrary Parametric Model and Model Prior

Hoang, Trong Nghia, Low, Kian Hsiang

arXiv.org Machine Learning, Mar-16-2014

Recent advances in Bayesian reinforcement learning (BRL) have shown that Bayes-optimality is theoretically achievable by modeling the environment's latent dynamics using a Flat-Dirichlet-Multinomial (FDM) prior. In self-interested multi-agent environments, the transition dynamics are mainly controlled by the other agent's stochastic behavior, for which FDM's independence and modeling assumptions do not hold. As a result, FDM does not allow the other agent's behavior to be generalized across different states or specified using prior domain knowledge. To overcome these practical limitations of FDM, we propose a generalization of BRL to integrate the general class of parametric models and model priors, thus allowing practitioners' domain knowledge to be exploited to produce a fine-grained and compact representation of the other agent's behavior. Empirical evaluation shows that our approach outperforms existing multi-agent reinforcement learning algorithms.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1304.2024

Genre:

Research Report (0.50)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

A Supervised Goal Directed Algorithm in Economical Choice Behaviour: An Actor-Critic Approach

Yahya, Keyvan

arXiv.org Artificial Intelligence, Feb-22-2014

This paper aims to find an algorithmic structure that can predict and explain economical choice behaviour, particularly under uncertainty (random policies), by adapting the prevalent Actor-Critic learning method to comply with the requirements that have emerged since the field of neuroeconomics began. While reviewing some basics of neuroeconomics that seem relevant to our discussion, we outline some of the important work done so far to simulate choice-making processes. Given neurological findings that suggest the existence of two specific functions executed through the Basal Ganglia all the way up to sub-cortical areas, namely 'rewards' and 'beliefs', we offer a modified version of the actor-critic algorithm to shed light on the relation between these functions and, most importantly, to address what is referred to as a challenge for actor-critic algorithms: the lack of inheritance or hierarchy, which prevents the system from evolving in continuous-time tasks, where convergence might not be reached.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

1401.3579

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.97)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Cognitive Science > Neuroscience (0.54)
