AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Exploring compact reinforcement-learning representations with linear regression

Walsh, Thomas J., Szita, Istvan, Diuk, Carlos, Littman, Michael L.

arXiv.org Artificial IntelligenceMay-9-2012

This paper presents a new algorithm for online linear regression whose efficiency guarantees satisfy the requirements of the KWIK (Knows What It Knows) framework. The algorithm improves on the complexity bounds of the current state-of-the-art procedure in this setting. We explore several applications of this algorithm for learning compact reinforcement-learning representations. We show that KWIK linear regression can be used to learn the reward function of a factored MDP and the probabilities of action outcomes in Stochastic STRIPS and Object Oriented MDPs, none of which have been proven to be efficiently learnable in the RL setting before. We also combine KWIK linear regression with other KWIK learners to learn larger portions of these models, including experiments on learning factored MDP transition and reward functions together.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

1205.2606

Country:

North America > United States (0.46)
North America > Canada > Alberta (0.28)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Temporal-Difference Networks for Dynamical Systems with Continuous Observations and Actions

Vigorito, Christopher M.

arXiv.org Machine LearningMay-9-2012

Temporal-difference (TD) networks are a class of predictive state representations that use well-established TD methods to learn models of partially observable dynamical systems. Previous research with TD networks has dealt only with dynamical systems with finite sets of observations and actions. We present an algorithm for learning TD network representations of dynamical systems with continuous observations and actions. Our results show that the algorithm is capable of learning accurate and robust models of several noisy continuous dynamical systems. The algorithm presented here is the first fully incremental method for learning a predictive representation of a continuous dynamical system.

machine learning, prediction, reinforcement learning, (21 more...)

arXiv.org Machine Learning

1205.2608

Country: North America > United States > Massachusetts (0.46)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Scientific Computing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

TEXPLORE: Real-Time Sample-Efficient Reinforcement Learning for Robots

Hester, Todd (The University of Texas at Austin) | Stone, Peter (The University of Texas at Austin)

AAAI ConferencesMar-25-2012

Reinforcement Learning (RL) is a paradigm for learning decision-making tasks that could enable robots to learn and adapt to situations on-line. For an RL algorithm to be practical for robotic control tasks, it must learn in very few samples, while continually taking actions in real-time. In addition, the algorithm must learn efficiently in the face of noise, sensor/actuator delays and continuous state features. In this paper, we describe TEXPLORE, a model-based RL method that addresses these issues. It learns a random forest model of the domain which generalizes dynamics to unseen states. The agent targets exploration on states that are both promising for the final policy and uncertain in the model. With sample-based planning and a novel parallel architecture, TEXPLORE can select actions continually in real-time whenever necessary. We empirically evaluate TEXPLORE learning to control the velocity of an autonomous vehicle in real-time.

agent, algorithm, texplore, (14 more...)

AAAI Conferences

2012 AAAI Spring Symposium Series

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Scaling-up Knowledge for a Cognizant Robot

Degris, Thomas (INRIA) | Modayil, Joseph (University of Alberta)

AAAI ConferencesMar-25-2012

This paper takes a new approach to the old adage that knowledge is the key for artificial intelligence. A cognizant robot is a robot with a deep and immediately accessible understanding of its interaction with the environment — an understanding the robot can use to flexibly adapt to novel situations. Such a robot will need a vast amount of situated, revisable, and expressive knowledge to display flexible intelligent behaviors. Instead of relying on human-provided knowledge, we propose that an arbitrary robot can autonomously acquire pertinent knowledge directly from everyday interaction with the environment. We show how existing ideas in reinforcement learning can enable a robot to maintain and improve its knowledge. The robot performs a continual learning process that scales-up knowledge acquisition to cover a large number of facts, skills and predictions. This knowledge has semantics that are grounded in sensorimotor experience. We see the approach of developing more cognizant robots as a necessary key step towards broadly competent robots.

knowledge, prediction, robot, (14 more...)

AAAI Conferences

2012 AAAI Spring Symposium Series

Country:

North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Europe > France (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

BECCA: Reintegrating AI for Natural World Interaction

Rohrer, Brandon (Sandia National Laboratories)

AAAI ConferencesMar-25-2012

Natural world interaction (NWI), the pursuit of arbitrary goals in unstructured physical environments, is an excellent motivating problem for the reintegration of artificial intelligence. It is the problem set that humans struggle to solve. At a minimum it entails perception, learning, planning, and control, and can also involve language and social behavior. An agent's fitness in NWI is achieved by being able to perform a wide variety of tasks, rather than being able to excel at one. In an attempt to address NWI, a brain-emulating cognition and control architecture (BECCA) was developed. It uses a combination of feature creation and model-based reinforcement learning to capture structure in the environment in order to maximize reward. BECCA avoids making common assumptions about its world, such as stationarity, determinism, and the Markov assumption. BECCA has been demonstrated performing a set of tasks which is non-trivially broad, including a vision-based robotics task. Current development activity is focused on applying BECCA to the problem of general Search and Retrieve, a representative natural world interaction task.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

AAAI Conferences

2012 AAAI Spring Symposium Series

Country:

North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California (0.04)

Industry:

Government > Regional Government (0.46)
Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

Add feedback

Parametric Return Density Estimation for Reinforcement Learning

Morimura, Tetsuro, Sugiyama, Masashi, Kashima, Hisashi, Hachiya, Hirotaka, Tanaka, Toshiyuki

arXiv.org Machine LearningMar-15-2012

Most conventional Reinforcement Learning (RL) algorithms aim to optimize decision-making rules in terms of the expected returns. However, especially for risk management purposes, other risk-sensitive criteria such as the value-at-risk or the expected shortfall are sometimes preferred in real applications. Here, we describe a parametric method for estimating density of the returns, which allows us to handle various criteria in a unified manner. We first extend the Bellman equation for the conditional expected return to cover a conditional probability density of the returns. Then we derive an extension of the TD-learning algorithm for estimating the return densities in an unknown environment. As test instances, several parametric density estimation algorithms are presented for the Gaussian, Laplace, and skewed Laplace distributions. We show that these algorithms lead to risk-sensitive as well as robust RL paradigms through numerical experiments.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1203.3497

Country: Asia > Japan > Honshū (0.14)

Genre: Research Report (0.50)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

Real-Time Scheduling via Reinforcement Learning

Glaubius, Robert, Tidwell, Terry, Gill, Christopher, Smart, William D.

arXiv.org Machine LearningMar-15-2012

Cyber-physical systems, such as mobile robots, must respond adaptively to dynamic operating conditions. Effective operation of these systems requires that sensing and actuation tasks are performed in a timely manner. Additionally, execution of mission specific tasks such as imaging a room must be balanced against the need to perform more general tasks such as obstacle avoidance. This problem has been addressed by maintaining relative utilization of shared resources among tasks near a user-specified target level. Producing optimal scheduling strategies requires complete prior knowledge of task behavior, which is unlikely to be available in practice. Instead, suitable scheduling strategies must be learned online through interaction with the system. We consider the sample complexity of reinforcement learning in this domain, and demonstrate that while the problem state space is countably infinite, we may leverage the problem's structure to guarantee efficient learning.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1203.3481

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)

Add feedback

Fast Reinforcement Learning with Large Action Sets using Error-Correcting Output Codes for MDP Factorization

Dulac-Arnold, Gabriel, Denoyer, Ludovic, Preux, Philippe, Gallinari, Patrick

arXiv.org Machine LearningFeb-29-2012

The use of Reinforcement Learning in real-world scenarios is strongly limited by issues of scale. Most RL learning algorithms are unable to deal with problems composed of hundreds or sometimes even dozens of possible actions, and therefore cannot be applied to many real-world problems. We consider the RL problem in the supervised classification framework where the optimal policy is obtained through a multiclass classifier, the set of classes being the set of actions of the problem. We introduce error-correcting output codes (ECOCs) in this setting and propose two new methods for reducing complexity when using rollouts-based approaches. The first method consists in using an ECOC-based classifier as the multiclass classifier, reducing the learning complexity from O(A2) to O(Alog(A)). We then propose a novel method that profits from the ECOC's coding dictionary to split the initial MDP into O(log(A)) seperate two-action MDPs. This second method reduces learning complexity even further, from O(A2) to O(log(A)), thus rendering problems with large action sets tractable. We finish by experimentally demonstrating the advantages of our approach on a set of benchmark problems, both in speed and performance.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1203.0203

Country: Europe (0.28)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Automatic Versus Human Navigation in Information Networks

West, Robert (Stanford University) | Leskovec, Jure (Stanford University)

AAAI ConferencesFeb-22-2012

People regularly face tasks that can be understood as navigation in information networks, where the goal is to find a path between two given nodes. In many such situations, the navigator only gets local access to the node currently under inspection and its immediate neighbors. This lack of global information about the network notwithstanding, humans tend to be good at finding short paths, despite the fact that real-world networks are typically very large. One potential reason for this could be that humans possess vast amounts of background knowledge about the world, which they leverage to make good guesses about possible solutions. In this paper we ask the question: Are human-like high-level reasoning skills really necessary for finding short paths? To answer this question, we design a number of navigation agents without such skills, which use only simple numerical features. We evaluate the agents on the task of navigating Wikipedia, a domain for which we also possess large-scale human navigation data. We observe that the agents find shorter paths than humans on average and therefore conclude that, perhaps surprisingly, no sophisticated background knowledge or high-level reasoning is required for navigating the complex Wikipedia network.

artificial intelligence, machine learning, reinforcement learning, (21 more...)

AAAI Conferences

Sixth International AAAI Conference on Weblogs and Social Media

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Austria (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)

Add feedback

PAC-Bayesian Policy Evaluation for Reinforcement Learning

Fard, Mahdi MIlani, Pineau, Joelle, Szepesvari, Csaba

arXiv.org Machine LearningFeb-14-2012

Bayesian priors offer a compact yet general means of incorporating domain knowledge into many learning tasks. The correctness of the Bayesian analysis and inference, however, largely depends on accuracy and correctness of these priors. PAC-Bayesian methods overcome this problem by providing bounds that hold regardless of the correctness of the prior distribution. This paper introduces the first PAC-Bayesian bound for the batch reinforcement learning problem with function approximation. We show how this bound can be used to perform model-selection in a transfer learning scenario. Our empirical results confirm that PAC-Bayesian policy evaluation is able to leverage prior distributions when they are informative and, unlike standard Bayesian RL approaches, ignore them when they are misleading.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

1202.3717

Country:

North America > United States > Massachusetts (0.28)
North America > Canada > Alberta (0.28)
North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine (0.47)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback