AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Environmental statistics and the trade-off between model-based and TD learning in humans

Simon, Dylan A., Daw, Nathaniel D.

Neural Information Processing Systems, Dec-31-2011

There is much evidence that humans and other animals utilize a combination of model-based and model-free RL methods. Although it has been proposed that these systems may dominate according to their relative statistical efficiency in different circumstances, there is little specific evidence -- especially in humans -- as to the details of this trade-off. Accordingly, we examine the relative performance of different RL approaches under situations in which the statistics of reward are differentially noisy and volatile. Using theory and simulation, we show that model-free TD learning is relatively most disadvantaged in cases of high volatility and low noise. We present data from a decision-making experiment manipulating these parameters, showing that humans shift learning strategies in accord with these predictions. The statistical circumstances favoring model-based RL are also those that promote a high learning rate, which helps explain why, in psychology, the distinction between these strategies is traditionally conceived in terms of rule-based vs. incremental learning.
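As a rough illustration of the noise/volatility trade-off described in this abstract, the sketch below applies a delta-rule (TD-style) update with a fixed learning rate to a reward mean that drifts as a random walk. It is not the authors' model or experiment: the function names, the parameter values, and the random-walk reward process are all assumptions made for illustration.

```python
import random


def td_update(value, reward, alpha):
    """One delta-rule step: move the estimate toward the observed reward."""
    return value + alpha * (reward - value)


def tracking_error(alpha, volatility, noise, steps=5000, seed=0):
    """Mean squared error of a fixed-learning-rate estimate of a drifting mean.

    volatility: std. dev. of the random-walk step of the true reward mean
    noise:      std. dev. of the observation noise around that mean
    (both are hypothetical parameters for this sketch)
    """
    rng = random.Random(seed)
    true_mean, estimate, total = 0.0, 0.0, 0.0
    for _ in range(steps):
        true_mean += rng.gauss(0.0, volatility)      # the environment drifts
        reward = true_mean + rng.gauss(0.0, noise)   # noisy observation
        estimate = td_update(estimate, reward, alpha)
        total += (estimate - true_mean) ** 2
    return total / steps


if __name__ == "__main__":
    # Compare a few learning rates in two assumed regimes.
    for volatility, noise in [(0.5, 0.1), (0.05, 1.0)]:
        errors = {a: tracking_error(a, volatility, noise) for a in (0.05, 0.3, 0.9)}
        best = min(errors, key=errors.get)
        print(f"volatility={volatility}, noise={noise}: best alpha={best}")
```

In this toy setting a high learning rate tracks the reward best when volatility is high and noise is low, which is consistent with the abstract's remark that those circumstances promote a high learning rate.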

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Country: North America > United States > New York (0.14)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Nonlinear Inverse Reinforcement Learning with Gaussian Processes

Levine, Sergey, Popovic, Zoran, Koltun, Vladlen

Neural Information Processing Systems, Dec-31-2011

We present a probabilistic algorithm for nonlinear inverse reinforcement learning. The goal of inverse reinforcement learning is to learn the reward function in a Markov decision process from expert demonstrations. While most prior inverse reinforcement learning algorithms represent the reward as a linear combination of a set of features, we use Gaussian processes to learn the reward as a nonlinear function, while also determining the relevance of each feature to the expert's policy. Our probabilistic algorithm allows complex behaviors to be captured from suboptimal stochastic demonstrations, while automatically balancing the simplicity of the learned reward structure against its consistency with the observed actions.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Bayesian multitask inverse reinforcement learning

Dimitrakakis, Christos, Rothkopf, Constantin

arXiv.org Artificial Intelligence, Nov-17-2011

We generalise the problem of inverse reinforcement learning to multiple tasks, from multiple demonstrations. Each demonstration may represent one expert trying to solve a different task, or different experts trying to solve the same task. Our main contribution is to formalise the problem as statistical preference elicitation, via a number of structured priors, whose form captures our biases about the relatedness of different tasks or expert policies. In doing so, we introduce a prior on policy optimality, which is more natural to specify. We show that our framework allows us not only to learn efficiently from multiple experts but also to differentiate effectively between the goals of each. Possible applications include analysing the intrinsic motivations of subjects in behavioural experiments and learning from multiple teachers.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

doi: 10.1007/978-3-642-29946-9_27

1106.3655

Country:

Europe > Switzerland (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)

Robust Bayesian reinforcement learning through tight lower bounds

Dimitrakakis, Christos

arXiv.org Machine Learning, Nov-11-2011

In the Bayesian approach to sequential decision making, exact calculation of the (subjective) utility is intractable. This extends to most special cases of interest, such as reinforcement learning problems. While utility bounds are known to exist for this problem, so far none of them were particularly tight. In this paper, we show how to efficiently calculate a lower bound corresponding to the utility of a near-optimal memoryless policy for the decision problem. This policy is generally different from both the Bayes-optimal policy and the policy that is optimal for the expected MDP under the current belief. We then show how these bounds can be applied to obtain robust exploration policies in a Bayesian reinforcement learning setting.

machine learning, reinforcement, reinforcement learning, (17 more...)

1106.3651

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)

Simultaneous Abstract and Concrete Reinforcement Learning

Matos, Tiago (Universidade de Sao Paulo) | Bergamo, Yannick P. (Universidade de Sao Paulo) | Silva, Valdinei Freire da (Universidade de Sao Paulo) | Cozman, Fabio G. (Universidade de Sao Paulo) | Costa, Anna Helena Reali (Universidade de Sao Paulo)

AAAI Conferences, Nov-1-2011

Suppose an agent builds a policy that satisfactorily solves a decision problem; suppose further that some aspects of this policy are abstracted and used as a starting point in a new, different decision problem. How can the agent accrue the benefits of the abstract policy in the new concrete problem? In this paper we propose a framework for simultaneous reinforcement learning, where the abstract policy helps start up the policy for the concrete problem, and both policies are refined through exploration. We report experiments that demonstrate that our framework is effective in speeding up policy construction for practical problems.

abstract policy, iscorridor, isneardoor, (15 more...)

Ninth Symposium on Abstraction, Reformulation, and Approximation

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
South America > Brazil > São Paulo (0.04)
North America > United States > New York (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Optimal Reinforcement Learning for Gaussian Systems

Hennig, Philipp

arXiv.org Machine Learning, Oct-14-2011

The exploration-exploitation trade-off is among the central challenges of reinforcement learning. The optimal Bayesian solution is intractable in general. This paper studies to what extent analytic statements about optimal learning are possible if all beliefs are Gaussian processes. A first-order approximation of learning of both loss and dynamics, for nonlinear, time-varying systems in continuous time and space, subject to a relatively weak restriction on the dynamics, is described by an infinite-dimensional partial differential equation. An approximate finite-dimensional projection gives an impression of how this result may be helpful.

artificial intelligence, reinforcement learning, upstream oil & gas, (16 more...)

1106.08

Country:

North America > United States > Massachusetts (0.14)
Europe > Germany (0.14)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Performance Bounds for Lambda Policy Iteration and Application to the Game of Tetris

Scherrer, Bruno

arXiv.org Artificial Intelligence, Oct-11-2011

We consider the discrete-time infinite-horizon optimal control problem formalized by Markov Decision Processes (Puterman, 1994; Bertsekas and Tsitsiklis, 1996). We revisit the work of Bertsekas and Ioffe (1996), which introduced λ Policy Iteration, a family of algorithms parameterized by λ that generalizes the standard algorithms Value Iteration and Policy Iteration, and has some deep connections with the Temporal Differences algorithm TD(λ) described by Sutton and Barto (1998). We deepen the original theory developed by the authors by providing convergence rate bounds which generalize standard bounds for Value Iteration described for instance by Puterman (1994). Then, the main contribution of this paper is to develop the theory of this algorithm when it is used in an approximate form and show that this is sound. In doing so, we extend and unify the separate analyses developed by Munos for Approximate Value Iteration (Munos, 2007) and Approximate Policy Iteration (Munos, 2003). Finally, we revisit the use of this algorithm in the training of a Tetris playing controller as originally done by Bertsekas and Ioffe (1996). We provide an original performance bound that can be applied to such an undiscounted control problem. Our empirical results are different from those of Bertsekas and Ioffe (which were originally qualified as "paradoxical" and "intriguing"), and conform much more closely to what one would expect from a learning experiment. We discuss the possible reason for such a difference.
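To make the relationship between λ Policy Iteration, Value Iteration, and Policy Iteration concrete, here is a minimal exact (non-approximate) sketch for a small tabular, discounted MDP. It is not the paper's implementation, and it does not cover the undiscounted Tetris setting: the toy MDP (`TOY_P`, `TOY_R`), the function names, and the choice to compute each update by fixed-point iteration are assumptions made for illustration.

```python
def greedy_policy(P, R, v, gamma):
    """Greedy action in each state with respect to the value estimate v."""
    n_states, n_actions = len(R), len(R[0])

    def q(s, a):
        return R[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n_states))

    return [max(range(n_actions), key=lambda a: q(s, a)) for s in range(n_states)]


def bellman_pi(P, R, pi, v, gamma):
    """Apply the Bellman operator T_pi of a fixed policy pi to v."""
    n = len(R)
    return [R[s][pi[s]] + gamma * sum(P[s][pi[s]][t] * v[t] for t in range(n))
            for s in range(n)]


def lambda_policy_iteration(P, R, gamma, lam, sweeps=200, inner=500):
    """lambda policy iteration on a small tabular MDP.

    Each sweep takes pi greedy w.r.t. v and replaces v by the fixed point w of
        w = (1 - lam) * T_pi(v) + lam * T_pi(w),
    found here by plain fixed-point iteration (a contraction of modulus
    lam * gamma). lam = 0 reduces to value iteration, lam = 1 to policy
    iteration.
    """
    v = [0.0] * len(R)
    for _ in range(sweeps):
        pi = greedy_policy(P, R, v, gamma)
        t_v = bellman_pi(P, R, pi, v, gamma)
        w = list(v)
        for _ in range(inner):
            t_w = bellman_pi(P, R, pi, w, gamma)
            w = [(1 - lam) * a + lam * b for a, b in zip(t_v, t_w)]
        v = w
    return v, greedy_policy(P, R, v, gamma)


# Hypothetical two-state MDP: P[s][a][t] is a transition probability and
# R[s][a] a reward. Action 0 stays put, action 1 moves to the other state;
# only staying in state 1 is rewarded.
TOY_P = [[[1.0, 0.0], [0.0, 1.0]],
         [[0.0, 1.0], [1.0, 0.0]]]
TOY_R = [[0.0, 0.0],
         [1.0, 0.0]]

if __name__ == "__main__":
    for lam in (0.0, 0.5, 1.0):
        values, policy = lambda_policy_iteration(TOY_P, TOY_R, gamma=0.9, lam=lam)
        print(lam, [round(x, 4) for x in values], policy)
```

On this toy problem every value of λ should reach the same optimal values and policy; what changes with λ is how much policy evaluation is done per sweep.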

artificial intelligence, machine learning, reinforcement learning, (14 more...)

0711.0694

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)
Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

An Object-Oriented Approach to Reinforcement Learning in an Action Game

Mohan, Shiwali (University of Michigan, Ann Arbor) | Laird, John E. (University of Michigan)

AAAI Conferences, Oct-9-2011

In this work, we look at the challenge of learning in an action game, Infinite Mario. Learning to play an action game can be divided into two distinct but related problems, learning an object-related behavior and selecting a primitive action. We propose a framework that allows for the use of reinforcement learning for both of these problems. We present promising results in some instances of the game and identify some problems that might affect learning.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Seventh Artificial Intelligence and Interactive Digital Entertainment Conference

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands (0.04)
Europe > Belgium (0.04)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)

Learning Policies for First Person Shooter Games Using Inverse Reinforcement Learning

Tastan, Bulent (University of Central Florida) | Sukthankar, Gita Reese (University of Central Florida)

AAAI Conferences, Oct-9-2011

The creation of effective autonomous agents (bots) for combat scenarios has long been a goal of the gaming industry. However, a secondary consideration is whether the autonomous bots behave like human players; this is especially important for simulation/training applications which aim to instruct participants in real-world tasks. Bots often compensate for a lack of combat acumen with advantages such as accurate targeting, predefined navigational networks, and perfect world knowledge, which makes them challenging but often predictable opponents. In this paper, we examine the problem of teaching a bot to play like a human in first-person shooter game combat scenarios. Our bot learns attack, exploration and targeting policies from data collected from expert human player demonstrations in Unreal Tournament. We hypothesize that one key difference between human players and autonomous bots lies in the relative valuation of game states. To capture the internal model used by expert human players to evaluate the benefits of different actions, we use inverse reinforcement learning to learn rewards for different game states. We report the results of a human subjects' study evaluating the performance of bot policies learned from human demonstration against a set of standard bot policies. Our study reveals that human players found our bots to be significantly more human-like than the standard bots during play. Our technique represents a promising stepping-stone toward addressing challenges such as the Bot Turing Test (the CIG Bot 2K Competition).

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Seventh Artificial Intelligence and Interactive Digital Entertainment Conference

Country:

North America > United States > Florida > Orange County > Orlando (0.14)
Europe > Spain > Galicia > Madrid (0.04)
Asia (0.04)

Genre: Research Report > Experimental Study (0.70)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes

Fern, A., Givan, R., Yoon, S.

arXiv.org Artificial Intelligence, Sep-9-2011

We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs, where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

doi: 10.1613/jair.1700

1109.2156

Genre: Research Report > New Finding (0.67)

Industry:

Transportation (0.47)
Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.93)
(2 more...)
