AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Scaling Up Robust MDPs by Reinforcement Learning

Tamar, Aviv, Xu, Huan, Mannor, Shie

arXiv.org Machine Learning, Jun-26-2013

We consider large-scale Markov decision processes (MDPs) with parameter uncertainty, under the robust MDP paradigm. Previous studies showed that robust MDPs, based on a minimax approach to handle uncertainty, can be solved using dynamic programming for small to medium sized problems. However, due to the "curse of dimensionality", MDPs that model real-life problems are typically prohibitively large for such approaches. In this work we employ a reinforcement learning approach to tackle this planning problem: we develop a robust approximate dynamic programming method based on a projected fixed point equation to approximately solve large scale robust MDPs. We show that the proposed method provably succeeds under certain technical conditions, and demonstrate its effectiveness through simulation of an option pricing problem. To the best of our knowledge, this is the first attempt to scale up the robust MDPs paradigm.
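The minimax dynamic programming that the abstract attributes to earlier small-scale work can be sketched as below. This is only an illustration of a worst-case Bellman backup over a finite set of candidate transition models, not the projected fixed point method the paper proposes; the function name, the nested-list data layout, and the assumption that the adversary picks a model independently for each state-action pair are all hypothetical choices made for this example.

```python
def robust_value_iteration(rewards, models, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Worst-case value iteration over a finite set of transition models.

    rewards[s][a]   -- immediate reward for action a in state s
    models[k][s][a] -- list of next-state probabilities under candidate model k
    (Illustrative layout; the adversary chooses a model per (s, a) pair.)
    """
    n_states = len(rewards)
    n_actions = len(rewards[0])
    values = [0.0] * n_states
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_iter):
        # Robust backup: reward plus discounted worst-case expected next value.
        q = [
            [
                rewards[s][a]
                + gamma * min(
                    sum(p * v for p, v in zip(model[s][a], values))
                    for model in models
                )
                for a in range(n_actions)
            ]
            for s in range(n_states)
        ]
        new_values = [max(row) for row in q]
        delta = max(abs(n - o) for n, o in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    # Greedy policy with respect to the last robust Q-values.
    policy = [max(range(n_actions), key=lambda a: q[s][a]) for s in range(n_states)]
    return values, policy


# Toy usage: two states, two actions, two candidate models (made-up numbers).
toy_rewards = [[1.0, 0.0], [0.0, 2.0]]
model_a = [[[1.0, 0.0], [0.1, 0.9]], [[0.5, 0.5], [0.0, 1.0]]]
model_b = [[[1.0, 0.0], [0.9, 0.1]], [[0.5, 0.5], [0.3, 0.7]]]
robust_values, robust_policy = robust_value_iteration(toy_rewards, [model_a, model_b])
```

Because every backup enumerates all states, actions, and models, the cost grows with the size of the state space, which is the scaling problem the abstract refers to.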

approximation, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1306.6189

Country:

Asia > Middle East > Israel (0.14)
Asia > Singapore (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Direct Uncertainty Estimation in Reinforcement Learning

Rodionov, Sergey, Potapov, Alexey, Vinogradov, Yurii

arXiv.org Artificial Intelligence, Jun-25-2013

The absence of prior knowledge about the environment (the absence of a precise model of it) is naturally characterized by the intuitive notion of uncertainty. However, no generally accepted, accurate formal description of this notion exists. Probability theory is the most traditional way of describing uncertainty, but an adequate interpretation of probability itself is not that clear. This can be seen from numerous paradoxes in probability theory, such as the grue emerald paradox. That is why some attempts at extending probability theory have been made. The most well-known one is fuzzy set theory. However, fuzzy operations can be considered as probabilistic operations with some additional assumptions about operands (e.g.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1306.1553

Country: Europe > Russia (0.29)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)

The Arcade Learning Environment: An Evaluation Platform for General Agents

Bellemare, Marc G., Naddaf, Yavar, Veness, Joel, Bowling, Michael

arXiv.org Artificial Intelligence, Jun-21-2013

In this article we introduce the Arcade Learning Environment (ALE): both a challenge problem and a platform and methodology for evaluating the development of general, domain-independent AI technology. ALE provides an interface to hundreds of Atari 2600 game environments, each one different, interesting, and designed to be a challenge for human players. ALE presents significant research challenges for reinforcement learning, model learning, model-based planning, imitation learning, transfer learning, and intrinsic motivation. Most importantly, it provides a rigorous testbed for evaluating and comparing approaches to these problems. We illustrate the promise of ALE by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning. In doing so, we also propose an evaluation methodology made possible by ALE, reporting empirical results on over 55 different games. All of the software, including the benchmark agents, is publicly available.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1613/jair.3912

1207.4708

Country: North America > Canada > Alberta (0.29)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Education (1.00)
Leisure & Entertainment > Sports (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

The Arcade Learning Environment: An Evaluation Platform for General Agents

Bellemare, M. G., Naddaf, Y., Veness, J., Bowling, M.

Journal of Artificial Intelligence Research, Jun-14-2013

agent, arcade learning environment, atari 2600, (14 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.3912

AI Access Foundation

10819

Country:

North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
North America > United States > Michigan (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report (0.68)
Overview (0.46)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Education (1.00)
Leisure & Entertainment > Sports (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)

Temporal-Difference Search in Computer Go

Silver, David (University College London) | Sutton, Richard (University of Alberta) | Mueller, Martin (University of Alberta)

AAAI Conferences, Jun-8-2013

Temporal-difference (TD) learning is one of the most successful and broadly applied solutions to the reinforcement learning problem; it has been used to achieve master-level play in chess, checkers and backgammon. Monte-Carlo tree search is a recent algorithm for simulation-based search, which has been used to achieve master-level play in Go. We have introduced a new approach to high-performance planning. Our method, TD search, combines TD learning with simulation-based search. Like Monte-Carlo tree search, value estimates are updated by learning online from simulated experience. Like TD learning, it uses value function approximation and bootstrapping to efficiently generalise between related states. We applied TD search to the game of 9x9 Go, using a million binary features matching simple patterns of stones. Without any explicit search tree, our approach outperformed a vanilla Monte-Carlo tree search with the same number of simulations. When combined with a simple alpha-beta search, our program also outperformed all traditional (pre-Monte-Carlo) search and machine learning programs on the 9x9 Computer Go Server.
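The bootstrapped update over binary features mentioned in the abstract can be illustrated with a minimal linear TD(0) sketch. This is not the authors' TD search program: the function names, the five-state random walk used as simulated experience, and the one-feature-per-state encoding are assumptions chosen to keep the example small.

```python
import random


def td0_update(weights, features, reward, next_features, alpha=0.05, gamma=1.0):
    """One TD(0) step for a linear value function over binary features.

    features / next_features are lists of active feature indices;
    next_features is None when the next state is terminal.
    """
    value = sum(weights[i] for i in features)
    next_value = 0.0 if next_features is None else sum(weights[i] for i in next_features)
    # Bootstrapping: the target uses the current estimate of the next state.
    td_error = reward + gamma * next_value - value
    for i in features:
        weights[i] += alpha * td_error
    return td_error


def simulate_random_walk(weights, rng, n_states=5, alpha=0.05):
    """Learn online from one simulated episode of a symmetric random walk.

    Hypothetical toy task: start in the middle, reward 1 for exiting on the
    right, 0 for exiting on the left. Each state has one binary feature.
    """
    state = n_states // 2
    while True:
        next_state = state + rng.choice((-1, 1))
        if next_state < 0:
            td0_update(weights, [state], 0.0, None, alpha)
            return
        if next_state >= n_states:
            td0_update(weights, [state], 1.0, None, alpha)
            return
        td0_update(weights, [state], 0.0, [next_state], alpha)
        state = next_state


rng = random.Random(0)
walk_weights = [0.0] * 5
for _ in range(5000):
    simulate_random_walk(walk_weights, rng, alpha=0.02)
# The true values for this walk are 1/6, 2/6, ..., 5/6.
```

With features shared across many states, as in pattern-based Go features, the same update would spread each correction to every state that activates the feature, which is the generalisation the abstract describes.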

machine learning, reinforcement learning, temporal-difference search, (1 more...)

AAAI Conferences

Twenty-Third International Conference on Automated Planning and Scheduling

Industry: Leisure & Entertainment > Games > Go (0.60)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Games > Go (1.00)

Analysis of Watson's Strategies for Playing Jeopardy!

Tesauro, G., Gondek, D. C., Lenchner, J., Fan, J., Prager, J. M.

Journal of Artificial Intelligence Research, May-31-2013

Major advances in Question Answering technology were needed for IBM Watson to play Jeopardy! at championship level -- the show requires rapid-fire answers to challenging natural language questions, broad general knowledge, high precision, and accurate confidence estimates. In addition, Jeopardy! features four types of decision making carrying great strategic importance: (1) Daily Double wagering; (2) Final Jeopardy wagering; (3) selecting the next square when in control of the board; (4) deciding whether to attempt to answer, i.e., "buzz in." Using sophisticated strategies for these decisions, that properly account for the game state and future event probabilities, can significantly boost a player's overall chances to win, when compared with simple "rule of thumb" strategies. This article presents our approach to developing Watson's game-playing strategies, comprising development of a faithful simulation model, and then using learning and Monte-Carlo methods within the simulator to optimize Watson's strategic decision-making. After giving a detailed description of each of our game-strategy algorithms, we then focus in particular on validating the accuracy of the simulator's predictions, and documenting performance improvements using our methods. Quantitative performance benefits are shown with respect to both simple heuristic strategies, and actual human contestant performance in historical episodes. We further extend our analysis of human play to derive a number of valuable and counterintuitive examples illustrating how human contestants may improve their performance on the show.

contestant, jeopardy, watson, (17 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.3834

AI Access Foundation

10818

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
Europe > Austria > Vienna (0.04)

Genre:

Research Report (0.46)
Contests & Prizes (0.34)

Industry:

Leisure & Entertainment > Sports (1.00)
Leisure & Entertainment > Games > Jeopardy! (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)
(2 more...)

Reinforcement Learning for the Soccer Dribbling Task

Carvalho, Arthur, Oliveira, Renato

arXiv.org Machine Learning, May-28-2013

We propose a reinforcement learning solution to the soccer dribbling task, a scenario in which a soccer agent has to go from the beginning to the end of a region keeping possession of the ball, as an adversary attempts to gain possession. While the adversary uses a stationary policy, the dribbler learns the best action to take at each decision point. After defining meaningful variables to represent the state space, and high-level macro-actions to incorporate domain knowledge, we describe our application of the reinforcement learning algorithm Sarsa with CMAC for function approximation. Our experiments show that, after the training period, the dribbler is able to accomplish its task against a strong adversary around 58% of the time.
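The combination of Sarsa with CMAC-style tile coding can be sketched on a much simpler problem than dribbling. Everything specific here is an assumption made for illustration: the one-dimensional corridor task, the constants, and the function names are hypothetical and are not taken from the paper.

```python
import random

N_TILINGS = 4          # number of overlapping grids (assumed value)
TILES_PER_TILING = 5   # tiles spanning [0, 1] in each grid (assumed value)
N_ACTIONS = 2          # 0 = step left, 1 = step right
GOAL = 10              # positions are k / GOAL for k = 0..GOAL


def active_tiles(x):
    """Return one active tile index per tiling for a scalar x in [0, 1]."""
    width = 1.0 / TILES_PER_TILING
    indices = []
    for tiling in range(N_TILINGS):
        offset = tiling * width / N_TILINGS  # shift each grid slightly
        tile = min(int((x + offset) / width), TILES_PER_TILING)
        indices.append(tiling * (TILES_PER_TILING + 1) + tile)
    return indices


def new_weights():
    """One weight vector per action, one weight per tile."""
    return [[0.0] * (N_TILINGS * (TILES_PER_TILING + 1)) for _ in range(N_ACTIONS)]


def q_value(weights, x, action):
    return sum(weights[action][i] for i in active_tiles(x))


def epsilon_greedy(weights, x, epsilon, rng):
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: q_value(weights, x, a))


def sarsa_episode(weights, rng, alpha=0.1, gamma=1.0, epsilon=0.1, max_steps=200):
    """Run one episode of a toy corridor task and return its length.

    Reward is -1 per step; the episode ends on reaching position GOAL
    or after max_steps steps.
    """
    k = 0
    action = epsilon_greedy(weights, k / GOAL, epsilon, rng)
    for step in range(1, max_steps + 1):
        next_k = min(GOAL, max(0, k + (1 if action == 1 else -1)))
        reward = -1.0
        if next_k == GOAL:
            target = reward
        else:
            # On-policy: the target uses the action actually chosen next.
            next_action = epsilon_greedy(weights, next_k / GOAL, epsilon, rng)
            target = reward + gamma * q_value(weights, next_k / GOAL, next_action)
        td_error = target - q_value(weights, k / GOAL, action)
        for i in active_tiles(k / GOAL):
            weights[action][i] += (alpha / N_TILINGS) * td_error
        if next_k == GOAL:
            return step
        k, action = next_k, next_action
    return max_steps


rng = random.Random(0)
corridor_weights = new_weights()
for _ in range(500):
    sarsa_episode(corridor_weights, rng)
greedy_steps = sarsa_episode(corridor_weights, rng, epsilon=0.0)
```

Dividing the step size by the number of tilings keeps the total change to the estimate for the visited state close to `alpha` times the TD error, since each tiling contributes one active tile.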

dribbler, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

doi: 10.1109/CIG.2011.6031994

1305.6568

Country: Europe (0.28)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.35)

Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems

Ibrahimi, Morteza, Javanmard, Adel, Van Roy, Benjamin

arXiv.org Machine Learning, Mar-24-2013

We study the problem of adaptive control of a high dimensional linear quadratic (LQ) system. Previous work established the asymptotic convergence to an optimal controller for various adaptive control schemes. More recently, for the average cost LQ problem, a regret bound of $O(\sqrt{T})$ was shown, apart from logarithmic factors. However, this bound scales exponentially with $p$, the dimension of the state space. In this work we consider the case where the matrices describing the dynamics of the LQ system are sparse and their dimensions are large. We present an adaptive control scheme that achieves a regret bound of $O(p \sqrt{T})$, apart from logarithmic factors. In particular, our algorithm has an average cost of $(1+\epsilon)$ times the optimum cost after $T = \operatorname{polylog}(p)\, O(1/\epsilon^2)$. This is in comparison to previous work on the dense dynamics where the algorithm requires time that scales exponentially with dimension in order to achieve regret of $\epsilon$ times the optimal cost. We believe that our result has prominent applications in the emerging area of computational advertising, in particular targeted online advertising and advertising in social networks.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1303.5984

Country: North America > United States > California > Santa Clara County (0.14)

Genre: Research Report (0.70)

Industry:

Information Technology > Services (0.54)
Marketing (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)

Automatic Abstraction in Reinforcement Learning Using Ant System Algorithm

Ghafoorian, Mohsen (Sharif University of Technology) | Taghizadeh, Nasrin (Sharif University of Technology) | Beigy, Hamid (Sharif University of Technology)

AAAI Conferences, Mar-21-2013

Developing autonomous systems that can act in various environments and interactively perform their assigned tasks is highly desirable. Such systems could be applied in fields such as medicine, robot control and social life. Reinforcement learning is an attractive area of machine learning which addresses these concerns. At large scales, the learning performance of an agent can be improved by using hierarchical reinforcement learning techniques and temporally extended actions. The higher level of abstraction helps the learning agent approach lifelong learning goals. In this paper a new method is presented for discovering subgoal states and constructing useful skills. The method utilizes the Ant System optimization algorithm to identify bottleneck edges, which act like bridges between different connected areas of the problem space. Using the discovered subgoals, the agent creates temporal abstractions, which enable it to explore more effectively. Experimental results show that the proposed method can significantly improve the learning performance of the agent.

ant system algorithm, automatic abstraction, reinforcement learning, (1 more...)

AAAI Conferences

2013 AAAI Spring Symposium Series

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Learning Sensorimotor Concepts Without Reinforcement

Mohammad, Yasser F. O. (Kyoto University) | Nishida, Toyoaki (Kyoto University)

AAAI Conferences, Mar-21-2013

Agents engaged in lifelong learning can benefit from the ability to acquire new concepts from continuous interaction with objects in their environments, which is a ubiquitous ability in humans. This paper advocates the use of sensorimotor concepts that combine perceptual and actuation patterns. Representations related to sensorimotor concepts are Predictive State Representation in dynamical systems, Affordance Based Concepts in language and Skills in reinforcement learning. The paper proposes a system for learning generalized sensorimotor concepts from unsegmented interactions between the agent and the objects in its environment that works in continuous action and observation spaces and at the same time requires no reinforcement signals. A proof-of-concept experiment with the proposed system on a simulated e-puck robot is reported to support the applicability of the proposed approach.

learning sensorimotor concept, reinforcement

AAAI Conferences

2013 AAAI Spring Symposium Series

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)
