AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

A Comparison of learning algorithms on the Arcade Learning Environment

Defazio, Aaron, Graepel, Thore

arXiv.org Artificial IntelligenceOct-30-2014

Reinforcement learning agents have traditionally been evaluated on small toy problems. With advances in computing power and the advent of the Arcade Learning Environment, it is now possible to evaluate algorithms on diverse and difficult problems within a consistent framework. We discuss some challenges posed by the arcade learning environment which do not manifest in simpler environments. We then provide a comparison of model-free, linear learning algorithms on this challenging problem set.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1410.862

Genre:

Research Report (0.50)
Instructional Material (0.34)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Education (1.00)
Leisure & Entertainment > Sports (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits

Agarwal, Alekh, Hsu, Daniel, Kale, Satyen, Langford, John, Li, Lihong, Schapire, Robert E.

arXiv.org Machine LearningOct-13-2014

In the contextual bandit problem, an agent collects rewards for actions taken over a sequence of rounds; in each round, the agent chooses an action to take on the basis of (i) context (or features) for the current round, as well as (ii) feedback, in the form of rewards, obtained in previous rounds. The feedback is incomplete: in any given round, the agent observes the reward only for the chosen action; the agent does not observe the reward for other actions. Contextual bandit problems are found in many important applications such as online recommendation and clinical trials, and represent a natural halfway point between supervised learning and reinforcement learning. The use of features to encode context is inherited from supervised machine learning, while exploration is necessary for good performance as in reinforcement learning. The choice of exploration distribution on actions is important. The strongest known results(Auer et al., 2002; McMahan and Streeter, 2009; Beygelzimer et al., 2011) provide algorithms that carefully control the exploration distribution to achieve an optimal regret after T rounds of () O KT log( Π /δ), with probability at least 1 δ, relative to a set of policies Π A

data mining, machine learning, reinforcement learning, (20 more...)

arXiv.org Machine Learning

1402.0555

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.54)

Add feedback

Q-learning for Optimal Control of Continuous-time Systems

Luo, Biao, Liu, Derong, Huang, Tingwen

arXiv.org Machine LearningOct-11-2014

In this paper, two Q-learning (QL) methods are proposed and their convergence theories are established for addressing the model-free optimal control problem of general nonlinear continuous-time systems. By introducing the Q-function for continuous-time systems, policy iteration based QL (PIQL) and value iteration based QL (VIQL) algorithms are proposed for learning the optimal control policy from real system data rather than using mathematical system model. It is proved that both PIQL and VIQL methods generate a nonincreasing Q-function sequence, which converges to the optimal Q-function. For implementation of the QL algorithms, the method of weighted residuals is applied to derived the parameters update rule. The developed PIQL and VIQL algorithms are essentially off-policy reinforcement learning approachs, where the system data can be collected arbitrary and thus the exploration ability is increased. With the data collected from the real system, the QL methods learn the optimal control policy offline, and then the convergent control policy will be employed to real system. The effectiveness of the developed QL algorithms are verified through computer simulation.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1410.2954

Country:

North America > United States (1.00)
Asia (0.68)
Europe > United Kingdom > England (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

The Reinforcement Learning Competition 2014

Dimitrakakis, Christos (Chalmers University of Technology) | Li, Guangliang (University of Amsterdam) | Tziortziotis, Nikoalos (University of Ioannina)

AI MagazineSep-29-2014

Reinforcement learning is one of the most general problems in artificial intelligence. It has been used to model problems in automated experiment design, control, economics, game playing, scheduling and telecommunications. The aim of the reinforcement learning competition is to encourage the development of very general learning agents for arbitrary reinforcement learning problems and to provide a test-bed for the unbiased evaluation of algorithms.

machine learning, management and information, reinforcement, (3 more...)

AI Magazine

Genre: Contests & Prizes (0.75)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

The Reinforcement Learning Competition 2014

Dimitrakakis, Christos (Chalmers University of Technology) | Li, Guangliang (University of Amsterdam) | Tziortziotis, Nikoalos (University of Ioannina)

AI MagazineSep-29-2014

agent, artificial intelligence, competition, (12 more...)

AI Magazine

Country:

North America > Canada > Alberta (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.05)
Europe > Greece > Epirus > Ioannina (0.05)
(6 more...)

Industry: Education (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Optimizing Player Experience in Interactive Narrative Planning: A Modular Reinforcement Learning Approach

Rowe, Jonathan (North Carolina State University) | Mott, Bradford (North Carolina State University) | Lester, James (North Carolina State University)

AAAI ConferencesSep-29-2014

Recent years have witnessed growing interest in data-driven approaches to interactive narrative planning and drama management. Reinforcement learning techniques show particular promise because they can automatically induce and refine models for tailoring game events by optimizing reward functions that explicitly encode interactive narrative experiences’ quality. Due to the inherently subjective nature of interactive narrative experience, designing effective reward functions is challenging. In this paper, we investigate the impacts of alternate formulations of reward in a reinforcement learning-based interactive narrative planner for the Crystal Island game environment. We formalize interactive narrative planning as a modular reinforcement-learning (MRL) problem. By decomposing interactive narrative planning into multiple independent sub-problems, MRL enables efficient induction of interactive narrative policies directly from a corpus of human players’ experience data. Empirical analyses suggest that interactive narrative policies induced with MRL are likely to yield better player outcomes than heuristic or baseline policies. Furthermore, we observe that MRL-based interactive narrative planners are robust to alternate reward discount parameterizations.

artificial intelligence, machine learning, modular reinforcement learning approach, (2 more...)

AAAI Conferences

Tenth Artificial Intelligence and Interactive Digital Entertainment Conference

Industry: Leisure & Entertainment > Games (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Policy Iteration Based on Stochastic Factorization

Barreto, A. M. S., Pineau, J., Precup, D.

Journal of Artificial Intelligence ResearchAug-20-2014

When a transition probability matrix is represented as the product of two stochastic matrices, one can swap the factors of the multiplication to obtain another transition matrix that retains some fundamental characteristics of the original. Since the derived matrix can be much smaller than its precursor, this property can be exploited to create a compact version of a Markov decision process (MDP), and hence to reduce the computational cost of dynamic programming. Building on this idea, this paper presents an approximate policy iteration algorithm called policy iteration based on stochastic factorization, or PISF for short. In terms of computational complexity, PISF replaces standard policy iteration's cubic dependence on the size of the MDP with a function that grows only linearly with the number of states in the model. The proposed algorithm also enjoys nice theoretical properties: it always terminates after a finite number of iterations and returns a decision policy whose performance only depends on the quality of the stochastic factorization. In particular, if the approximation error in the factorization is sufficiently small, PISF computes the optimal value function of the MDP. The paper also discusses practical ways of factoring an MDP and illustrates the usefulness of the proposed algorithm with an application involving a large-scale decision problem of real economical interest.

algorithm, factorization, mdp, (17 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.4301

AI Access Foundation

10897

Journal of Artificial Intelligence Research

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
Asia > Middle East > Jordan (0.04)
(2 more...)

Genre:

Research Report (0.46)
Overview (0.46)

Industry: Energy > Power Industry (0.45)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

Probabilistic inverse reinforcement learning in unknown environments

Tossou, Aristide, Dimitrakakis, Christos

arXiv.org Machine LearningAug-9-2014

We consider the problem of learning by demonstration from agents acting in unknown stochastic Markov environments or games. Our aim is to estimate agent preferences in order to construct improved policies for the same task that the agents are trying to solve. To do so, we extend previous probabilistic approaches for inverse reinforcement learning in known MDPs to the case of unknown dynamics or opponents. We do this by deriving two simplified probabilistic models of the demonstrator's policy and utility. For tractability, we use maximum a posteriori estimation rather than full Bayesian inference. Under a flat prior, this results in a convex optimisation problem. We find that the resulting algorithms are highly competitive against a variety of other methods for inverse reinforcement learning that do have knowledge of the dynamics.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1408.2067

Country:

North America > United States (0.46)
Europe (0.46)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)

Add feedback

Active Sensing as Bayes-Optimal Sequential Decision Making

Ahmad, Sheeraz, Yu, Angela

arXiv.org Artificial IntelligenceAug-9-2014

Sensory inference under conditions of uncertainty is a major problem in both machine learning and computational neuroscience. An important but poorly understood aspect of sensory processing is the role of active sensing. Here, we present a Bayes-optimal inference and control framework for active sensing, C-DAC (Context-Dependent Active Controller). Unlike previously proposed algorithms that optimize abstract statistical objectives such as information maximization (Infomax) [Butko & Movellan, 2010] or one-step look-ahead accuracy [Najemnik & Geisler, 2005], our active sensing model directly minimizes a combination of behavioral costs, such as temporal delay, response error, and effort. We simulate these algorithms on a simple visual search task to illustrate scenarios in which context-sensitivity is particularly beneficial and optimization with respect to generic statistical objectives particularly inadequate. Motivated by the geometric properties of the C-DAC policy, we present both parametric and non-parametric approximations, which retain context-sensitivity while significantly reducing computational complexity. These approximations enable us to investigate the more complex problem involving peripheral vision, and we notice that the difference between C-DAC and statistical policies becomes even more evident in this scenario.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1408.2056

Country: North America > United States > California (0.28)

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Add feedback

Surprise and Curiosity for Big Data Robotics

White, Adam (University of Alberta) | Modayil, Joseph (University of Alberta) | Sutton, Richard S. (University of Alberta)

AAAI ConferencesJul-22-2014

This paper introduces a new perspective on curiosity and intrinsic motivation, viewed as the problem of generating behavior data for parallel off-policy learning.We provide 1) the first measure of surprise based on off-policy general value function learning progress, 2) the first investigation of reactive behavior control with parallel gradient temporal difference learning and function approximation, and 3) the first demonstration of using curiosity driven control to react to a non-stationary learning task---all on a mobile robot. Our approach improves scalability over previous off-policy, robot learning systems, essential for making progress on the ultimate big-data decision making problem---life-long robot learning.

data mining, machine learning, reinforcement learning, (3 more...)

AAAI Conferences

Workshops at the Twenty-Eighth AAAI Conference on Artificial Intelligence

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)

Add feedback