AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Active Imitation Learning via Reduction to I.I.D. Active Learning

Judah, Kshitij, Fern, Alan, Dietterich, Thomas G.

arXiv.org Machine Learning, Oct-16-2012

In standard passive imitation learning, the goal is to learn a target policy by passively observing full execution trajectories of it. Unfortunately, generating such trajectories can require substantial expert effort and be impractical in some cases. In this paper, we consider active imitation learning with the goal of reducing this effort by querying the expert about the desired action at individual states, which are selected based on answers to past queries and the learner's interactions with an environment simulator. We introduce a new approach based on reducing active imitation learning to i.i.d. active learning, which can leverage progress in the i.i.d. setting. Our first contribution is to analyze reductions for both non-stationary and stationary policies, showing that the label complexity (number of queries) of active imitation learning can be substantially less than that of passive learning. Our second contribution is to introduce a practical algorithm inspired by the reductions, which is shown to be highly effective in four test domains compared to a number of alternatives.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1210.4876

Country: North America > United States > Oregon (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

CLASSQ-L: A Q-Learning Algorithm for Adversarial Real-Time Strategy Games

Jaidee, Ulit (Lehigh University) | Munoz-Avila, Hector (Lehigh University)

AAAI Conferences, Oct-7-2012

We present CLASS Q-L (for: class Q-learning), an application of the Q-learning reinforcement learning algorithm to play complete Wargus games. Wargus is a real-time strategy game where players control armies consisting of units of different classes (e.g., archers, knights). CLASS Q-L uses a single Q-table for each class of unit, so that each unit is controlled by, and updates, its class's Q-table. This enables rapid learning, since in Wargus there are many units of the same class. We present initial results of CLASS Q-L against a variety of opponents.
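The idea of one Q-table per unit class can be illustrated with a minimal tabular Q-learning sketch. This is not the authors' implementation: the class name `ClassQTable`, the state and action labels, and the hyperparameter values are all hypothetical, and the update shown is the standard one-step Q-learning rule.

```python
import random
from collections import defaultdict


class ClassQTable:
    """One tabular Q-function shared by every unit of a given class (illustrative)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha = alpha        # learning rate
        self.gamma = gamma        # discount factor
        self.epsilon = epsilon    # exploration probability
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.rng = random.Random(seed)

    def choose(self, state):
        """Epsilon-greedy action selection over the shared table."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Standard one-step Q-learning update."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# Hypothetical unit classes and actions, for illustration only.
tables = {
    "archer": ClassQTable(["attack", "retreat"]),
    "knight": ClassQTable(["attack", "retreat"]),
}

# Two different archers report experience; both updates land in the same table.
tables["archer"].update("enemy_near", "attack", 1.0, "enemy_near")
tables["archer"].update("enemy_near", "attack", 1.0, "enemy_near")
```

Because every unit of a class writes to the same table, experience accumulates faster than it would with one table per unit, which is the effect the abstract attributes to sharing.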

artificial intelligence, machine learning, reinforcement learning, (17 more...)

AAAI Conferences

Eighth Artificial Intelligence and Interactive Digital Entertainment Conference

Country:

North America > United States > Pennsylvania > Northampton County > Bethlehem (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Norway > Eastern Norway > Oslo (0.04)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Optimistic Agents are Asymptotically Optimal

Sunehag, Peter, Hutter, Marcus

arXiv.org Artificial Intelligence, Sep-29-2012

We use optimism to introduce generic asymptotically optimal reinforcement learning agents. They achieve, with an arbitrary finite or compact class of environments, asymptotically optimal behavior. Furthermore, in the finite deterministic case we provide finite error bounds.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

1210.0077

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.47)

Hybrid systems modeling for gas transmission network

Noori, Amir, Menhaj, Mohammad Bagher, Shafiee, Masoud

arXiv.org Artificial Intelligence, Sep-28-2012

Gas transmission networks are large-scale complex systems, and the corresponding design and control problems are challenging. In this paper, we consider the problem of control and management of these systems in crisis situations. We represent these networks with a hybrid systems framework that provides the required analysis models. Further, we discuss decision-making using computational discrete and hybrid optimization methods. In particular, several reinforcement learning methods are employed to explore the decision space and find the best policy in a specific crisis situation. Simulations are presented to illustrate the efficiency of the method.

arXiv.org Artificial Intelligence

1208.1743

Genre: Research Report (0.40)

Industry: Energy > Power Industry (0.53)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)

Bellman Error Based Feature Generation using Random Projections on Sparse Spaces

Fard, Mahdi Milani, Grinberg, Yuri, Farahmand, Amir-massoud, Pineau, Joelle, Precup, Doina

arXiv.org Machine Learning, Sep-21-2012

We address the problem of automatic generation of features for value function approximation. Bellman Error Basis Functions (BEBFs) have been shown to improve the error of policy evaluation with function approximation, with a convergence rate similar to that of value iteration. We propose a simple, fast and robust algorithm based on random projections to generate BEBFs for sparse feature spaces. We provide a finite sample analysis of the proposed method, and prove that projections logarithmic in the dimension of the original space are enough to guarantee contraction in the error. Empirical results demonstrate the strength of this method.
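As background for the term, a Bellman error basis function is the Bellman residual of the current value estimate, used as a new feature. The sketch below computes that residual exactly on a hypothetical two-state chain with a known model. It deliberately omits the paper's contribution (estimating the residual from samples through random projections of sparse features); the function name, the chain, and the discount factor are assumptions made for illustration.

```python
def bellman_error(values, rewards, transitions, gamma):
    """Return r + gamma * P v - v for a fixed policy, one entry per state."""
    n = len(values)
    return [
        rewards[s]
        + gamma * sum(transitions[s][t] * values[t] for t in range(n))
        - values[s]
        for s in range(n)
    ]


# Hypothetical two-state chain under a fixed policy.
rewards = [0.0, 1.0]
transitions = [[0.0, 1.0],   # state 0 always moves to state 1
               [0.0, 1.0]]   # state 1 is absorbing
gamma = 0.5

# Starting from an all-zero value estimate, the residual is the candidate feature.
new_feature = bellman_error([0.0, 0.0], rewards, transitions, gamma)

# At the true value function the residual vanishes, so no further feature is needed.
true_values = [1.0, 2.0]
residual_at_truth = bellman_error(true_values, rewards, transitions, gamma)
```

Here `new_feature` is `[0.0, 1.0]`, and `residual_at_truth` is zero in both states, since v(1) = 1 / (1 - 0.5) = 2 and v(0) = 0.5 * 2 = 1.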

data mining, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1207.5554

Country: North America > Canada (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
(2 more...)

Learning to Interpret Natural Language Instructions

MacGlashan, James (University of Maryland, Baltimore County) | Babes-Vroman, Monica (Rutgers University) | Winner, Kevin (University of Maryland, Baltimore County) | Gao, Ruoyuan (Rutgers University) | Adjogah, Richard (University of Maryland, Baltimore County) | desJardins, Marie (University of Maryland, Baltimore County) | Littman, Michael (Rutgers University) | Muresan, Smaranda (Rutgers University)

AAAI Conferences, Jul-21-2012

We address the problem of training an artificial agent to follow verbal commands using a set of instructions paired with demonstration traces of appropriate behavior. From this data, a mapping from instructions to tasks is learned, enabling the agent to carry out new instructions in novel environments. Our system consists of three components: semantic parsing (SP), inverse reinforcement learning (IRL), and task abstraction (TA). SP parses sentences into logical form representations, but when learning begins, the domain/task specific meanings of these representations are unknown. IRL takes demonstration traces and determines the likely reward functions that gave rise to these traces, defined over a set of provided features. TA combines results from SP and IRL over a set of training instances to create abstract goal definitions of tasks. TA also provides SP domain specific meanings for its logical forms and provides IRL the set of task-relevant features.

agent, instruction, propositional function, (17 more...)

AAAI Conferences

Workshops at the Twenty-Sixth AAAI Conference on Artificial Intelligence

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Maryland > Baltimore County (0.04)
North America > United States > Maryland > Baltimore (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.68)

Sequential Decision Making with Rank Dependent Utility: A Minimax Regret Approach

Jeantet, Gildas (AiRPX) | Perny, Patrice (University Pierre and Marie Curie and CNRS) | Spanjaard, Olivier (University Pierre and Marie Curie and CNRS)

AAAI Conferences, Jul-21-2012

This paper is devoted to sequential decision making with Rank Dependent expected Utility (RDU). This decision criterion generalizes Expected Utility and makes it possible to model a wider range of observed (rational) behaviors. In such a sequential decision setting, two conflicting objectives can be identified in the assessment of a strategy: maximizing the performance viewed from the initial state (optimality), and minimizing the incentive to deviate during implementation (deviation-proofness). In this paper, we propose a minimax regret approach taking these two aspects into account, and we provide a search procedure to determine an optimal strategy for this model. Numerical results are presented to show the interest of the proposed approach in terms of optimality, deviation-proofness and computability.
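For readers unfamiliar with the criterion, the sketch below evaluates a single lottery under the usual textbook form of RDU: outcomes are ranked from worst to best, and each utility increment is weighted by a transformed probability of doing at least that well. It covers only the criterion, not the paper's minimax regret search, and the function name and example lottery are hypothetical.

```python
def rank_dependent_utility(lottery, utility, phi):
    """RDU of a lottery given as a list of (outcome, probability) pairs.

    utility: nondecreasing function on outcomes.
    phi: nondecreasing probability weighting with phi(0) = 0 and phi(1) = 1.
    """
    # Rank outcomes from worst to best by utility.
    ranked = sorted(lottery, key=lambda pair: utility(pair[0]))
    # The worst outcome's utility is obtained with certainty.
    value = utility(ranked[0][0])
    for i in range(1, len(ranked)):
        # Probability of receiving outcome i or better.
        tail = sum(p for _, p in ranked[i:])
        gain = utility(ranked[i][0]) - utility(ranked[i - 1][0])
        value += gain * phi(tail)
    return value


lottery = [(0.0, 0.5), (100.0, 0.5)]

# With phi the identity, RDU reduces to Expected Utility.
eu = rank_dependent_utility(lottery, lambda x: x, lambda p: p)

# A convex phi underweights the chance of the better outcome.
pessimistic = rank_dependent_utility(lottery, lambda x: x, lambda p: p * p)
```

On this lottery `eu` is 50.0 and `pessimistic` is 25.0, which shows how the weighting function alone changes the evaluation even when the utility function is linear.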

decision tree, node, rdu, (15 more...)

AAAI Conferences

Twenty-Sixth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Texas (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
(2 more...)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Decision Support Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)

TD-DeltaPi: A Model-Free Algorithm for Efficient Exploration

Silva, Bruno C. da (University of Massachusetts Amherst) | Barto, Andrew G. (University of Massachusetts Amherst)

AAAI Conferences, Jul-21-2012

We study the problem of finding efficient exploration policies for the case in which an agent is momentarily not concerned with exploiting, and instead tries to compute a policy for later use. We first formally define the Optimal Exploration Problem as one of sequential sampling and show that its solutions correspond to paths of minimum expected length in the space of policies. We derive a model-free, local linear approximation to such solutions and use it to construct efficient exploration policies. We compare our model-free approach to other exploration techniques, including one with the best known PAC bounds, and show that ours is both based on a well-defined optimization problem and empirically efficient.

approximation, exploration, q-function, (16 more...)

AAAI Conferences

Twenty-Sixth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)

Improving Hybrid Vehicle Fuel Efficiency Using Inverse Reinforcement Learning

Vogel, Adam (Stanford University) | Ramachandran, Deepak (Honda Research Institute (USA) Inc.) | Gupta, Rakesh (Honda Research Institute (USA) Inc.) | Raux, Antoine (Honda Research Institute (USA) Inc.)

AAAI Conferences, Jul-21-2012

Deciding what mix of engine and battery power to use is critical to hybrid vehicles' fuel efficiency. Current solutions consider several factors, such as the charge of the battery and how efficiently the engine operates at a given speed. Previous research has shown that by taking into account the future power requirements of the vehicle, a more efficient balance of engine vs. battery power can be attained. In this paper, we utilize a probabilistic driving route prediction system, trained using Inverse Reinforcement Learning, to optimize the hybrid control policy. Our approach considers routes that the driver is likely to be taking, computing an optimal mix of engine and battery power. This approach has the potential to increase vehicle power efficiency while not requiring any hardware modification or change in driver behavior. Our method outperforms a standard hybrid control policy, yielding an average of 1.22% fuel savings.

battery, driver model, vehicle, (14 more...)

AAAI Conferences

Twenty-Sixth AAAI Conference on Artificial Intelligence

Country:

Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Energy > Energy Storage (0.74)
Transportation > Infrastructure & Services (0.71)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Scalable Inverse Reinforcement Learning via Instructed Feature Construction

Singliar, Tomas (Boeing Research and Technology) | Margineantu, Dragos D. (Boeing Research and Technology)

AAAI Conferences, Jul-21-2012

Inverse reinforcement learning (IRL) techniques (Ng and Russell, 2000) provide a foundation for detecting abnormal agent behavior and predicting agent intent by estimating the agent's reward function. Unfortunately, IRL algorithms suffer from the large dimensionality of the reward function space. Meanwhile, most applications that can benefit from an IRL-based approach to assessing agent intent involve interaction with an analyst or domain expert. This paper proposes a procedure for scaling up IRL by eliciting good IRL basis functions from the domain expert. Further, we propose a new paradigm for modeling limited rationality. Unlike traditional models of limited rationality, which assume an agent making stochastic choices with the value function treated as if it were known, we propose that observed irrational behavior is actually due to uncertainty about the cost of future actions. This treatment normally leads to a POMDP formulation, which is unnecessarily complicated, and we show that adding a simple noise term to the value function approximation accomplishes the same at a much smaller cost.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

AAAI Conferences

Workshops at the Twenty-Sixth AAAI Conference on Artificial Intelligence

Country: North America > United States > Washington > King County > Seattle (0.05)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
