AITopics | Reinforcement Learning

Collaborating Authors

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

News Overviews Instructional Materials AI-Alerts Classics

Learning Interrogation Strategies while Considering Deceptions in Detective Interactive Stories

Chen, Guan-Yi (National Tsing Hua University) | Kao, Edward C.-C. (National Tsing Hua University) | Soo, Von-Wun (National Tsing Hua University)

AAAI ConferencesNov-10-2013

The strategies for interactive characters to select appropriate dialogues remain as an open issue in related research areas. In this paper we propose an approach based on reinforcement learning to learn the strategy of interrogation dialogue from one virtual agent toward another. The emotion variation of the suspect agent is modeled with a hazard function, and the detective agent must learn its interrogation strategies based on the emotion state of the suspect agent. The reinforcement learning reward schemes are evaluated to choose the proper reward in the dialogue. Our contribution is twofold. Firstly, we proposed a new framework of reinforcement learning to model dialogue strategies. Secondly, background knowledge and emotion states of agents are brought into the dialogue strategies. The resulted dialogue strategy in our experiment is sensitive in detecting lies from the suspect, and with it the interrogator may receive more correct answer.

learning interrogation strategy, machine learning, reinforcement learning, (3 more...)

AAAI Conferences

Ninth Artificial Intelligence and Interactive Digital Entertainment Conference

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.93)

Add feedback

Data-based approximate policy iteration for nonlinear continuous-time optimal control design

Luo, Biao, Wu, Huai-Ning, Huang, Tingwen, Liu, Derong

arXiv.org Machine LearningNov-2-2013

This paper addresses the model-free nonlinear optimal problem with generalized cost functional, and a data-based reinforcement learning technique is developed. It is known that the nonlinear optimal control problem relies on the solution of the Hamilton-Jacobi-Bellman (HJB) equation, which is a nonlinear partial differential equation that is generally impossible to be solved analytically. Even worse, most of practical systems are too complicated to establish their accurate mathematical model. To overcome these difficulties, we propose a data-based approximate policy iteration (API) method by using real system data rather than system model. Firstly, a model-free policy iteration algorithm is derived for constrained optimal control problem and its convergence is proved, which can learn the solution of HJB equation and optimal control policy without requiring any knowledge of system mathematical model. The implementation of the algorithm is based on the thought of actor-critic structure, where actor and critic neural networks (NNs) are employed to approximate the control policy and cost function, respectively. To update the weights of actor and critic NNs, a least-square approach is developed based on the method of weighted residuals. The whole data-based API method includes two parts, where the first part is implemented online to collect real system information, and the second part is conducting offline policy iteration to learn the solution of HJB equation and the control policy. Then, the data-based API algorithm is simplified for solving unconstrained optimal control problem of nonlinear and linear systems. Finally, we test the efficiency of the data-based API control design method on a simple nonlinear system, and further apply it to a rotational/translational actuator system. The simulation results demonstrate the effectiveness of the proposed method.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

1311.0396

Country:

North America > United States > New Jersey (0.28)
North America > United States > New York (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Control Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Reinforcement Learning for Matrix Computations: PageRank as an Example

Borkar, Vivek S., Mathkar, Adwaitvedant S.

arXiv.org Machine LearningNov-1-2013

Reinforcement learning has gained wide popularity as a technique for simulation-driven approximate dynamic programming. A less known aspect is that the very reasons that make it effective in dynamic programming can also be leveraged for using it for distributed schemes for certain matrix computations involving non-negative matrices. In this spirit, we propose a reinforcement learning algorithm for PageRank computation that is fashioned after analogous schemes for approximate dynamic programming. The algorithm has the advantage of ease of distributed implementation and more importantly, of being model-free, i.e., not dependent on any specific assumptions about the transition probabilities in the random web-surfer model. We analyze its convergence and finite time behavior and present some supporting numerical experiments.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Machine Learning

1311.2889

Country:

North America > United States > Massachusetts > Middlesex County (0.15)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

A Survey of Multi-Objective Sequential Decision-Making

Roijers, D. M., Vamplew, P., Whiteson, S., Dazeley, R.

Journal of Artificial Intelligence ResearchOct-18-2013

Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, little of it makes explicit under what circumstances special methods are needed to solve multi-objective problems. Therefore, we identify three distinct scenarios in which converting such a problem to a single-objective one is impossible, infeasible, or undesirable. Furthermore, we propose a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function (which projects multi-objective values to scalar ones), and the type of policies considered. We show how these factors determine the nature of an optimal solution, which can be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we survey the literature on multi-objective methods for planning and learning. Finally, we discuss key applications of such methods and outline opportunities for future work.

momdp, objective, scalarization function, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.3987

AI Access Foundation

10836

Journal of Artificial Intelligence Research

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria > Vienna (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(10 more...)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.92)

Industry:

Energy > Power Industry (1.00)
Banking & Finance (1.00)
Transportation > Ground > Road (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Variance Adjusted Actor Critic Algorithms

Tamar, Aviv, Mannor, Shie

arXiv.org Machine LearningOct-14-2013

In Reinforcement Learning (RL; [1]) and planning in Markov Decision Processes (MDPs; [2]), the typical objective is to maximize the cumulative (possibly discounted) expected reward, denoted by J. When the model's parameters are known, several well-established and efficient optimization algorithms are known. When the model parameters are not known, learning is needed and there are several algorithmic frameworks that solve the learning problem effectively, at least when the model is finite. Among these, actor-critic methods [3] are known to be particularly efficient. In typical actor-critic algorithms, the critic maintains an estimate of the value function - the expected reward-to-go. This function is then used by the actor to estimate the gradient of the objective with respect to some policy parameters, and then improve the policy by modifying the parameters in the direction of the gradient. The theory that underlies actor-critic algorithms is the policy gradient theorem [4], which relates the value function with the policy gradient.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

1310.3697

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Add feedback

Batch-iFDD for Representation Expansion in Large MDPs

Geramifard, Alborz, Walsh, Thomas J., Roy, Nicholas, How, Jonathan

arXiv.org Machine LearningSep-26-2013

Matching pursuit (MP) methods are a promising class of feature construction algorithms for value function approximation. Yet existing MP methods require creating a pool of potential features, mandating expert knowledge or enumeration of a large feature pool, both of which hinder scalability. This paper introduces batch incremental feature dependency discovery (Batch-iFDD) as an MP method that inherits a provable convergence property. Additionally, Batch-iFDD does not require a large pool of features, leading to lower computational complexity. Empirical policy evaluation results across three domains with up to one million states highlight the scalability of Batch-iFDD over the previous state of the art MP algorithm.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1309.6831

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Approximate Kalman Filter Q-Learning for Continuous State-Space MDPs

Tripp, Charles, Shachter, Ross D.

arXiv.org Machine LearningSep-26-2013

We seek to learn an effective policy for a Markov Decision Process (MDP) with continuous states via Q-Learning. Given a set of basis functions over state action pairs we search for a corresponding set of linear weights that minimizes the mean Bellman residual. Our algorithm uses a Kalman filter model to estimate those weights and we have developed a simpler approximate Kalman filter model that outperforms the current state of the art projected TD-Learning methods on several standard benchmark problems.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

1309.6868

Country: North America > United States > California > Santa Clara County (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Add feedback

Sample Complexity of Multi-task Reinforcement Learning

Brunskill, Emma, Li, Lihong

arXiv.org Machine LearningSep-26-2013

Transferring knowledge across a sequence of reinforcement-learning tasks is challenging, and has a number of important applications. Though there is encouraging empirical evidence that transfer can improve performance in subsequent reinforcement-learning tasks, there has been very little theoretical analysis. In this paper, we introduce a new multi-task algorithm for a sequence of reinforcement-learning tasks when each task is sampled independently from (an unknown) distribution over a finite set of Markov decision processes whose parameters are initially unknown. For this setting, we prove under certain assumptions that the per-task sample complexity of exploration is reduced significantly due to transfer compared to standard single-task algorithms. Our multi-task algorithm also has the desired characteristic that it is guaranteed not to exhibit negative transfer: in the worst case its per-task sample complexity is comparable to the corresponding single-task algorithm.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Machine Learning

1309.6821

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Artificial Intelligence Based Cognitive Routing for Cognitive Radio Networks

Qadir, Junaid

arXiv.org Artificial IntelligenceAug-31-2013

Cognitive radio networks (CRNs) are networks of nodes equipped with cognitive radios that can optimize performance by adapting to network conditions. While cognitive radio networks (CRN) are envisioned as intelligent networks, relatively little research has focused on the network level functionality of CRNs. Although various routing protocols, incorporating varying degrees of adaptiveness, have been proposed for CRNs, it is imperative for the long term success of CRNs that the design of cognitive routing protocols be pursued by the research community. Cognitive routing protocols are envisioned as routing protocols that fully and seamless incorporate AI-based techniques into their design. In this paper, we provide a self-contained tutorial on various AI and machine-learning techniques that have been, or can be, used for developing cognitive routing protocols. We also survey the application of various classes of AI techniques to CRNs in general, and to the problem of routing in particular. We discuss various decision making techniques and learning techniques from AI and document their current and potential applications to the problem of routing in CRNs. We also highlight the various inference, reasoning, modeling, and learning sub tasks that a cognitive routing protocol must solve. Finally, open research issues and future directions of work are identified.

evolutionary algorithm, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1007/s10462-015-9438-6

1309.0085

Country:

Asia (0.46)
North America > United States (0.28)

Genre:

Overview (1.00)
Research Report > New Finding (0.67)
Instructional Material > Course Syllabus & Notes (0.48)

Industry: Telecommunications > Networks (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(8 more...)

Add feedback

Map Matching with Inverse Reinforcement Learning

Osogami, Takayuki (IBM) | Raymond, Rudy (IBM)

AAAI ConferencesAug-3-2013

We study map-matching, the problem of estimating the route that is traveled by a vehicle, where the points observed with the Global Positioning System are available. A state-of-the-art approach for this problem is a Hidden Markov Model (HMM). We propose a particular transition probability between latent road segments by the use of the number of turns in addition to the travel distance between the latent road segments. We use inverse reinforcement learning to estimate the importance of the number of turns relative to the travel distance. This estimated importance is incorporated in the transition probability of the HMM. We show, through numerical experiments, that the error of map-matching can be reduced substantially with the proposed transition probability.

inverse reinforcement learning, map matching

AAAI Conferences

Twenty-Third International Joint Conference on Artificial Intelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.87)

Add feedback