AITopics

For many problems which would be natural for reinforcement learning, the reward signal is not a single scalar value but has multiple scalar components. Examplesof such problems include agents with multiple goals and agents with multiple users. Creating a single reward value by combining themultiple components can throwaway vital information and can lead to incorrect solutions. We describe the multiple reward source problem and discuss the problems with applying traditional reinforcement learning.We then present an new algorithm for finding a solution and results on simulated environments.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Country: North America > United States > Massachusetts (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Sallans, Brian, Hinton, Geoffrey E.

Using Free Energies to Represent Q-values in a Multiagent Reinforcement Learning Task

The problem of reinforcement learning in large factored Markov decision processes is explored. The Q-value of a state-action pair is approximated by the free energy of a product of experts network. Network parameters are learned online using a modified SARSA algorithm which minimizes the inconsistency of the Q-values of consecutive state-action pairs. Actions arechosen based on the current value estimates by fixing the current state and sampling actions from the network using Gibbs sampling. The algorithm is tested on a cooperative multi-agent task. The product of experts model is found to perform comparably to table-based Q-Iearning for small instances of the task, and continues to perform well when the problem becomes too large for a table-based representation.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Country:

North America > Canada > Ontario > Toronto (0.15)
North America > United States > Massachusetts > Middlesex County (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Ormoneit, Dirk, Glynn, Peter W.

Kernel-Based Reinforcement Learning in Average-Cost Problems: An Application to Optimal Portfolio Choice

Many approaches to reinforcement learning combine neural networks orother parametric function approximators with a form of temporal-difference learning to estimate the value function of a Markov Decision Process. A significant disadvantage of those procedures isthat the resulting learning algorithms are frequently unstable. In this work, we present a new, kernel-based approach to reinforcement learning which overcomes this difficulty and provably converges to a unique solution. By contrast to existing algorithms, our method can also be shown to be consistent in the sense that its costs converge to the optimal costs asymptotically. Our focus is on learning in an average-cost framework and on a practical application tothe optimal portfolio choice problem. 1 Introduction Temporal-difference (TD) learning has been applied successfully to many real-world applications that can be formulated as discrete state Markov Decision Processes (MDPs) with unknown transition probabilities.

machine learning, reinforcement, reinforcement learning, (15 more...)

Country: North America > United States > California > Santa Clara County (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.55)

Morimoto, Jun, Doya, Kenji

Robust Reinforcement Learning

KenjiDoya ATR International; CREST, JST 2-2 Hikaridai Seika-cho Soraku-gun Kyoto 619-0288 JAPAN doya@isd.atr.co.jp Abstract This paper proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors.The use of environmental models in RL is quite popular for both off-line learning by simulations and for online action planning. However, the difference between the model and the real environment can lead to unpredictable, often unwanted results. Based on the theory of H oocontrol, we consider a differential game in which a'disturbing' agent (disturber) tries to make the worst possible disturbance while a'control' agent (actor) tries to make the best control input. The problem is formulated as finding a minmax solutionof a value function that takes into account the norm of the output deviation and the norm of the disturbance. We derive online learning algorithms for estimating the value function and for calculating the worst disturbance and the best control in reference tothe value function.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Country:

Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.25)
North America > United States > California > San Francisco County > San Francisco (0.14)

Industry: Education > Educational Setting (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Decomposition of Reinforcement Learning for Admission Control of Self-Similar Call Arrival Processes

Carlström, Jakob

In multi-service communications networks, such as Asynchronous Transfer Mode (ATM) networks, resource control is of crucial importance for the network operator as well as for the users. The objective is to maintain the service quality while maximizing the operator's revenue. At the call level, service quality (Grade of Service) is measured in terms of call blocking probabilities, and the key resource to be controlled is bandwidth. Network routing and call admission control (CAC) are two such resource control problems. Markov decision processes offer a framework for optimal CAC and routing [1].

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Country: Europe > Sweden (0.15)

Industry: Telecommunications (0.54)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Low Power Wireless Communication via Reinforcement Learning

Brown, Timothy X.

This paper examines the application of reinforcement learning to a wireless communicationproblem. The problem requires that channel utility be maximized while simultaneously minimizing battery usage. We present a solution to this multi-criteria problem that is able to significantly reducepower consumption. The solution uses a variable discount factor to capture the effects of battery usage. 1 Introduction Reinforcement learning (RL) has been applied to resource allocation problems in telecommunications, e.g.,channel allocation in wireless systems, network routing, and admission control in telecommunication networks [1,2, 8, 10]. These have demonstrated reinforcement learningcan find good policies that significantly increase the application reward within the dynamics of the telecommunication problems.

application, low power wireless communication, packet, (12 more...)

Country:

North America > United States > Colorado > Boulder County > Boulder (0.14)
Asia > Middle East > Jordan (0.05)

Genre: Research Report (0.34)

Industry: Telecommunications (1.00)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Sutton, Richard S., McAllester, David A., Singh, Satinder P., Mansour, Yishay

Policy Gradient Methods for Reinforcement Learning with Function Approximation

Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly represented by its own function approximator, independent of the value function, and is updated according to the gradient of expected reward with respect to the policy parameters. Williams's REINFORCE method and actor-critic methods are examples of this approach. Our main new result is to show that the gradient can be written in a form suitable for estimation from experience aided by an approximate action-value or advantage function. Using this result, we prove for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.

approximation, function approximation, gradient, (12 more...)

Country:

Asia > Middle East > Jordan (0.06)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)

Genre: Research Report > New Finding (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Konda, Vijay R., Tsitsiklis, John N.

Actor-Critic Algorithms

We propose and analyze a class of actor-critic algorithms for simulation-based optimization of a Markov decision process over a parameterized family of randomized stationary policies. These are two-time-scale algorithms in which the critic uses TD learning with a linear approximation architecture and the actor is updated in an approximate gradient direction based on information provided by the critic. We show that the features for the critic should span a subspace prescribed by the choice of parameterization of the actor. We conclude by discussing convergence properties and some open problems.

algorithm, approximation, value function, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

State Abstraction in MAXQ Hierarchical Reinforcement Learning

Dietterich, Thomas G.

Many researchers have explored methods for hierarchical reinforcement learning (RL) with temporal abstractions, in which abstract actions are defined that can perform many primitive actions before terminating. However, little is known about learning with state abstractions, in which aspects of the state space are ignored. In previous work, we developed the MAXQ method for hierarchical RL. In this paper, we define five conditions under which state abstraction can be combined with the MAXQ value function decomposition. We prove that the MAXQ-Q learning algorithm converges under these conditions and show experimentally that state abstraction is important for the successful application of MAXQ-Q learning.

passenger, state abstraction, subtask, (12 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > Oregon > Benton County > Corvallis (0.04)
(4 more...)

Industry: Transportation (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Singh, Satinder P., Kearns, Michael J., Litman, Diane J., Walker, Marilyn A.

Reinforcement Learning for Spoken Dialogue Systems

Recently, a number of authors have proposed treating dialogue systems as Markov decision processes (MDPs). However, the practical application ofMDP algorithms to dialogue systems faces a number of severe technical challenges. We have built a general software tool (RLDS, for Reinforcement Learning for Dialogue Systems) based on the MDP framework, and have applied it to dialogue corpora gathered from two dialogue systems built at AT&T Labs. Our experiments demonstrate that RLDS holds promise as a tool for "browsing" and understanding correlations in complex, temporally dependent dialogue corpora.

dialogue, dialogue system, state estimator, (13 more...)

Country: North America > United States > New York (0.04)

Genre: Research Report (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)