AITopics

Country:

North America > United States > Massachusetts (0.14)
Europe > United Kingdom > England (0.14)

Industry: Government > Military > Air Force (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Schaerf, A., Shoham, Y., Tennenholtz, M.

Adaptive Load Balancing: A Study in Multi-Agent Learning

Journal of Artificial Intelligence Research, May-1-1995

We study the process of multi-agent reinforcement learning in the context of load balancing in a distributed system, without use of either central coordination or explicit communication. We first define a precise framework in which to study adaptive load balancing, important features of which are its stochastic nature and the purely local information available to individual agents. Given this framework, we show illuminating results on the interplay between basic adaptive behavior parameters and their effect on system efficiency. We then investigate the properties of adaptive load balancing in heterogeneous populations, and address the issue of exploration vs. exploitation in that context. Finally, we show that naive use of communication may not improve, and might even harm, system efficiency.
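To make "purely local information" and "adaptive behavior parameters" concrete, here is a minimal sketch of one possible adaptive agent. It assumes each agent keeps an exponentially weighted efficiency estimate per resource (weight `w` on the newest observation) and picks a resource with probability proportional to its estimate raised to a power `n`, so larger `n` means more exploitation. The names `Agent` and `simulate`, the parameters `w` and `n`, and the equal-sharing efficiency model are all hypothetical illustrations, not a transcription of the paper's rules.

```python
import random


class Agent:
    """Hypothetical adaptive agent that keeps a local efficiency estimate per resource."""

    def __init__(self, n_resources, w=0.5, n=2.0, init_estimate=1.0, rng=None):
        self.w = w  # assumed weight given to the newest observation
        self.n = n  # assumed exploitation exponent; n = 0 means uniform random choice
        self.estimates = [init_estimate] * n_resources
        self.rng = rng if rng is not None else random.Random(0)

    def choose(self):
        # Pick resource i with probability proportional to estimates[i] ** n.
        weights = [e ** self.n for e in self.estimates]
        return self.rng.choices(range(len(weights)), weights=weights)[0]

    def update(self, resource, observed_efficiency):
        # Exponentially weighted average of the agent's own observations only.
        old = self.estimates[resource]
        self.estimates[resource] = self.w * observed_efficiency + (1 - self.w) * old


def simulate(capacities, n_agents=20, rounds=200, seed=0, **agent_kwargs):
    """Run repeated rounds; return how many agents used each resource in the last round."""
    rng = random.Random(seed)
    agents = [Agent(len(capacities), rng=rng, **agent_kwargs) for _ in range(n_agents)]
    loads = [0] * len(capacities)
    for _ in range(rounds):
        choices = [agent.choose() for agent in agents]
        loads = [choices.count(r) for r in range(len(capacities))]
        for agent, r in zip(agents, choices):
            # Assumed efficiency model: a resource's capacity is shared equally by its users.
            agent.update(r, capacities[r] / loads[r])
    return loads
```

No agent sees any other agent's estimates or choices; coordination, if any emerges, comes only from the shared effect of load on observed efficiency. Varying `w` and `n` in `simulate` is one way to explore the kind of parameter interplay the abstract refers to.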

artificial intelligence, machine learning, reinforcement learning

doi: 10.1613/jair.121

AI Access Foundation


Country: North America > United States (0.14)

Industry: Energy > Power Industry (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)

Journal of Artificial Intelligence Research, Jan-1-1995

Truncating Temporal Differences: On the Efficient Implementation of TD(lambda) for Reinforcement Learning

Cichosz, P.

Temporal difference (TD) methods constitute a class of methods for learning predictions in multi-step prediction problems, parameterized by a recency factor lambda. Currently the most important application of these methods is to temporal credit assignment in reinforcement learning. Well-known reinforcement learning algorithms, such as AHC or Q-learning, may be viewed as instances of TD learning. This paper examines the efficient and general implementation of TD(lambda) for arbitrary lambda, for use with reinforcement learning algorithms that optimize the discounted sum of rewards. The traditional approach, based on eligibility traces, is argued to be both inefficient and insufficiently general. The TTD (Truncated Temporal Differences) procedure is proposed as an alternative; it only approximates TD(lambda), but requires very little computation per action and can be used with arbitrary function representation methods. The idea from which it is derived is fairly simple and not new, but probably unexplored so far. Encouraging experimental results are presented, suggesting that using lambda > 0 with the TTD procedure yields a significant learning speedup at essentially the same cost as ordinary TD(0) learning.
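The core of truncation can be shown with the m-step truncated lambda-return, computed backwards over a window of the last m transitions and bootstrapped from the value estimate at the window's end. The sketch below assumes a tabular value function and the recursion G = r + gamma * ((1 - lambda) * V(s') + lambda * G); the names `truncated_lambda_return` and `TTDPredictor` are hypothetical, and details of the paper's actual TTD procedure (for example, how it handles episode ends or stale value estimates) may differ.

```python
from collections import deque


def truncated_lambda_return(rewards, next_values, gamma, lam):
    """m-step truncated lambda-return for the oldest step in a window.

    rewards[k]     = r_{t+k}        for k = 0..m-1
    next_values[k] = V(s_{t+k+1})   for k = 0..m-1
    """
    if len(rewards) != len(next_values) or not rewards:
        raise ValueError("need m >= 1 matching rewards and next-state values")
    # Beyond the horizon, bootstrap entirely from the last value estimate.
    g = next_values[-1]
    for r, v_next in zip(reversed(rewards), reversed(next_values)):
        # Blend the one-step bootstrap with the longer return, weighted by lambda.
        g = r + gamma * ((1 - lam) * v_next + lam * g)
    return g


class TTDPredictor:
    """Tabular sketch: V(s_t) is updated once m later transitions are buffered."""

    def __init__(self, m, gamma, lam, alpha):
        self.m, self.gamma, self.lam, self.alpha = m, gamma, lam, alpha
        self.V = {}
        self.window = deque(maxlen=m)  # (state, reward, next_state); oldest drops off

    def step(self, state, reward, next_state):
        self.window.append((state, reward, next_state))
        if len(self.window) == self.m:
            s0 = self.window[0][0]
            rewards = [r for _, r, _ in self.window]
            next_values = [self.V.get(s1, 0.0) for _, _, s1 in self.window]
            target = truncated_lambda_return(rewards, next_values, self.gamma, self.lam)
            v = self.V.get(s0, 0.0)
            self.V[s0] = v + self.alpha * (target - v)
```

With lam = 0 the target reduces to the usual TD(0) target r_t + gamma * V(s_{t+1}); with lam = 1 it is the m-step discounted return. Each step costs O(m) work regardless of the function representation, which is the property the abstract emphasizes; this sketch does not flush the window at the end of an episode.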

artificial intelligence, machine learning, reinforcement learning

doi: 10.1613/jair.135

AI Access Foundation


Country: Europe > Poland (0.14)

Genre: Research Report (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Schraudolph, Nicol N., Dayan, Peter, Sejnowski, Terrence J.

Temporal Difference Learning of Position Evaluation in the Game of Go

Computational Neurobiology Laboratory, The Salk Institute for Biological Studies, San Diego, CA 92186-5800

The game of Go has a high branching factor that defeats the tree search approach used in computer chess, and long-range spatiotemporal interactions that make position evaluation extremely difficult. Development of conventional Go programs is hampered by their knowledge-intensive nature. We demonstrate a viable alternative by training networks to evaluate Go positions via temporal difference (TD) learning. Our approach is based on network architectures that reflect the spatial organization of both input and reinforcement signals on the Go board, and training protocols that provide exposure to competent (though unlabelled) play. These techniques yield far better performance than undifferentiated networks trained by self-play alone. A network with fewer than 500 weights learned, within 3,000 games of 9x9 Go, a position evaluation function that enables a primitive one-ply search to defeat a commercial Go program at a low playing level.

1 INTRODUCTION

Go was developed three to four millennia ago in China; it is the oldest and one of the most popular board games in the world.
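As a rough illustration of training an evaluator by TD learning from game outcomes, the sketch below runs one episode of TD(lambda) with accumulating eligibility traces on a linear evaluation function. It assumes no discounting, zero reward until the game ends, and a final outcome such as 1 for a win and 0 for a loss. The paper's networks are not linear, and the name `td_lambda_episode` and its parameters are hypothetical; this only shows the update rule in its simplest form.

```python
def td_lambda_episode(weights, feature_seq, final_reward, alpha=0.1, lam=0.7):
    """One episode of linear TD(lambda) with accumulating eligibility traces.

    feature_seq: feature vectors for successive positions of one game.
    final_reward: outcome observed only after the last position (assumed 1 = win, 0 = loss).
    """
    w = list(weights)
    trace = [0.0] * len(w)

    def value(x):
        return sum(wi * xi for wi, xi in zip(w, x))

    for t, x in enumerate(feature_seq):
        v = value(x)
        # Target is the next position's evaluation, or the game outcome at the end.
        target = value(feature_seq[t + 1]) if t + 1 < len(feature_seq) else final_reward
        delta = target - v
        # For a linear evaluator the gradient of V with respect to w is just x.
        trace = [lam * e + xi for e, xi in zip(trace, x)]
        w = [wi + alpha * delta * e for wi, e in zip(w, trace)]
    return w
```

Repeating this over many games pulls each position's evaluation toward the evaluation of the positions that follow it and, ultimately, toward the observed outcomes.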

artificial intelligence, chess, temporal difference learning

Country:

North America > United States > California > San Diego County > San Diego (0.24)
North America > United States > Massachusetts (0.14)

Industry:

Leisure & Entertainment > Games > Go (0.85)
Leisure & Entertainment > Games > Chess (0.55)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Schraudolph, Nicol N., Dayan, Peter, Sejnowski, Terrence J.

Temporal Difference Learning of Position Evaluation in the Game of Go

Furthermore, we have verified that weights learned from 9x9 Go offer a suitable basis for further training on the full-size (19x19) board.
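One reason weights can carry over from a 9x9 to a 19x19 board is weight sharing: if the same small local filter is applied at every intersection, the number of weights does not depend on board size. The sketch below applies one shared 3x3 filter (a cross-correlation with zero padding off the board) to boards of any size. The function name `convolve_board` and the board encoding (+1, -1, 0 for black, white, empty) are assumptions for illustration, not the paper's architecture.

```python
def convolve_board(board, kernel):
    """Apply one shared odd-sized kernel at every point of a square board (zero padding)."""
    n = len(board)
    k = len(kernel) // 2
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for di in range(-k, k + 1):
                for dj in range(-k, k + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < n and 0 <= x < n:  # off-board points contribute nothing
                        total += kernel[di + k][dj + k] * board[y][x]
            out[i][j] = total
    return out
```

The same `kernel` (nine weights) produces a 9x9 response map for a 9x9 board and a 19x19 map for a 19x19 board, so a filter trained on the small board can be reused directly as a starting point on the large one.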

artificial intelligence, reinforcement learning, temporal difference learning

Country: North America > United States > Massachusetts (0.14)

Industry: Leisure & Entertainment > Games > Go (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

The Parti-Game Algorithm for Variable Resolution Reinforcement Learning in Multidimensional State-Spaces

Moore, Andrew W.

Parti-game is a new algorithm for learning from delayed rewards in high-dimensional real-valued state-spaces. In high dimensions it is essential that learning does not explore or plan over the state space uniformly. Parti-game maintains a decision-tree partitioning of state-space and applies game-theory and computational geometry techniques to efficiently and reactively concentrate high resolution only on critical areas. Many simulated problems have been tested, ranging from 2-dimensional to 9-dimensional state-spaces, including mazes, path planning, nonlinear dynamics, and uncurling snake robots in restricted spaces. In all cases, a good solution is found in fewer than twenty trials and a few minutes.

1 REINFORCEMENT LEARNING

Reinforcement learning [Samuel, 1959; Sutton, 1984; Watkins, 1989; Barto et al., 1991] is a promising method for control systems to program and improve themselves.
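The "decision-tree partitioning of state-space" can be pictured as a k-d-tree of axis-aligned cells that are refined only where needed. The sketch below shows that data structure alone: cells are halved along their longest side, and `refine` splits only the leaves containing given critical points. In Parti-game the choice of which cells to split comes from its game-theoretic analysis of winning and losing cells, which is not modeled here; the names `Cell` and `refine` and the halving rule are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    lo: List[float]
    hi: List[float]
    dim: Optional[int] = None  # split dimension; None means this cell is a leaf
    thresh: float = 0.0
    left: Optional["Cell"] = None
    right: Optional["Cell"] = None

    def is_leaf(self):
        return self.dim is None

    def split(self):
        # Halve the cell along its longest side (a simple, assumed heuristic).
        widths = [h - l for l, h in zip(self.lo, self.hi)]
        d = max(range(len(widths)), key=lambda i: widths[i])
        mid = (self.lo[d] + self.hi[d]) / 2
        left_hi, right_lo = list(self.hi), list(self.lo)
        left_hi[d], right_lo[d] = mid, mid
        self.dim, self.thresh = d, mid
        self.left = Cell(list(self.lo), left_hi)
        self.right = Cell(right_lo, list(self.hi))

    def find(self, point):
        # Descend the tree to the leaf cell containing the point.
        cell = self
        while not cell.is_leaf():
            cell = cell.left if point[cell.dim] < cell.thresh else cell.right
        return cell

    def leaves(self):
        if self.is_leaf():
            return [self]
        return self.left.leaves() + self.right.leaves()


def refine(root, critical_points):
    """Split (once) every leaf that contains at least one critical point."""
    targets = []
    for p in critical_points:
        leaf = root.find(p)
        if not any(leaf is t for t in targets):
            targets.append(leaf)
    for leaf in targets:
        leaf.split()
    return len(targets)
```

Because only cells near the critical points are split, resolution grows where the task demands it rather than uniformly across all dimensions.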

artificial intelligence, partition, reinforcement learning

Country: North America > United States > Massachusetts (0.15)

Genre: Research Report (0.34)

Industry: Leisure & Entertainment > Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Montague, P. Read, Dayan, Peter, Sejnowski, Terrence J.

Foraging in an Uncertain Environment Using Predictive Hebbian Learning

Survival is enhanced by an ability to predict the availability of food, the likelihood of predators, and the presence of mates. We present a concrete model that uses diffuse neurotransmitter systems to implement a predictive version of a Hebb learning rule embedded in a neural architecture based on anatomical and physiological studies of bees. The model captured the strategies seen in the behavior of bees and a number of other animals when foraging in an uncertain environment. The predictive model suggests a unified way in which neuromodulatory influences can be used to bias actions and control synaptic plasticity. Successful predictions enhance adaptive behavior by allowing organisms to prepare for future actions, rewards, or punishments. Moreover, it is possible to improve upon behavioral choices if the consequences of executing different actions can be reliably predicted. Although classical and instrumental conditioning results from the psychological literature [1] demonstrate that the vertebrate brain is capable of reliable prediction, how these predictions are computed in brains is not yet known. The brains of vertebrates and invertebrates possess small nuclei which project axons throughout large expanses of target tissue and deliver various neurotransmitters such as dopamine, norepinephrine, and acetylcholine [4]. The activity in these systems may report on reinforcing stimuli in the world or may reflect an expectation of future reward [5, 6, 7, 8].
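A toy version of prediction-gated learning in a two-flower foraging task is sketched below. It assumes the single-step case, where the prediction error reduces to delta = reward - prediction (a delta rule), that only the weight for the chosen flower is active and therefore updated, and that choice probability follows a logistic function of the predicted-reward difference with an assumed sharpness `beta`. The names `choose_flower` and `forage` and all parameters are hypothetical; the paper's model uses a fuller temporal prediction error and its own action-selection rule.

```python
import math
import random


def choose_flower(w, beta, rng):
    # Probability of choosing blue rises with its predicted-reward advantage over yellow.
    p_blue = 1.0 / (1.0 + math.exp(-beta * (w["blue"] - w["yellow"])))
    return "blue" if rng.random() < p_blue else "yellow"


def forage(nectar, trials=500, eta=0.1, beta=5.0, seed=0):
    """nectar maps flower colour -> function(rng) returning the reward for one visit."""
    rng = random.Random(seed)
    w = {"blue": 0.0, "yellow": 0.0}  # predicted reward per flower colour
    for _ in range(trials):
        flower = choose_flower(w, beta, rng)
        r = nectar[flower](rng)
        # Diffuse error signal: reward received minus reward predicted for this flower.
        delta = r - w[flower]
        # Error-gated Hebbian-style update: only the active input's weight changes.
        w[flower] += eta * delta
    return w
```

Making one flower's reward variable (for example, occasionally large and otherwise zero) while keeping its mean equal to the other's is a simple way to probe how such a learner responds to risk.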

health & medicine, neurology, prediction

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.15)
North America > United States > California > San Diego County (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)

Jaakkola, Tommi, Jordan, Michael I., Singh, Satinder P.

Convergence of Stochastic Iterative Dynamic Programming Algorithms

Increasing attention has recently been paid to algorithms based on dynamic programming (DP) due to the suitability of DP for learning problems involving control. In stochastic environments where the system being controlled is only incompletely known, however, a unifying theoretical account of these methods has been missing. In this paper we relate DP-based learning algorithms to the powerful techniques of stochastic approximation via a new convergence theorem, enabling us to establish a class of convergent algorithms to which both TD(λ) and Q-learning belong.

1 INTRODUCTION

Learning to predict the future and to find an optimal way of controlling it are the basic goals of learning systems that interact with their environment. A variety of algorithms are currently being studied for the purposes of prediction and control in incompletely specified, stochastic environments. Here we consider learning algorithms defined in Markov environments. There are actions or controls (u) available to the learner that affect both the state transition probabilities and the probability distribution of the immediate, state-dependent costs (c_i(u)) incurred by the learner.
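For a concrete instance of the algorithms such convergence results cover, here is a minimal tabular Q-learning sketch for cost minimization. It assumes uniform random exploration (so every state-action pair keeps being visited) and step sizes alpha = 1/n(s, a), which satisfy the usual stochastic-approximation conditions that the step sizes sum to infinity while their squares have a finite sum. The function name `q_learning` and the `step(s, a, rng)` environment interface are hypothetical.

```python
import random
from collections import defaultdict


def q_learning(step, states, actions, gamma=0.9, episodes=2000, horizon=20, seed=0):
    """Tabular Q-learning that minimizes expected discounted cost.

    step(s, a, rng) -> (cost, next_state) is an assumed environment interface.
    """
    rng = random.Random(seed)
    Q = defaultdict(float)
    visits = defaultdict(int)
    for _ in range(episodes):
        s = rng.choice(states)
        for _ in range(horizon):
            a = rng.choice(actions)  # uniform exploration keeps visiting every pair
            cost, s_next = step(s, a, rng)
            visits[(s, a)] += 1
            alpha = 1.0 / visits[(s, a)]  # sum of alphas diverges, sum of squares converges
            # Costs are minimized, so the backup takes the min over next actions.
            target = cost + gamma * min(Q[(s_next, b)] for b in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
    return Q
```

With 1/n step sizes, convergence can become very slow as gamma approaches 1; in practice a more slowly decaying schedule such as 1/n^0.7, which still meets both conditions, is often used instead.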

algorithm, artificial intelligence, optimization problem

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)

Boyan, Justin A., Littman, Michael L.

Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach

The field of reinforcement learning has grown dramatically over the past several years, but with the exception of backgammon [8, 2], has had few successful applications to large-scale, practical tasks. This paper demonstrates that the practical task of routing packets through a communication network is a natural application for reinforcement learning algorithms.
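A sketch in the spirit of a Q-routing update: node x keeps an estimate Q[x][d][y] of the time for a packet bound for destination d to arrive if sent via neighbor y, and moves it toward the local queueing time plus transmission time plus neighbor y's own best remaining estimate. The formulation should be checked against the paper itself; the dictionary layout, the step size `eta`, and the names `q_routing_update` and `choose_neighbor` are assumptions for illustration.

```python
def q_routing_update(Q, x, d, y, queue_time, transmit_time, eta=0.5):
    """Update node x's estimated delivery time to destination d via neighbor y.

    Q[node][dest][neighbor] -> estimated remaining delivery time.
    """
    if y == d:
        remaining = 0.0  # the packet has reached its destination
    else:
        remaining = min(Q[y][d].values())  # y's best estimate, reported back to x
    target = queue_time + transmit_time + remaining
    Q[x][d][y] += eta * (target - Q[x][d][y])
    return Q[x][d][y]


def choose_neighbor(Q, x, d):
    # Greedy routing: forward to the neighbor with the smallest estimated delivery time.
    return min(Q[x][d], key=Q[x][d].get)
```

Each node uses only its own queue and its neighbors' reports, so estimates can adapt as traffic load or topology changes without any global view of the network.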

algorithm, artificial intelligence, telecommunications