AITopics

The Octopus arm is a highly versatile and complex limb. How the Octopus controlssuch a hyper-redundant arm (not to mention eight of them!) is as yet unknown. Robotic arms based on the same mechanical principles mayrender present day robotic arms obsolete. In this paper, we tackle this control problem using an online reinforcement learning algorithm, basedon a Bayesian approach to policy evaluation known as Gaussian process temporal difference (GPTD) learning. Our substitute for the real arm is a computer simulation of a 2-dimensional model of an Octopus arm. Even with the simplifications inherent to this model, the state space we face is a high-dimensional one. We apply a GPTDbased algorithmto this domain, and demonstrate its operation on several learning tasks of varying degrees of difficulty.

algorithm, experiment, octopus arm, (15 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Industry: Health & Medicine (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

TD(0) Leads to Better Policies than Approximate Value Iteration

Roy, Benjamin V.

We consider approximate value iteration with a parameterized approximator inwhich the state space is partitioned and the optimal cost-to-go function over each partition is approximated by a constant. We establish performanceloss bounds for policies derived from approximations associated with fixed points. These bounds identify benefits to having projection weights equal to the invariant distribution of the resulting policy. Suchprojection weighting leads to the same fixed points as TD(0). Our analysis also leads to the first performance loss bound for approximate valueiteration with an average cost objective.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County (0.14)
North America > United States > California > Santa Clara County (0.14)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)

Rafols, Eddie, Koop, Anna, Sutton, Richard S.

Temporal Abstraction in Temporal-difference Networks

We present a generalization of temporal-difference networks to include temporally abstract options on the links of the question network.

machine learning, prediction, reinforcement learning, (19 more...)

Country:

North America > United States (0.28)
North America > Canada > Alberta (0.28)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Precup, Doina, Paduraru, Cosmin, Koop, Anna, Sutton, Richard S., Singh, Satinder P.

Off-policy Learning with Options and Recognizers

We introduce a new algorithm for off-policy temporal-difference learning withfunction approximation that has lower variance and requires less knowledge of the behavior policy than prior methods. We develop the notion ofa recognizer, a filter on actions that distorts the behavior policy to produce a related target policy with low-variance importance-sampling corrections. We also consider target policies that are deviations from the state distribution of the behavior policy, such as potential temporally abstract options, which further reduces variance. This paper introduces recognizers and their potential advantages, then develops a full algorithm for linear function approximation and proves that its updates are in the same direction as on-policy TD updates, which implies asymptotic convergence. Eventhough our algorithm is based on importance sampling, we prove that it requires absolutely no knowledge of the behavior policy for the case of state-aggregation function approximators.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Country:

North America > Canada > Alberta (0.29)
North America > United States > Michigan (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Niv, Yael, Daw, Nathaniel D., Dayan, Peter

How fast to work: Response vigor, motivation and tonic dopamine

Reinforcement learning models have long promised to unify computational, psychologicaland neural accounts of appetitively conditioned behavior. However,the bulk of data on animal conditioning comes from free-operant experiments measuring how fast animals will work for reinforcement. Existingreinforcement learning (RL) models are silent about these tasks, because they lack any notion of vigor. They thus fail to address thesimple observation that hungrier animals will work harder for food, as well as stranger facts such as their sometimes greater productivity evenwhen working for irrelevant outcomes such as water. Here, we develop an RL framework for free-operant behavior, suggesting that subjects choose how vigorously to perform selected actions by optimally balancing the costs and benefits of quick responding.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country: North America > United States (1.00)

Industry: Health & Medicine (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.49)

Blatt, Doron, Hero, Alfred O.

From Weighted Classification to Policy Search

This paper proposes an algorithm to convert a T -stage stochastic decision problem with a continuous state space to a sequence of supervised learning problems.The optimization problem associated with the trajectory tree and random trajectory methods of Kearns, Mansour, and Ng, 2000, is solved using the Gauss-Seidel method. The algorithm breaks a multistage reinforcementlearning problem into a sequence of single-stage reinforcement learningsubproblems, each of which is solved via an exact reduction to a weighted-classification problem that can be solved using off-the-self methods. Thus the algorithm converts a reinforcement learning probleminto simpler supervised learning subproblems. It is shown that the method converges in a finite number of steps to a solution that cannot be further improved by componentwise optimization. The implication ofthe proposed algorithm is that a plethora of classification methods can be applied to find policies in the reinforcement learning problem.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)

Industry: Education > Focused Education > Special Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)

Bagnell, Drew, Ng, Andrew Y.

On Local Rewards and Scaling Distributed Reinforcement Learning

We consider the scaling of the number of examples necessary to achieve good performance in distributed, cooperative, multi-agent reinforcement learning, as a function of the the number of agents n. We prove a worstcase lowerbound showing that algorithms that rely solely on a global reward signal to learn policies confront a fundamental limit: They require anumber of real-world examples that scales roughly linearly in the number of agents. For settings of interest with a very large number of agents, this is impractical. We demonstrate, however, that there is a class of algorithms that, by taking advantage of local reward signals in large distributed Markov Decision Processes, are able to ensure good performance witha number of samples that scales as O(log n). This makes them applicable even in settings with a very large number of agents n.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Ponsen, Marc, Munoz-Avila, Hector, Spronck, Pieter, Aha, David W.

Automatically Generating Game Tactics through Evolutionary Learning

AI MagazineSep-15-2006

Dynamic scripting is a reinforcement learning approach to adaptive game AI that learns, during gameplay, which game tactics an opponent should select to play effectively. We introduce the evolutionary state-based tactics generator (ESTG), which uses an evolutionary algorithm to generate tactics automatically. Experimental results show that ESTG improves dynamic scripting's performance in a real-time strategy game. We conclude that high-quality domain knowledge can be automatically generated for strong adaptive game AI opponents.

artificial intelligence, automatically, computer game, (8 more...)

AI Magazine

Industry: Leisure & Entertainment > Games > Computer Games (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)

Ponsen, Marc, Munoz-Avila, Hector, Spronck, Pieter, Aha, David W.

Automatically Generating Game Tactics through Evolutionary Learning

AI MagazineSep-15-2006

The decision-making process of computer-controlled opponents in video games is called game AI. Adaptive game AI can improve the entertainment value of games by allowing computer-controlled opponents to ix weaknesses automatically in the game AI and to respond to changes in human-player tactics. Dynamic scripting is a reinforcement learning approach to adaptive game AI that learns, during gameplay, which game tactics an opponent should select to play effectively. In previous work, the tactics used by dynamic scripting were designed manually. We introduce the evolutionary state-based tactics generator (ESTG), which uses an evolutionary algorithm to generate tactics automatically. Experimental results show that ESTG improves dynamic scripting's performance in a real-time strategy game. We conclude that high-quality domain knowledge can be automatically generated for strong adaptive game AI opponents. Game developers can bene it from applying ESTG, as it considerably reduces the time and effort needed to create adaptive game AI.

evolutionary algorithm, machine learning, reinforcement learning, (15 more...)

AI Magazine

Country:

Europe (1.00)
North America > United States > California (0.46)

Genre: Research Report > New Finding (0.34)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)