AITopics

Player co-modelling in a strategy board game: discovering how to play fast

Kalles, Dimitris

In this paper we experiment with a 2-player strategy board game where playing models are evolved using reinforcement learning and neural networks. The models are evolved to speed up automatic game development, based on human involvement at varying levels of sophistication and density, and are compared with fully autonomous playing. The experimental results suggest a clear and measurable association between the ability to win games and the ability to do so fast, while at the same time demonstrating that there is a minimum level of human involvement beyond which no learning really occurs.

machine learning, player co-modelling, reinforcement learning

cs/0611164

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.66)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

On Measuring the Impact of Human Actions in the Machine Learning of a Board Game's Playing Policies

Kalles, Dimitris

We investigate systematically the impact of human intervention in the training of computer players in a strategy board game. In that game, computer players utilise reinforcement learning with neural networks for evolving their playing strategies and demonstrate a slow learning speed. Human intervention can significantly enhance learning performance, but carrying it out systematically seems to be more of a problem of an integrated game development environment as opposed to automatic evolutionary learning.

artificial intelligence, machine learning, reinforcement learning

cs/0611163

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Games (1.00)

Loth, Manuel, Preux, Philippe

A Unified View of TD Algorithms; Introducing Full-Gradient TD and Equi-Gradient Descent TD

This paper addresses the issue of policy evaluation in Markov Decision Processes, using linear function approximation. It provides a unified view of algorithms such as TD(lambda), LSTD(lambda), iLSTD and residual-gradient TD. It asserts that they all consist of minimizing a gradient function and differ in the form of this function and in their means of minimizing it. Two new schemes are introduced in that framework: Full-gradient TD, which uses a generalization of the principle introduced in iLSTD, and EGD TD, which reduces the gradient by successive equi-gradient descents. These three algorithms form a new intermediate family with the interesting property of making much better use of the samples than TD while keeping a gradient descent scheme, which is useful for complexity issues and optimistic policy iteration.
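
A minimal sketch of the shared structure, assuming a hypothetical 3-state cycle MRP with two linear features (none of these numbers come from the paper): `td0` is the incremental linear TD(0) update, `lstd0` solves the linear system A w = b directly, and `gradient_iteration` repeatedly steps along the residual b - A w. Summed over one pass through the cycle, the TD(0) updates equal exactly that residual, which vanishes at the TD fixed point. The plain full-batch step used here is only an illustration of the "minimize a gradient" view; it is not the paper's Full-gradient TD or EGD TD.

```python
# Hypothetical MRP: deterministic cycle 0 -> 1 -> 2 -> 0, reward 1.0 when leaving state 0.
GAMMA = 0.9
PHI = {0: (1.0, 0.0), 1: (0.0, 1.0), 2: (1.0, 1.0)}  # 2 linear features for 3 states
NEXT = {0: 1, 1: 2, 2: 0}
REWARD = {0: 1.0, 1: 0.0, 2: 0.0}


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def td0(steps, alpha):
    """Incremental linear TD(0): w += alpha * delta * phi(s)."""
    w = [0.0, 0.0]
    s = 0
    for _ in range(steps):
        s_next = NEXT[s]
        delta = REWARD[s] + GAMMA * dot(w, PHI[s_next]) - dot(w, PHI[s])
        w = [wi + alpha * delta * fi for wi, fi in zip(w, PHI[s])]
        s = s_next
    return w


def lstd_statistics():
    """A = sum phi(s) (phi(s) - gamma phi(s'))^T and b = sum phi(s) r over one cycle."""
    A = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for s in (0, 1, 2):
        f, f_next = PHI[s], PHI[NEXT[s]]
        for i in range(2):
            b[i] += f[i] * REWARD[s]
            for j in range(2):
                A[i][j] += f[i] * (f[j] - GAMMA * f_next[j])
    return A, b


def lstd0():
    """Solve A w = b in closed form (Cramer's rule for the 2x2 case)."""
    A, b = lstd_statistics()
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(A[1][1] * b[0] - A[0][1] * b[1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]


def gradient_iteration(iters, alpha):
    """Step along the residual b - A w, which is zero at the TD fixed point."""
    A, b = lstd_statistics()
    w = [0.0, 0.0]
    for _ in range(iters):
        residual = [b[i] - dot(A[i], w) for i in range(2)]
        w = [wi + alpha * ri for wi, ri in zip(w, residual)]
    return w


print(lstd0())
print(gradient_iteration(200, 0.5))
print(td0(300_000, 0.001))
```

The first two estimates agree at roughly [0.853, 0.620]; TD(0) lands close to them but only approximately, because a constant step size on a fixed visiting order leaves a small bias.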

artificial intelligence, machine learning, reinforcement learning

cs/0611145

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > France (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)

Ryabko, Daniil, Hutter, Marcus

Asymptotic Learnability of Reinforcement Problems with Arbitrary Dependence

We address the problem of reinforcement learning in which observations may exhibit an arbitrary form of stochastic dependence on past observations and actions. The task for an agent is to attain the best possible asymptotic reward where the true generating environment is unknown but belongs to a known countable family of environments. We find some sufficient conditions on the class of environments under which there exists an agent that attains the best asymptotic reward for any environment in the class. We analyze how tight these conditions are and how they relate to different probabilistic assumptions known in reinforcement learning and related fields, such as Markov Decision Processes and mixing conditions.

machine learning, reinforcement learning, value-stable environment

cs/0603110

Country: Europe (0.68)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.50)

Zhumatiy, Viktor, Gomez, Faustino, Hutter, Marcus, Schmidhuber, Juergen

Metric State Space Reinforcement Learning for a Vision-Capable Mobile Robot

We address the problem of autonomously learning controllers for vision-capable mobile robots. We extend McCallum's (1995) Nearest-Sequence Memory algorithm to allow for general metrics over state-action trajectories. We demonstrate the feasibility of our approach by successfully running our algorithm on a real mobile robot. The algorithm is novel and unique in that it (a) explores the environment and learns directly on a mobile robot without using a hand-made computer model as an intermediate step, (b) does not require manual discretization of the sensor input space, (c) works in piecewise continuous perceptual spaces, and (d) copes with partial observability. Together, these properties allow learning from much less experience than previous methods.
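
A simplified sketch of the nearest-sequence idea that the abstract says is being extended, assuming a hypothetical history encoding of (action, observation) pairs and exact equality as the per-step match test: the similarity between the present moment and a past one is the length of their identical trailing subsequences, and an action's value is averaged over the k best-matching past moments whose successor step took that action. The paper's contribution replaces this exact-match count with general metrics over trajectories, which this sketch does not attempt.

```python
from statistics import mean
from typing import List, Optional, Tuple

Step = Tuple[int, int]  # hypothetical encoding: (action taken, observation received)


def match_length(history: List[Step], i: int, j: int) -> int:
    """Count identical consecutive steps, walking backwards from indices i and j."""
    n = 0
    while i - n >= 0 and j - n >= 0 and history[i - n] == history[j - n]:
        n += 1
    return n


def q_estimate(history: List[Step], q: List[float], action: int, k: int) -> Optional[float]:
    """Average stored values q[i + 1] over the k past steps i whose successor took
    `action` and whose preceding sequence best matches the present (index t)."""
    t = len(history) - 1
    candidates = [i for i in range(t) if history[i + 1][0] == action]
    if not candidates:
        return None
    # Longest match first; ties broken in favour of more recent experience.
    candidates.sort(key=lambda i: (match_length(history, t, i), i), reverse=True)
    return mean(q[i + 1] for i in candidates[:k])


h = [(0, 1), (1, 2), (0, 1), (1, 2), (0, 1)]
q = [0.0, 1.0, 2.0, 3.0, 4.0]
print(q_estimate(h, q, action=1, k=1))
```

Here the present step matches index 2 over three steps and index 0 over one, so with k = 1 the estimate for action 1 comes only from the longer match (the value stored at step 3).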

artificial intelligence, machine learning, reinforcement learning

cs/0603023

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots > Locomotion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Szita, Istvan, Lorincz, Andras

Reinforcement Learning with Linear Function Approximation and LQ control Converges

arXiv.org Artificial Intelligence, Nov-30-2009

Reinforcement learning is commonly used with function approximation. However, very few positive results are known about the convergence of RL control algorithms based on function approximation. In this paper we show that TD(0) and Sarsa(0) with linear function approximation are convergent for a simple class of problems, where the system is linear and the costs are quadratic (the LQ control problem). Furthermore, we show that for systems with Gaussian noise and non-completely observable states (the LQG problem), these RL algorithms are still convergent if they are combined with Kalman filtering.
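
A small sketch of only the policy-evaluation half of this claim (TD(0), not Sarsa(0) control, and no noise or Kalman filter), on a hypothetical scalar system x' = a x + b u with stage cost q x^2 + r u^2 under a fixed linear policy u = -k x. With the single quadratic feature phi(x) = x^2, the discounted cost-to-go is exactly p x^2 with p = (q + r k^2) / (1 - gamma (a - b k)^2), so the linear TD(0) weight can be checked against p. Drawing a fresh start state at every step is an assumption made to keep the feature excited; all numeric values are illustrative.

```python
import random

# Hypothetical scalar LQ problem (all numbers are illustrative assumptions).
A_SYS, B_SYS = 0.9, 1.0      # dynamics: x' = a*x + b*u
Q_COST, R_COST = 1.0, 1.0    # stage cost: q*x^2 + r*u^2
K_GAIN, GAMMA = 0.4, 0.9     # fixed policy u = -k*x, discount factor


def exact_p() -> float:
    """Closed-form coefficient p in V(x) = p * x^2 for the fixed linear policy."""
    m = (A_SYS - B_SYS * K_GAIN) ** 2
    return (Q_COST + R_COST * K_GAIN ** 2) / (1.0 - GAMMA * m)


def td0_lq(steps: int, alpha: float, seed: int = 0) -> float:
    """Linear TD(0) on the single feature phi(x) = x^2, one transition per step."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0)            # fresh start state each step (assumption)
        u = -K_GAIN * x
        cost = Q_COST * x * x + R_COST * u * u
        x_next = A_SYS * x + B_SYS * u
        phi, phi_next = x * x, x_next * x_next
        delta = cost + GAMMA * w * phi_next - w * phi
        w += alpha * delta * phi
    return w


print(exact_p(), td0_lq(5000, 0.5))
```

For these numbers p = 1.16 / 0.775, about 1.4968, and the TD(0) weight settles on the same value. In this toy case each update multiplies the error w - p by 1 - alpha (1 - gamma (a - b k)^2) x^4, which is why it converges so quickly.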

artificial intelligence, machine learning, reinforcement learning

cs/0306120

Country: North America > United States > California (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial Intelligence, Sep-16-2009

A Convergent Online Single Time Scale Actor Critic Algorithm

Di Castro, D., Meir, R.

Actor-critic approaches were among the first to address reinforcement learning in a general setting. Recently, these algorithms have gained renewed interest due to their generality, good convergence properties, and possible biological relevance. In this paper, we introduce an online temporal-difference-based actor-critic algorithm which is proved to converge to a neighborhood of a local maximum of the average reward. Linear function approximation is used by the critic in order to estimate the value function and the temporal difference signal, which is passed from the critic to the actor. The main distinguishing feature of the present convergence proof is that both the actor and the critic operate on a similar time scale, while in most current convergence proofs they are required to have very different time scales in order to converge. Moreover, the same temporal difference signal is used to update the parameters of both the actor and the critic. A limitation of the proposed approach, compared to results available for two-time-scale convergence, is that convergence is guaranteed only to a neighborhood of an optimal value, rather than to an optimal value itself. The single time scale and identical temporal difference signal used by the actor and the critic may provide a step towards constructing more biologically realistic models of reinforcement learning in the brain.
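
A toy sketch of the single-time-scale idea, assuming a hypothetical two-state, two-action problem (action 1 earns reward 1 and the chosen action also selects the next state) with tabular features, which are a special case of linear features. One step size alpha drives the average-reward estimate eta, the critic and the actor, and the same TD error delta = r - eta + V(s') - V(s) updates both the critic and the softmax actor. The paper's exact update rules, step-size conditions and convergence analysis are not reproduced here.

```python
import math
import random


def softmax(prefs):
    top = max(prefs)
    exps = [math.exp(p - top) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic(steps=20_000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    theta = [[0.0, 0.0], [0.0, 0.0]]  # actor: softmax preferences, one row per state
    v = [0.0, 0.0]                     # critic: differential values (one-hot features)
    eta = 0.0                          # average-reward estimate
    s = 0
    for _ in range(steps):
        pi = softmax(theta[s])
        a = 0 if rng.random() < pi[0] else 1
        r = 1.0 if a == 1 else 0.0     # hypothetical reward: action 1 is better
        s_next = a                     # hypothetical dynamics: action picks next state
        delta = r - eta + v[s_next] - v[s]
        # One time scale: the same alpha everywhere, the same delta for critic and actor.
        eta += alpha * (r - eta)
        v[s] += alpha * delta
        for b in range(2):
            grad_log_pi = (1.0 if b == a else 0.0) - pi[b]
            theta[s][b] += alpha * delta * grad_log_pi
        s = s_next
    return [softmax(row) for row in theta], v, eta


policy, values, avg_reward = actor_critic()
print(policy, avg_reward)
```

After training, both states should strongly prefer action 1 and the average-reward estimate should approach 1; the two-time-scale alternative would instead give the critic a much larger step size than the actor.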

artificial intelligence, machine learning, reinforcement learning

0909.2934

Genre: Research Report (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Grünewälder, Steffen, Obermayer, Klaus

The Optimal Unbiased Value Estimator and its Relation to LSTD, TD and MC

arXiv.org Machine Learning, Aug-24-2009

In this analytical study we derive the optimal unbiased value estimator (MVU) and compare its statistical risk to three well-known value estimators: Temporal Difference learning (TD), Monte Carlo estimation (MC) and Least-Squares Temporal Difference learning (LSTD). We demonstrate that LSTD is equivalent to the MVU if the Markov Reward Process (MRP) is acyclic, and show that the two differ for most cyclic MRPs because LSTD is then typically biased. More generally, we show that estimators that fulfill the Bellman equation can be unbiased only for special cyclic MRPs. The main reason lies in the probability measures with which the expectations are taken: these measures vary from state to state, and because the Bellman equation couples the states strongly, it is typically not possible for a set of value estimators to be unbiased with respect to each of these measures. Furthermore, we derive relations of the MVU to MC and TD. The most important is the equivalence of MC to the MVU and to LSTD for undiscounted MRPs in which MC has the same amount of information; in the discounted case this equivalence no longer holds. For TD we show that it is essentially unbiased for acyclic MRPs and biased for cyclic MRPs. We also order the estimators according to their risk and present counter-examples showing that no general ordering exists between the MVU and LSTD, between MC and LSTD, or between TD and MC. Theoretical results are supported by examples and an empirical evaluation.
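
A small worked check of the undiscounted MC/LSTD equivalence in one special case, under stated assumptions: a hypothetical acyclic chain 0 -> 1 -> 2 -> end in which every episode visits every state (so MC sees the same information as LSTD), with made-up rewards. Tabular LSTD(0) is computed as the value of the estimated model, a known property of LSTD with one-hot features. This is an illustration only, not the paper's derivation of the MVU.

```python
from statistics import mean

# Hypothetical batch from an acyclic chain 0 -> 1 -> 2 -> end (undiscounted):
# each episode lists the reward received on leaving states 0, 1 and 2.
EPISODES = [[1.0, 0.0, 2.0], [3.0, 1.0, 0.0], [0.0, 2.0, 2.0]]
N_STATES = 3


def monte_carlo():
    """Average, over episodes, of the return observed from each state onward."""
    return [mean(sum(ep[s:]) for ep in EPISODES) for s in range(N_STATES)]


def tabular_lstd():
    """Tabular LSTD(0) = value of the estimated model; here P(s -> s+1) = 1,
    so back-substitution V(s) = r_hat(s) + V(s+1) solves the linear system."""
    r_hat = [mean(ep[s] for ep in EPISODES) for s in range(N_STATES)]
    v = [0.0] * (N_STATES + 1)        # v[N_STATES] is the terminal state
    for s in reversed(range(N_STATES)):
        v[s] = r_hat[s] + v[s + 1]
    return v[:N_STATES]


print(monte_carlo())
print(tabular_lstd())
```

Both functions return approximately [3.667, 2.333, 1.333], that is 11/3, 7/3 and 4/3.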

artificial intelligence, machine learning, reinforcement learning

0908.3458

Country: Europe (0.28)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Thomas, Philip Sebastian (Case Western Reserve University) | Bogert, Antonie van den (Lerner Research Institute) | Jagodnik, Kathleen (Case Western Reserve University) | Branicky, Michael (Case Western Reserve University)

Application of the Actor-Critic Architecture to Functional Electrical Stimulation Control of a Human Arm

AAAI Conferences, Jul-14-2009

Clinical tests have shown that the dynamics of a human arm controlled using Functional Electrical Stimulation (FES) can vary significantly between and during trials. In this paper, we study the application of the actor-critic architecture, with neural networks for both the actor and the critic, as a controller that can adapt to these changing dynamics of a human arm. Development and tests were done in simulation using a planar arm model and Hill-based muscle dynamics. We begin by training the actor-critic using a Proportional Derivative (PD) controller as a supervisor. We then make clinically relevant changes to the dynamics of the arm and test the actor-critic's ability to adapt without supervision in a reasonable number of episodes. Finally, we devise methods for achieving both rapid learning and long-term stability.
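
The supervised pre-training step can be sketched as follows, with heavy simplifications: a hypothetical linear actor (instead of the paper's neural network) is fitted by the LMS rule to the torques of a PD supervisor on randomly sampled joint-angle errors and error rates, using made-up gains. The arm and Hill-based muscle models are omitted entirely, so this shows only the imitation mechanism, not the FES controller.

```python
import random

# Hypothetical PD gains; the supervisor's actual settings are not given in the abstract.
KP, KD = 4.0, 1.5


def pd_torque(error: float, error_rate: float) -> float:
    """Proportional-Derivative control law: u = Kp * e + Kd * de/dt."""
    return KP * error + KD * error_rate


def pretrain_linear_actor(steps: int = 5000, lr: float = 0.1, seed: int = 0):
    """Fit u = w0 * e + w1 * de/dt to the PD supervisor with the LMS (delta) rule."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(steps):
        e, de = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)  # sampled joint errors
        residual = pd_torque(e, de) - (w[0] * e + w[1] * de)
        w[0] += lr * residual * e
        w[1] += lr * residual * de
    return w


print(pretrain_linear_actor())  # weights end up close to KP and KD
```

Because the supervisor is itself linear in (e, de/dt), the imitation target is exactly representable here; a neural actor imitating a PD law on a real arm model would only approximate it.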

controller, machine learning, reinforcement learning

Twenty-First IAAI Conference

Country: North America > United States > Ohio > Cuyahoga County > Cleveland (0.04)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.95)
Health & Medicine > Health Care Technology (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)