 Reinforcement Learning


Measuring Intelligence through Games

arXiv.org Artificial Intelligence

Artificial general intelligence (AGI) refers to research aimed at tackling the full problem of artificial intelligence, that is, creating truly intelligent agents. This sets it apart from most AI research, which aims at solving problems in relatively narrow domains, such as character recognition, motion planning, or increasing player satisfaction in games. But how do we know when an agent is truly intelligent? A common point of reference in the AGI community is Legg and Hutter's formal definition of universal intelligence, which has the appeal of simplicity and generality but is unfortunately incomputable. Games of various kinds are commonly used as benchmarks for "narrow" AI research, as they are considered to have many important properties. We argue that many of these properties carry over to the testing of general intelligence as well. We then sketch how such testing could practically be carried out. The central part of this sketch is an extension of universal intelligence to deal with finite time, and the use of sampling of the space of games expressed in a suitably biased game description language.
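
For orientation, the Legg and Hutter measure referred to above is usually written as follows. This is the standard form from the general literature, not a formula taken from this abstract; the symbols (E for the set of computable, reward-bounded environments, K for Kolmogorov complexity, V for expected total reward) are the conventional ones and are assumptions here.

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_{\mu}^{\pi}
```

The weight 2^{-K(μ)} favours simple environments. Because K is incomputable, so is Υ, which is the obstacle the abstract mentions.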


Dynamic Policy Programming

arXiv.org Artificial Intelligence

In this paper, we propose a novel policy iteration method, called dynamic policy programming (DPP), to estimate the optimal policy in infinite-horizon Markov decision processes. We prove finite-iteration and asymptotic $\ell_\infty$-norm performance-loss bounds for DPP in the presence of approximation/estimation error. The bounds are expressed in terms of the $\ell_\infty$-norm of the average accumulated error, as opposed to the $\ell_\infty$-norm of the error in the case of standard approximate value iteration (AVI) and approximate policy iteration (API). This suggests that DPP can achieve better performance than AVI and API, since it averages out the simulation noise caused by Monte-Carlo sampling throughout the learning process. We examine these theoretical results numerically by comparing the performance of the approximate variants of DPP with existing reinforcement learning (RL) methods on different problem domains. Our results show that, in all cases, DPP-based algorithms outperform other RL methods by a wide margin.



Self-configuration from a Machine-Learning Perspective

arXiv.org Machine Learning

The goal of machine learning is to provide solutions which are trained by data or by experience coming from the environment. Many training algorithms exist and some brilliant successes have been achieved. But even in structured environments for machine learning (e.g. data mining or board games), most applications beyond the level of toy problems need careful hand-tuning or human ingenuity (i.e. detection of interesting patterns) or both. We discuss several aspects of how self-configuration can help to alleviate these problems. One aspect is self-configuration by tuning of algorithms, where recent advances have been made in the area of SPO (Sequential Parameter Optimization). Another aspect is self-configuration by pattern detection or feature construction. Forming multiple features (e.g. random Boolean functions) and using algorithms (e.g. random forests) which easily digest many features can greatly increase learning speed. However, a full-fledged theory of feature construction is not yet available, and this forms a current barrier in machine learning. We discuss several ideas for the systematic inclusion of feature construction. This may lead to partly self-configuring machine learning solutions which show robustness, flexibility, and fast learning in potentially changing environments.


Feature Reinforcement Learning In Practice

arXiv.org Artificial Intelligence

Following a recent surge in using history-based methods for resolving perceptual aliasing in reinforcement learning, we introduce an algorithm based on the feature reinforcement learning framework called PhiMDP. To create a practical algorithm, we devise a stochastic search procedure for a class of context trees based on parallel tempering and a specialized proposal distribution. We provide the first empirical evaluation for PhiMDP. Our proposed algorithm achieves superior performance to the classical U-tree algorithm and the recent active-LZ algorithm, and is competitive with MC-AIXI-CTW, which maintains a Bayesian mixture over all context trees up to a chosen depth. We are encouraged by our ability to compete with this sophisticated method using an algorithm that simply picks one single model and uses Q-learning on the corresponding MDP. Our PhiMDP algorithm is much simpler, yet consumes less time and memory. These results show promise for our future work on attacking more complex and larger problems.
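
The abstract mentions running Q-learning on the MDP that corresponds to the selected model. As a rough illustration of that last step only, here is a minimal sketch of plain tabular Q-learning. It is not PhiMDP and performs no context-tree search; the four-state chain environment, the function names, and all hyperparameter values are hypothetical choices made for this example.

```python
import random

N_STATES = 4          # hypothetical chain: states 0..3, state 3 is the terminal goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic chain dynamics; reward 1.0 only when the goal is reached."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration and random tie-breaking."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # step cap so an episode always ends
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            # Bootstrapped target: no future value beyond a terminal state.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_policy(q):
    """Greedy action for every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

In a feature-RL setting, the state index would come from a learned map applied to the interaction history rather than being given directly, which is the part this sketch leaves out.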


Between Frustration and Elation: Sense of Control Regulates the Intrinsic Motivation for Motor Learning

AAAI Conferences

Frustration has generally been viewed in a negative light, and its potential role in learning has been neglected. We propose a new approach to intrinsically motivated learning in which frustration is a key factor that allows exploration and exploitation to be balanced dynamically. Moreover, based on the result obtained from our experiment with older infants, we propose that a temporary decrease in learning from negative feedback can also be beneficial in fine-tuning a newly learned behavior. We suggest that this temporary indifference to the outcome of an action may be related to the sense of control, and results from the state of elation, that is, the experience of overcoming a very difficult task after prolonged frustration. Our preliminary simulation results serve as a proof of concept for our approach.


Deep Belief Nets as Function Approximators for Reinforcement Learning

AAAI Conferences

We describe a continuous state/action reinforcement learning method which uses deep belief networks (DBNs) in conjunction with a value function-based reinforcement learning algorithm to learn effective control policies. Our approach is to first learn a model of the state-action space from data in an unsupervised pre-training phase, and then use neural-fitted Q-iteration (NFQ) to learn an accurate value function approximator (analogous to a "fine-tuning" phase when training DBNs for classification). Our experiments suggest that this approach has the potential to significantly increase the efficiency of the learning process in NFQ, provided care is taken to ensure the initial data covers interesting areas of the state-action space, and may be particularly useful in transfer learning settings.


Markov Games of Incomplete Information for Multi-Agent Reinforcement Learning

AAAI Conferences

Partially observable stochastic games (POSGs) are an attractive model for many multi-agent domains, but are computationally extremely difficult to solve. We present a new model, Markov games of incomplete information (MGII), which imposes a mild restriction on POSGs while overcoming their primary computational bottleneck. Finally, we show how to convert an MGII into a continuous but bounded fully observable stochastic game. MGIIs represent the most general tractable model for multi-agent reinforcement learning to date.


Action-Based Autonomous Grounding

AAAI Conferences

When a new-born animal (agent) opens its eyes, what it sees is a patchwork of light and dark patterns, the natural scene. What is perceived by the agent at this moment is based on the pattern of neural spikes in its brain. Life-long learning begins with such a flood of spikes in the brain. All knowledge and skills learned by the agent are mediated by such spikes, thus it is critical to understand what information these spikes convey and how they can be used to generate meaningful behavior. Here, we consider how agents can autonomously understand the meaning of these spikes without direct reference to the stimulus. We find that this problem, the problem of grounding, is unsolvable if the agent is passively perceiving, and that it can be solved only through self-initiated action. Furthermore, we show that a simple criterion, combined with standard reinforcement learning, can help solve this problem. We will present simulation results and discuss the implications of these results on life-long learning.


Lightweight Adaptation in Model-Based Reinforcement Learning

AAAI Conferences

Reinforcement learning algorithms can train an agent to operate successfully in a stationary environment. Most real-world environments, however, are subject to change over time. Research in the areas of transfer learning and lifelong learning addresses this problem by developing new algorithms that allow agents to adapt to environment change. Current trends in this area include model-free learning and data-driven adaptation methods. This paper explores the opposite direction. Arguing that model-based algorithms may be better suited to the problem, it looks at adaptation in the context of model-based learning. Noting that standard algorithms themselves have some built-in capability for adaptation, it analyzes when and why a standard algorithm struggles to adapt to environment change. It then experiments with lightweight and straightforward methods for adapting effectively.