Reinforcement Learning


Report on the 2008 Reinforcement Learning Competition

AI Magazine

This article reports on the 2008 Reinforcement Learning Competition, which began in November 2007 and ended with a workshop at the International Conference on Machine Learning (ICML) in July 2008 in Helsinki, Finland. Researchers from around the world developed reinforcement learning agents to compete on six problems of varying complexity and difficulty. The competition employed fundamentally redesigned evaluation frameworks that, unlike those of previous competitions, aimed to systematically encourage the submission of robust learning methods. We describe the unique challenges of empirical evaluation in reinforcement learning and briefly review the history of the previous competitions and the evaluation frameworks they employed. We also describe the novel frameworks developed for the 2008 competition, as well as the software infrastructure on which they rely. Furthermore, we describe the six competition domains and present a summary of selected competition results. Finally, we discuss the implications of these results and outline ideas for the future of the competition.


Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes

arXiv.org Artificial Intelligence

Approximate dynamic programming has been used successfully in a large variety of domains, but it relies on a small set of provided approximation features to calculate solutions reliably. Large, rich feature sets can cause existing algorithms to overfit when the number of samples is limited. We address this shortcoming using $L_1$ regularization in approximate linear programming. Because the proposed method automatically selects an appropriate richness of features, its performance does not degrade as the number of features increases. These results rely on new and stronger sampling bounds for regularized approximate linear programs. We also propose a computationally efficient homotopy method. An empirical evaluation shows that the proposed method performs well on simple MDPs and standard benchmark problems.
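As a hedged illustration of the idea, the program below is one common way to write an $L_1$-regularized approximate linear program from sampled transitions. The notation is ours and is an assumption rather than the paper's: $\phi(s)$ is the feature vector of state $s$, $w$ the weights, $\rho$ a state-relevance distribution, $\Sigma$ the set of sampled transitions, and $\psi$ the regularization radius. The first feature is assumed to be a constant and is left out of the norm.

```latex
\begin{aligned}
\min_{w} \quad & \sum_{s} \rho(s)\, \phi(s)^{\top} w \\
\text{s.t.} \quad & \phi(s)^{\top} w \;\ge\; r + \gamma\, \phi(s')^{\top} w
  \qquad \text{for every sampled transition } (s, a, r, s') \in \Sigma, \\
& \lVert w_{2:k} \rVert_{1} \;\le\; \psi .
\end{aligned}
```

The constraints require the approximate value to upper-bound the sampled one-step Bellman backup, and the $L_1$ ball limits how much weight the features can carry. Shrinking $\psi$ drives more weights to zero, which is how the regularization selects features; a homotopy method in this spirit would trace the solution as $\psi$ varies instead of re-solving the linear program from scratch for each value.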


Adaptive Bases for Reinforcement Learning

arXiv.org Artificial Intelligence

We consider the problem of reinforcement learning using function approximation, where the approximating basis can change dynamically while the agent interacts with the environment. The motivation for such an approach is to fit the value function approximation as closely as possible to the problem at hand. Three error criteria are considered: the approximation squared error, the Bellman residual, and the projected Bellman residual. Algorithms within the actor-critic framework are presented and shown to converge. The advantage of such an adaptive basis is demonstrated in simulations.
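The sketch below illustrates the general idea of an adaptive basis on a toy problem; it is not the paper's algorithm. It assumes a hypothetical 5-state random walk evaluated under a fixed random policy (critic only, no actor), Gaussian radial basis features with adaptable centers, TD(0) updates for the linear weights on a fast timescale, and a semi-gradient step on the centers, driven by the same TD error, on a slower timescale. All names and step sizes are illustrative.

```python
import math
import random

def rbf_features(s, centers, sigma):
    """Gaussian radial basis features phi_k(s); the centers are adaptable."""
    return [math.exp(-((s - c) ** 2) / (2 * sigma ** 2)) for c in centers]

def value(s, w, centers, sigma):
    """Linear value estimate V(s) = sum_k w_k * phi_k(s)."""
    return sum(wk * fk for wk, fk in zip(w, rbf_features(s, centers, sigma)))

def run(episodes=2000, alpha=0.05, beta=0.005, gamma=1.0, sigma=1.0, seed=0):
    rng = random.Random(seed)
    centers = [1.5, 3.0, 4.5]  # hypothetical initial centers
    w = [0.0] * len(centers)
    for _ in range(episodes):
        s = 3  # start in the middle of the chain; states 1..5, terminals 0 and 6
        while True:
            s_next = s + rng.choice((-1, 1))
            terminal = s_next in (0, 6)
            r = 1.0 if s_next == 6 else 0.0
            v_next = 0.0 if terminal else value(s_next, w, centers, sigma)
            phi = rbf_features(s, centers, sigma)
            v = sum(wk * fk for wk, fk in zip(w, phi))
            delta = r + gamma * v_next - v  # TD error
            # dV(s)/dc_k = w_k * phi_k(s) * (s - c_k) / sigma^2 (computed with the old weights)
            grad_c = [wk * fk * (s - c) / sigma ** 2 for wk, fk, c in zip(w, phi, centers)]
            # Fast timescale: TD(0) step on the linear weights.
            w = [wk + alpha * delta * fk for wk, fk in zip(w, phi)]
            # Slow timescale: semi-gradient step on the basis centers.
            centers = [c + beta * delta * g for c, g in zip(centers, grad_c)]
            if terminal:
                break
            s = s_next
    return w, centers

w, centers = run()
estimates = [value(s, w, centers, 1.0) for s in range(1, 6)]
print([round(v, 2) for v in estimates])  # true values are 1/6, 2/6, ..., 5/6
```

Keeping `alpha` much larger than `beta` mimics a two-timescale structure, a common device in actor-critic convergence analyses. Each of the paper's three error criteria would lead to a different update for the basis parameters; the semi-gradient TD step used here is only one simple choice.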


A Minimum Relative Entropy Principle for Learning and Acting

arXiv.org Artificial Intelligence

This paper proposes a method to construct an adaptive agent that is universal with respect to a given class of experts, where each expert is an agent designed specifically for a particular environment. This adaptive control problem is formalized as minimizing the relative entropy of the adaptive agent from the expert that is most suitable for the unknown environment. If the agent is a passive observer, the optimal solution is the well-known Bayesian predictor. If the agent is active, however, its past actions need to be treated as causal interventions on the I/O stream rather than as ordinary probabilistic conditions. Here it is shown that the solution to this new variational problem is given by a stochastic controller called the Bayesian control rule, which implements adaptive behavior as a mixture of experts. Furthermore, it is shown that, under mild assumptions, the Bayesian control rule converges to the control law of the most suitable expert.
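A minimal sketch of the Bayesian control rule, assuming a hypothetical two-armed Bernoulli bandit and two experts: expert $m$ believes arm $m$ pays with probability 0.8 and the other arm with 0.2, and its control law is to always pull arm $m$. At each step the agent samples an expert from its posterior and acts as that expert would; the posterior is then updated with the likelihood of the observed reward only. Actions are treated as interventions, so they contribute no evidence about which expert is correct. The payoff probabilities and function names are illustrative.

```python
import math
import random

P_HIGH, P_LOW = 0.8, 0.2  # hypothetical payoff beliefs held by the experts

def likelihood(reward, arm, expert):
    """P(reward | arm, expert): expert m believes arm m pays with P_HIGH."""
    p = P_HIGH if arm == expert else P_LOW
    return p if reward == 1 else 1.0 - p

def normalize(log_post):
    """Convert unnormalized log-posteriors into probabilities."""
    mx = max(log_post)
    weights = [math.exp(lp - mx) for lp in log_post]
    z = sum(weights)
    return [wt / z for wt in weights]

def bayesian_control_rule(true_probs, steps=500, seed=0):
    rng = random.Random(seed)
    log_post = [math.log(0.5), math.log(0.5)]  # uniform prior over the two experts
    pulls = [0, 0]
    for _ in range(steps):
        post = normalize(log_post)
        # Act: sample an expert from the posterior and follow its control law
        # (expert m always pulls arm m).
        expert = 0 if rng.random() < post[0] else 1
        arm = expert
        pulls[arm] += 1
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Update: only the observation is evidence. The action is an intervention,
        # so no P(action | expert) factor is multiplied in.
        for k in range(2):
            log_post[k] += math.log(likelihood(reward, arm, k))
    return normalize(log_post), pulls

posterior, pulls = bayesian_control_rule(true_probs=(0.2, 0.8))
print(posterior, pulls)
```

Had the update also multiplied in $P(\text{action} \mid \text{expert})$, every pull of arm $m$ would count as evidence for expert $m$, and the agent would reinforce its own choices regardless of the rewards. With deterministic experts like these, the rule behaves like posterior (Thompson) sampling.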


Stress, noradrenaline, and realistic prediction of mouse behaviour using reinforcement learning

Neural Information Processing Systems

Suppose we train an animal in a conditioning experiment. Can one predict how a given animal, under given experimental conditions, would perform the task? Since various factors such as stress, motivation, genetic background, and previous errors in task performance can influence animal behaviour, this appears to be a very challenging aim. Reinforcement learning (RL) models have been successful in modeling animal (and human) behaviour, but their success has been limited by uncertainty about how to set the meta-parameters (such as the learning rate, the exploitation-exploration balance, and the future reward discount factor) that strongly influence model performance. We show that a simple RL model whose meta-parameters are controlled by an artificial neural network, fed with inputs such as stress, affective phenotype, previous task performance, and even neuromodulatory manipulations, can successfully predict mouse behaviour in the "hole-box", a simple conditioning task. Our results also provide insights into how stress and anxiety affect animal learning, performance accuracy, and the discounting of future rewards, and into how noradrenergic systems can interact with these processes.
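To make the idea of network-controlled meta-parameters concrete, here is a hypothetical sketch, not the authors' model: a one-layer network with made-up weights (`WEIGHTS`, `BIASES`) maps three inputs (a stress level, an anxiety score, and a recent error rate) to a learning rate, a softmax inverse temperature (the exploitation-exploration balance), and a discount factor. These then drive a Q-learning agent on a toy single-state task in which only one action is rewarded.

```python
import math
import random

# Hypothetical network parameters: each row maps the inputs
# (stress, anxiety score, recent error rate) to one raw meta-parameter.
WEIGHTS = [[-1.0, 0.0, -0.5],
           [-0.8, -0.5, 0.0],
           [-0.5, -1.0, 0.0]]
BIASES = [0.0, 1.0, 1.0]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def meta_parameters(inputs, weights, biases):
    """One-layer network: inputs -> (learning rate, inverse temperature, discount)."""
    raw = [sum(w * x for w, x in zip(row, inputs)) + b
           for row, b in zip(weights, biases)]
    alpha = sigmoid(raw[0])  # learning rate, squashed into (0, 1)
    beta = math.exp(raw[1])  # inverse temperature, kept positive
    gamma = sigmoid(raw[2])  # discount factor, squashed into (0, 1)
    return alpha, beta, gamma

def softmax_policy(q_values, beta):
    """Boltzmann action probabilities; larger beta means more exploitation."""
    m = max(q_values)
    exps = [math.exp(beta * (q - m)) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]

def simulate(inputs, trials=200, seed=0):
    """Toy single-state task where only action 1 is rewarded.
    Returns the fraction of rewarded choices."""
    rng = random.Random(seed)
    alpha, beta, gamma = meta_parameters(inputs, WEIGHTS, BIASES)
    q = [0.0, 0.0]
    correct = 0
    for _ in range(trials):
        probs = softmax_policy(q, beta)
        a = 0 if rng.random() < probs[0] else 1
        r = 1.0 if a == 1 else 0.0
        # Q-learning update; the single state transitions back to itself.
        q[a] += alpha * (r + gamma * max(q) - q[a])
        correct += a
    return correct / trials

low = simulate([0.0, 0.0, 0.0])   # hypothetical low-stress animal
high = simulate([2.0, 1.5, 0.5])  # hypothetical high-stress animal
print(low, high)
```

The squashing functions keep each meta-parameter in a valid range. In practice the network weights would have to be fitted to behavioural data rather than chosen by hand as here.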


On the asymptotic equivalence between differential Hebbian and temporal difference learning using a local third factor

Neural Information Processing Systems

In this theoretical contribution, we provide a mathematical proof that two of the most important classes of network learning, correlation-based differential Hebbian learning and reward-based temporal difference learning, are asymptotically equivalent when learning is timed by a local modulatory signal. This makes it possible to consistently reformulate most of the abstract reinforcement learning framework from a correlation-based perspective that is more closely related to the biophysics of neurons.
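For reference, the two learning rules being compared are commonly written in the forms below. This is an illustrative statement under assumed notation, not the paper's derivation: $\phi_i(s)$ are state features with linear value $V$, $u_i(t)$ are presynaptic input traces, $v(t)$ is the neuron's output, $\alpha$ and $\mu$ are learning rates, and $M(t)$ is the local third (modulatory) factor that gates when the weights change.

```latex
% Reward-based TD(0) with linear features
\Delta w_i = \alpha \bigl[\, r_{t+1} + \gamma\, V(s_{t+1}) - V(s_t) \,\bigr] \phi_i(s_t),
\qquad V(s) = \textstyle\sum_j w_j\, \phi_j(s)

% Correlation-based differential Hebbian learning gated by a local third factor
\frac{d w_i}{d t} = \mu\, M(t)\, u_i(t)\, \frac{d v}{d t},
\qquad v(t) = \textstyle\sum_j w_j\, u_j(t)
```

The paper's neuron model, input filtering, and the exact timing of $M(t)$ under which the two rules coincide asymptotically are not reproduced here.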


Psychiatry: Insights into depression through normative decision-making models

Neural Information Processing Systems

Decision making lies at the very heart of many psychiatric diseases. It is also a central theoretical concern in a wide variety of fields and has been the subject of detailed, in-depth analyses. We take Major Depressive Disorder (MDD) as an example, applying insights from a Bayesian reinforcement learning framework. We focus on anhedonia and helplessness. Helplessness, a core element in conceptualizations of MDD that has led to major advances in its treatment and in its pharmacological and neurobiological understanding, is formalized as a simple prior over the outcome entropy of actions in uncertain environments.

