AITopics

Petrik, Marek, Taylor, Gavin, Parr, Ron, Zilberstein, Shlomo

Feature Selection Using Regularization in Approximate Linear Programs for Markov Decision Processes

arXiv.org Artificial IntelligenceMay-20-2010

Approximate dynamic programming has been used successfully in a large variety of domains, but it relies on a small set of provided approximation features to calculate solutions reliably. Large and rich sets of features can cause existing algorithms to overfit because of a limited number of samples. We address this shortcoming using $L_1$ regularization in approximate linear programming. Because the proposed method can automatically select the appropriate richness of features, its performance does not degrade with an increasing number of features. These results rely on new and stronger sampling bounds for regularized approximate linear programs. We also propose a computationally efficient homotopy method. The empirical evaluation of the approach shows that the proposed method performs well on simple MDPs and standard benchmark problems.

constraint, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1005.186

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > North Carolina > Durham County > Durham (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.41)

Di Castro, Dotan, Mannor, Shie

Adaptive Bases for Reinforcement Learning

arXiv.org Artificial IntelligenceMay-2-2010

We consider the problem of reinforcement learning using function approximation, where the approximating basis can change dynamically while interacting with the environment. A motivation for such an approach is maximizing the value function fitness to the problem faced. Three errors are considered: approximation square error, Bellman residual, and projected Bellman residual. Algorithms under the actor-critic framework are presented, and shown to converge. The advantage of such an adaptive basis is demonstrated in simulations.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

1005.0125

Country:

North America > United States > Tennessee > Davidson County > Nashville (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > Alberta (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Ortega, Pedro A., Braun, Daniel A.

A Minimum Relative Entropy Principle for Learning and Acting

arXiv.org Artificial IntelligenceApr-10-2010

This paper proposes a method to construct an adaptive agent that is universal with respect to a given class of experts, where each expert is an agent that has been designed specifically for a particular environment. This adaptive control problem is formalized as the problem of minimizing the relative entropy of the adaptive agent from the expert that is most suitable for the unknown environment. If the agent is a passive observer, then the optimal solution is the well-known Bayesian predictor. However, if the agent is active, then its past actions need to be treated as causal interventions on the I/O stream rather than normal probability conditions. Here it is shown that the solution to this new variational problem is given by a stochastic controller called the Bayesian control rule, which implements adaptive behavior as a mixture of experts. Furthermore, it is shown that under mild assumptions, the Bayesian control rule converges to the control law of the most suitable expert.

machine learning, operation mode, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

0810.3605

Country:

North America > United States (0.67)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.46)

Genre: Research Report (0.50)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Sandi, Carmen, Gerstner, Wulfram, Lukšys, Gediminas

Stress, noradrenaline, and realistic prediction of mouse behaviour using reinforcement learning

Suppose we train an animal in a conditioning experiment. Can one predict how a given animal, under given experimental conditions, would perform the task? Since various factors such as stress, motivation, genetic background, and previous errors in task performance can influence animal behaviour, this appears to be a very challenging aim. Reinforcement learning (RL) models have been successful in modeling animal (and human) behaviour, but their success has been limited because of uncertainty as to how to set meta-parameters (such as learning rate, exploitation-exploration balance and future reward discount factor) that strongly influence model performance. We show that a simple RL model whose metaparameters are controlled by an artificial neural network, fed with inputs such as stress, affective phenotype, previous task performance, and even neuromodulatory manipulations, can successfully predict mouse behaviour in the "hole-box" - a simple conditioning task. Our results also provide important insights on how stress and anxiety affect animal learning, performance accuracy, and discounting of future rewards, and on how noradrenergic systems can interact with these processes.

animal behaviour, mice, noradrenaline, (14 more...)

Country:

Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > New Finding (0.50)
Research Report > Experimental Study (0.49)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Kolodziejski, Christoph, Porr, Bernd, Tamosiunaite, Minija, Wörgötter, Florentin

On the asymptotic equivalence between differential Hebbian and temporal difference learning using a local third factor

In this theoretical contribution we provide mathematical proof that two of the most important classes of network learning - correlation-based differential Hebbian learning and reward-based temporal difference learning - are asymptotically equivalent when timing the learning with a local modulatory signal. This opens the opportunity to consistently reformulate most of the abstract reinforcement learning framework from a correlation based perspective that is more closely related to the biophysics of neurons.

differential hebbian, hebbian, temporal difference, (16 more...)

Country:

Europe > Germany > Lower Saxony > Gottingen (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Huys, Quentin J., Vogelstein, Joshua, Dayan, Peter

Psychiatry: Insights into depression through normative decision-making models

Decision making lies at the very heart of many psychiatric diseases. It is also a central theoretical concern in a wide variety of fields and has undergone detailed, in-depth, analyses. We take as an example Major Depressive Disorder (MDD), applying insights from a Bayesian reinforcement learning framework. We focus on anhedonia and helplessness. Helplessness--a core element in the conceptualizations of MDD that has lead to major advances in its treatment, pharmacological and neurobiological understanding--is formalized as a simple prior over the outcome entropy of actions in uncertain environments.

depression, outcome distribution, slot machine, (15 more...)

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Maryland > Baltimore (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > Experimental Study (0.68)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Sandi, Carmen, Gerstner, Wulfram, Lukšys, Gediminas

Stress, noradrenaline, and realistic prediction of mouse behaviour using reinforcement learning

Suppose we train an animal in a conditioning experiment. Can one predict how a given animal, under given experimental conditions, would perform the task? Since various factors such as stress, motivation, genetic background, and previous errors in task performance can influence animal behaviour, this appears to be a very challenging aim. Reinforcement learning (RL) models have been successful in modeling animal (and human) behaviour, but their success has been limited because of uncertainty as to how to set meta-parameters (such as learning rate, exploitation-exploration balance and future reward discount factor) that strongly influence model performance. We show that a simple RL model whose metaparameters are controlled by an artificial neural network, fed with inputs such as stress, affective phenotype, previous task performance, and even neuromodulatory manipulations, can successfully predict mouse behaviour in the "hole-box" - a simple conditioning task. Our results also provide important insights on how stress and anxiety affect animal learning, performance accuracy, and discounting of future rewards, and on how noradrenergic systems can interact with these processes.

animal behaviour, mice, noradrenaline, (14 more...)

Country:

Europe > Switzerland > Vaud > Lausanne (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Research Report > New Finding (0.50)
Research Report > Experimental Study (0.49)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Kolodziejski, Christoph, Porr, Bernd, Tamosiunaite, Minija, Wörgötter, Florentin

On the asymptotic equivalence between differential Hebbian and temporal difference learning using a local third factor

In this theoretical contribution we provide mathematical proof that two of the most important classes of network learning - correlation-based differential Hebbian learning and reward-based temporal difference learning - are asymptotically equivalent when timing the learning with a local modulatory signal. This opens the opportunity to consistently reformulate most of the abstract reinforcement learning framework from a correlation based perspective that is more closely related to the biophysics of neurons.

differential hebbian, hebbian, temporal difference, (16 more...)

Country:

Europe > Germany > Lower Saxony > Gottingen (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Huys, Quentin J., Vogelstein, Joshua, Dayan, Peter

Psychiatry: Insights into depression through normative decision-making models

Decision making lies at the very heart of many psychiatric diseases. It is also a central theoretical concern in a wide variety of fields and has undergone detailed, in-depth, analyses. We take as an example Major Depressive Disorder (MDD), applying insights from a Bayesian reinforcement learning framework. We focus on anhedonia and helplessness. Helplessness--a core element in the conceptualizations of MDD that has lead to major advances in its treatment, pharmacological and neurobiological understanding--is formalized as a simple prior over the outcome entropy of actions in uncertain environments.

depression, outcome distribution, slot machine, (15 more...)

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Maryland > Baltimore (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report > Experimental Study (0.68)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)