AITopics

Stress and genetic background regulate different aspects of behavioral learning through the action of stress hormones and neuromodulators.

experiment, future reward discount factor, mice, (14 more...)

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)

Genre: Research Report (0.48)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Ghavamzadeh, Mohammad, Engel, Yaakov

Bayesian Policy Gradient Algorithms

Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Conventional policy gradient methods use Monte-Carlo techniques to estimate this gradient. Since Monte Carlo methods tend to have high variance, a large number of samples is required, resulting in slow convergence. In this paper, we propose a Bayesian framework that models the policy gradient as a Gaussian process. This reduces the number of samples needed to obtain accurate gradient estimates. Moreover, estimates of the natural gradient as well as a measure of the uncertainty in the gradient estimates are provided at little extra cost.

algorithm, fisher information matrix, gradient, (15 more...)

Country:

North America > United States > California > Alameda County > Hayward (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Geramifard, Alborz, Bowling, Michael, Zinkevich, Martin, Sutton, Richard S.

iLSTD: Eligibility Traces and Convergence Analysis

In this paper, we generalize the previous iLSTD algorithm and present three new results: (1) the first convergence proof for an iLSTD algorithm; (2) an extension to incorporate eligibility traces without changing the asymptotic computational complexity; and (3) the first empirical results with an iLSTD algorithm for a problem (mountain car) with feature vectors large enough (n 10, 000) to show substantial computational advantages over LSTD.

algorithm, ilstd, selection mechanism, (13 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Auer, Peter, Ortner, Ronald

Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning

We present a learning algorithm for undiscounted reinforcement learning. Our interest lies in bounds for the algorithm's online performance after some finite number of steps. In the spirit of similar methods already successfully applied for the exploration-exploitation tradeoff in multi-armed bandit problems, we use upper confidence bounds to show that our UCRL algorithm achieves logarithmic online regret in the number of steps taken with respect to an optimal policy.

algorithm, probability, transition probability, (12 more...)

Country: Europe > Austria > Styria > Leoben (0.04)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Abbeel, Pieter, Coates, Adam, Quigley, Morgan, Ng, Andrew Y.

An Application of Reinforcement Learning to Aerobatic Helicopter Flight

Autonomous helicopter flight is widely regarded to be a highly challenging control problem. This paper presents the first successful autonomous completion on a real RC helicopter of the following four aerobatic maneuvers: forward flip and sideways roll at low speed, tail-in funnel, and nose-in funnel. Our experimental results significantly extend the state of the art in autonomous helicopter flight. We used the following approach: First we had a pilot fly the helicopter to help us find a helicopter dynamics model and a reward (cost) function. Then we used a reinforcement learning (optimal control) algorithm to find a controller that is optimized for the resulting model and reward function. More specifically, we used differential dynamic programming (DDP), an extension of the linear quadratic regulator (LQR).

controller, funnel, helicopter, (17 more...)

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Industry:

Transportation > Air (1.00)
Aerospace & Defense > Aircraft (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Ghavamzadeh, Mohammad, Engel, Yaakov

Bayesian Policy Gradient Algorithms

Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policyby following a performance gradient estimate. Conventional policy gradient methods use Monte-Carlo techniques to estimate this gradient. Since Monte Carlo methods tend to have high variance, a large number of samples is required, resulting in slow convergence. In this paper, we propose a Bayesian framework that models the policy gradient as a Gaussian process. This reduces the number of samples needed to obtain accurate gradient estimates. Moreover, estimates of the natural gradient as well as a measure of the uncertainty in the gradient estimates are provided at little extra cost.

algorithm, fisher information matrix, gradient, (15 more...)

Country:

North America > United States > California > Alameda County > Hayward (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Asia > Middle East > Israel (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Geramifard, Alborz, Bowling, Michael, Zinkevich, Martin, Sutton, Richard S.

iLSTD: Eligibility Traces and Convergence Analysis

In this paper, we generalize the previous iLSTD algorithm and present three new results: (1)the first convergence proof for an iLSTD algorithm; (2) an extension to incorporate eligibility traces without changing the asymptotic computational complexity; and(3) the first empirical results with an iLSTD algorithm for a problem (mountain car) with feature vectors large enough (n 10, 000) to show substantial computationaladvantages over LSTD.

ilstd, machine learning, reinforcement learning, (16 more...)

Country:

North America > Canada > Alberta (0.29)
North America > United States (0.28)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Lukšys, Gediminas, Knüsel, Jérémie, Sheynikhovich, Denis, Sandi, Carmen, Gerstner, Wulfram

Effects of Stress and Genotype on Meta-parameter Dynamics in Reinforcement Learning

Stress and genetic background regulate different aspects of behavioral learning through the action of stress hormones and neuromodulators.

experiment, machine learning, reinforcement learning, (15 more...)

Country: Europe (0.28)

Genre: Research Report (0.48)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Richter, Silvia, Aberdeen, Douglas, Yu, Jin

Natural Actor-Critic for Road Traffic Optimisation

Current road-traffic optimisation practice around the world is a combination of hand tuned policies with a small degree of automatic adaption. Even state-ofthe-art researchcontrollers need good models of the road traffic, which cannot be obtained directly from existing sensors. We use a policy-gradient reinforcement learningapproach to directly optimise the traffic signals, mapping currently deployed sensor observations to control signals. Our trained controllers are (theoretically) compatiblewith the traffic system used in Sydney and many other cities around the world. We apply two policy-gradient methods: (1) the recent natural actor-critic algorithm, and (2) a vanilla policy-gradient algorithm for comparison. Along the way we extend natural-actor critic approaches to work for distributed and online infinite-horizon problems.

artificial intelligence, machine learning, reinforcement learning, (19 more...)