AITopics

doi: 10.1613/jair.898

AI Access Foundation

10348

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > New York (0.04)
(17 more...)

Industry:

Education (0.46)
Energy (0.46)
Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Journal of Artificial Intelligence ResearchSep-1-2003

Potential-Based Shaping and Q-Value Initialization are Equivalent

Wiewiora, E.

Shaping has proven to be a powerful but precarious means of improving reinforcement learning performance. Ng, Harada, and Russell (1999) proposed the potential-based shaping algorithm for adding shaping rewards in a way that guarantees the learner will learn optimal behavior. In this note, we prove certain similarities between this shaping algorithm and the initialization step required for several reinforcement learning algorithms. More specifically, we prove that a reinforcement learner with initial Q-values based on the shaping algorithm's potential function make the same updates throughout learning as a learner receiving potential-based shaping rewards. We further prove that under a broad category of policies, the behavior of these two learners are indistinguishable. The comparison provides intuition on the theoretical properties of the shaping algorithm as well as a suggestion for a simpler method for capturing the algorithm's benefit. In addition, the equivalence raises previously unaddressed issues concerning the efficiency of learning with potential-based shaping.

learner, q-value, reinforcement, (10 more...)

doi: 10.1613/jair.1190

AI Access Foundation

10338

Country:

North America > United States > California > San Diego County > San Diego (0.05)
North America > United States > California > San Diego County > La Jolla (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Brafman, R. I., Tennenholtz, M.

Learning to Coordinate Efficiently: A Model-based Approach

Journal of Artificial Intelligence ResearchJul-1-2003

In common-interest stochastic games all players receive an identical payoff. Players participating in such games must learn to coordinate with each other in order to receive the highest-possible value. A number of reinforcement learning algorithms have been proposed for this problem, and some have been shown to converge to good solutions in the limit. In this paper we show that using very simple model-based algorithms, much better (i.e., polynomial) convergence rates can be attained. Moreover, our model-based algorithms are guaranteed to converge to the optimal value, unlike many of the existing algorithms.

coordinate efficiently, iirmr, model-based approach, (3 more...)

doi: 10.1613/jair.1154

AI Access Foundation

10333

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)
Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.40)

Kositsky, Michael, Barto, Andrew G.

The Emergence of Multiple Movement Units in the Presence of Noise and Feedback Delay

Tangential hand velocity profiles of rapid human arm movements often appearas sequences of several bell-shaped acceleration-deceleration phases called submovements or movement units. This suggests how the nervous system might efficiently control a motor plant in the presence of noise and feedback delay. Another critical observation is that stochasticity ina motor control problem makes the optimal control policy essentially differentfrom the optimal control policy for the deterministic case. We use a simplified dynamic model of an arm and address rapid aimed arm movements. We use reinforcement learning as a tool to approximate the optimal policy in the presence of noise and feedback delay. Using a simplified model we show that multiple submovements emerge as an optimal policy in the presence of noise and feedback delay. The optimal policy in this situation is to drive the arm's end point close to the target by one fast submovement and then apply a few slow submovements to accurately drivethe arm's end point into the target region. In our simulations, the controller sometimes generates corrective submovements before the initial fast submovement is completed, much like the predictive corrections observedin a number of psychophysical experiments.

controller, neurology, upstream oil & gas, (21 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
Asia (0.14)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.50)
Energy > Oil & Gas > Upstream (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)

Kositsky, Michael, Barto, Andrew G.

The Emergence of Multiple Movement Units in the Presence of Noise and Feedback Delay

Tangential hand velocity profiles of rapid human arm movements often appear as sequences of several bell-shaped acceleration-deceleration phases called submovements or movement units. This suggests how the nervous system might efficiently control a motor plant in the presence of noise and feedback delay. Another critical observation is that stochasticity in a motor control problem makes the optimal control policy essentially different from the optimal control policy for the deterministic case. We use a simplified dynamic model of an arm and address rapid aimed arm movements. We use reinforcement learning as a tool to approximate the optimal policy in the presence of noise and feedback delay. Using a simplified model we show that multiple submovements emerge as an optimal policy in the presence of noise and feedback delay. The optimal policy in this situation is to drive the arm's end point close to the target by one fast submovement and then apply a few slow submovements to accurately drive the arm's end point into the target region. In our simulations, the controller sometimes generates corrective submovements before the initial fast submovement is completed, much like the predictive corrections observed in a number of psychophysical experiments.

neurology, submovement, upstream oil & gas, (21 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
Asia (0.14)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.68)
Energy > Oil & Gas > Upstream (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)

Kositsky, Michael, Barto, Andrew G.

The Emergence of Multiple Movement Units in the Presence of Noise and Feedback Delay

Tangential hand velocity profiles of rapid human arm movements often appear as sequences of several bell-shaped acceleration-deceleration phases called submovements or movement units. This suggests how the nervous system might efficiently control a motor plant in the presence of noise and feedback delay. Another critical observation is that stochasticity in a motor control problem makes the optimal control policy essentially different from the optimal control policy for the deterministic case. We use a simplified dynamic model of an arm and address rapid aimed arm movements. We use reinforcement learning as a tool to approximate the optimal policy in the presence of noise and feedback delay. Using a simplified model we show that multiple submovements emerge as an optimal policy in the presence of noise and feedback delay. The optimal policy in this situation is to drive the arm's end point close to the target by one fast submovement and then apply a few slow submovements to accurately drive the arm's end point into the target region. In our simulations, the controller sometimes generates corrective submovements before the initial fast submovement is completed, much like the predictive corrections observed in a number of psychophysical experiments.

neurology, submovement, upstream oil & gas, (21 more...)

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
Asia (0.14)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.68)
Energy > Oil & Gas > Upstream (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.51)

Motivated Reinforcement Learning

Dayan, Peter

Competition between actions is based on the motivating characteristics of their consequent states in this sense.

conditioning, instrumental conditioning, motivation, (14 more...)

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.46)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Wang, Xin, Dietterich, Thomas G.

Stabilizing Value Function Approximation with the BFBP Algorithm

However, online RL algorithms such as SARSA(A) have been shown experimentally to have difficulty converging when applied with function approximators. Theoretical analysis has not been able to prove convergence, even in the case-of linear function approximators.

algorithm, bfbp, function approximator, (15 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.15)
North America > United States > Oregon > Benton County > Corvallis (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.42)

Mannor, Shie, Shimkin, Nahum

The Steering Approach for Multi-Criteria Reinforcement Learning

We consider the problem of learning to attain multiple goals in a dynamic environment, which is initially unknown. In addition, the environment may contain arbitrarily varying elements related to actions of other agents or to non-stationary moves of Nature. This problem is modelled as a stochastic (Markov) game between the learning agent and an arbitrary player, with a vector-valued reward function. The objective of the learning agent is to have its long-term average reward vector belong to a given target set. We devise an algorithm for achieving this task, which is based on the theory of approachability for stochastic games. This algorithm combines, in an appropriate way, a finite set of standard, scalar-reward learning algorithms. Sufficient conditions are given for the convergence of the learning algorithm to a general target set. The specialization of these results to the single-controller Markov decision problem are discussed as well.

algorithm, reward vector, vector, (14 more...)

Country: Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Lagoudakis, Michail G., Parr, Ronald

Model-Free Least-Squares Policy Iteration

We propose a new approach to reinforcement learning which combines least squares function approximation with policy iteration. Our method is model-free and completely off policy. We are motivated by the least squares temporal difference learning algorithm (LSTD), which is known for its efficient use of sample experiences compared to pure temporal difference algorithms. LSTD is ideal for prediction problems, however it heretofore has not had a straightforward application to control problems. Moreover, approximations learned by LSTD are strongly influenced by the visitation distribution over states.

algorithm, approximation, iteration, (13 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.15)
North America > United States > North Carolina > Durham County > Durham (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)