Generalized Emphatic Temporal Difference Learning: Bias-Variance Analysis

Hallak, Assaf (Technion Institute of Technology) | Tamar, Aviv (University of California, Berkeley) | Munos, Remi (Google DeepMind) | Mannor, Shie (Technion Institute of Technology)

Apr-19-2016–AAAI Conferences

We consider the off-policy evaluation problem in Markov decision processes with function approximation. We propose a generalization of the recently introduced emphatic temporal differences (ETD) algorithm, which encompasses the original ETD(λ), as well as several other off-policy evaluation algorithms, as special cases. We call this framework ETD(λ, β), where our introduced parameter β controls the decay rate of an importance-sampling term. We study conditions under which the projected fixed-point equation underlying ETD(λ, β) involves a contraction operator, allowing us to present the first asymptotic error bounds (bias) for ETD(λ, β). Our results show that the original ETD algorithm always involves a contraction operator, and its bias is bounded. Moreover, by controlling β, our proposed generalization allows trading off bias for variance reduction, thereby achieving a lower total error.
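The abstract does not spell out the update rules, so the following is only a minimal sketch of what one ETD(λ, β) step might look like with linear function approximation. It assumes the commonly used emphatic TD recursions (follow-on trace, emphasis, eligibility trace) with β taking the place of the discount factor in the follow-on trace. The function name `etd_step` and all argument names are hypothetical and do not come from the paper.

```python
def etd_step(theta, e, F, rho_prev, rho, phi, phi_next, reward,
             gamma=0.9, lam=0.0, beta=0.9, alpha=0.01):
    """One assumed ETD(lambda, beta) update on plain Python lists.

    theta, e      : weight vector and eligibility trace (lists of floats)
    F             : follow-on trace from the previous step (start at 0.0)
    rho_prev, rho : importance ratios pi/mu for the previous and current action
    phi, phi_next : feature vectors of the current and next state
    """
    # Follow-on trace: beta sets how quickly past importance ratios decay.
    # (Assumption: beta = gamma gives the original ETD; beta = 0 keeps F at 1.)
    F = beta * rho_prev * F + 1.0
    # Emphasis: interpolates between 1 and the follow-on trace via lambda.
    M = lam + (1.0 - lam) * F
    # Eligibility trace, weighted by the emphasis and the current ratio.
    e = [rho * (gamma * lam * e_i + M * p_i) for e_i, p_i in zip(e, phi)]
    # TD error under the current linear value estimate.
    v = sum(t * p for t, p in zip(theta, phi))
    v_next = sum(t * p for t, p in zip(theta, phi_next))
    delta = reward + gamma * v_next - v
    # Semi-gradient weight update along the eligibility trace.
    theta = [t + alpha * delta * e_i for t, e_i in zip(theta, e)]
    return theta, e, F


# Usage: two features, a single transition, beta = 0 (no emphasis build-up).
theta, e, F = etd_step([0.0, 0.0], [0.0, 0.0], 0.0, 1.0, 1.0,
                       [1.0, 0.0], [0.0, 1.0], 1.0,
                       gamma=0.9, lam=0.0, beta=0.0, alpha=0.1)
```

Under these assumptions, a smaller β shrinks the follow-on trace `F`, which is the importance-sampling term whose growth drives variance; this is the knob the abstract describes for trading bias against variance.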

algorithm, artificial intelligence, reinforcement learning


