Analytical Mean Squared Error Curves in Temporal Difference Learning

Dec-31-1997–Neural Information Processing Systems

We have calculated analytical expressions for how the bias and variance of the estimators provided by various temporal difference value estimation algorithms change with offline updates over trials in absorbing Markov chains using lookup table representations. We illustrate classes of learning curve behavior in various chains, and show the manner in which TD is sensitive to the choice of its step-size and eligibility trace parameters.

1 INTRODUCTION

A reassuring theory of asymptotic convergence is available for many reinforcement learning (RL) algorithms. What is not available, however, is a theory that explains the finite-term learning curve behavior of RL algorithms, e.g., what are the different kinds of learning curves, what are their key determinants, and how do different problem parameters affect the rate of convergence. Answering these questions is crucial not only for making useful comparisons between algorithms, but also for developing hybrid and new RL methods. In this paper we provide preliminary answers to some of the above questions for the case of absorbing Markov chains, where mean square error between the estimated and true predictions is used as the quantity of interest in learning curves.
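To make the setting concrete, the following is a minimal sketch of what a mean squared error learning curve looks like for offline TD(lambda) with a lookup table on an absorbing Markov chain. It is not the paper's analytical calculation: it estimates the curve empirically by averaging over simulated runs. The chain (a five-state symmetric random walk with reward 1 on right-hand absorption), the initial value of 0.5, and all function names and default parameters are assumptions chosen for illustration only.

```python
import random


def true_values(n_states):
    # Assumed chain: symmetric random walk, reward 1 only when absorbing on the right,
    # so the true undiscounted value of state i is (i + 1) / (n_states + 1).
    return [(i + 1) / (n_states + 1) for i in range(n_states)]


def run_trial(n_states, rng):
    """Simulate one trial; return (state, reward, next_state) steps, next_state None at absorption."""
    state = n_states // 2  # every trial starts in the middle state (an assumption)
    steps = []
    while True:
        nxt = state + (1 if rng.random() < 0.5 else -1)
        if nxt < 0:
            steps.append((state, 0.0, None))
            return steps
        if nxt >= n_states:
            steps.append((state, 1.0, None))
            return steps
        steps.append((state, 0.0, nxt))
        state = nxt


def offline_td_lambda(values, steps, alpha, lam):
    """Accumulate TD(lambda) increments over one trial and apply them only at its end."""
    n = len(values)
    trace = [0.0] * n       # accumulating eligibility traces
    increment = [0.0] * n   # summed updates, applied offline
    for state, reward, nxt in steps:
        v_next = 0.0 if nxt is None else values[nxt]
        td_error = reward + v_next - values[state]  # values stay fixed within the trial
        trace = [lam * e for e in trace]
        trace[state] += 1.0
        for i in range(n):
            increment[i] += alpha * td_error * trace[i]
    return [v + d for v, d in zip(values, increment)]


def mse_curve(n_states=5, n_trials=50, n_runs=200, alpha=0.1, lam=0.5, seed=0):
    """Empirical MSE (averaged over states and runs) after 0, 1, ..., n_trials trials."""
    rng = random.Random(seed)
    target = true_values(n_states)
    curve = [0.0] * (n_trials + 1)
    for _ in range(n_runs):
        values = [0.5] * n_states
        for t in range(n_trials + 1):
            curve[t] += sum((v - g) ** 2 for v, g in zip(values, target)) / n_states
            if t < n_trials:
                values = offline_td_lambda(values, run_trial(n_states, rng), alpha, lam)
    return [c / n_runs for c in curve]


if __name__ == "__main__":
    for alpha in (0.05, 0.1, 0.2):
        curve = mse_curve(alpha=alpha)
        print(alpha, [round(c, 4) for c in curve[::10]])
```

Varying `alpha` and `lam` in this sketch gives a rough empirical view of the sensitivity to step-size and eligibility trace parameters that the abstract refers to; the simulated curves are noisy estimates, whereas the paper's expressions are exact.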

artificial intelligence, machine learning, reinforcement learning

Country:
- North America > United States
  - Massachusetts > Middlesex County
    - Cambridge (0.14)
  - Colorado > Boulder County
    - Boulder (0.14)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.14)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
