Faster and More Accurate Learning with Meta Trace Adaptation

Apr-25-2019–arXiv.org Artificial Intelligence

How to assemble compound targets remains an open problem for achieving good learning performance in Reinforcement Learning (RL). TD(λ), which weights the n-step returns by a geometric sequence controlled by a single parameter, stands out among compound update methods for its efficient incremental updates and its interesting mathematical properties. Empirical studies show that different values of λ yield different performance. Furthermore, adapting λ appropriately during learning is expected to improve both convergence speed and accuracy. The goal of this paper is to find a method that optimizes the overall target error across all states. We first derive a new state meta-objective for optimizing the bias-variance tradeoff and show that the meta-objective proposed in an existing work [1] is a special case of the newly proposed objective. Then, we propose a trust-region style method to tackle the difficulties of optimizing the meta-objective and prove its equivalence to optimizing the overall target error, given appropriate assumptions. In experiments, we observe that the proposed method, MTA, generally achieves significantly better empirical performance than the existing method and the baselines.
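For readers unfamiliar with the compound target behind TD(λ), the following is a minimal sketch of the standard λ-return computed backward over one recorded trajectory. It is not the MTA method from the paper, and it uses a single fixed λ rather than an adapted one. The function name `lambda_returns` and its argument layout are assumptions made for illustration.

```python
def lambda_returns(rewards, next_values, gamma, lam):
    """Compute lambda-returns for one trajectory (illustrative sketch).

    rewards[t]     : reward received after step t
    next_values[t] : value estimate of the state reached after step t
                     (use 0.0 if that state is terminal)
    gamma          : discount factor in [0, 1]
    lam            : fixed trace parameter lambda in [0, 1]
    """
    if len(rewards) != len(next_values):
        raise ValueError("rewards and next_values must have equal length")
    if not rewards:
        return []

    targets = [0.0] * len(rewards)
    # Bootstrap from the last value estimate, so the final target is the
    # one-step target r + gamma * V regardless of lambda.
    g = next_values[-1]
    for t in reversed(range(len(rewards))):
        # Recursive form of the geometrically weighted n-step returns:
        # G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1})
        g = rewards[t] + gamma * ((1.0 - lam) * next_values[t] + lam * g)
        targets[t] = g
    return targets


if __name__ == "__main__":
    rewards = [1.0, 0.0, 2.0]
    next_values = [0.5, 0.5, 0.0]  # last state is terminal
    for lam in (0.0, 0.5, 1.0):
        print(lam, lambda_returns(rewards, next_values, 0.9, lam))
```

With λ = 0 the targets reduce to one-step TD targets, and with λ = 1 they reduce to discounted Monte Carlo returns; intermediate values interpolate between the two, which is the bias-variance tradeoff the abstract refers to.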

artificial intelligence, machine learning, reinforcement learning

Genre:
- Research Report (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning > Reinforcement Learning (0.90)
