Faster and More Accurate Learning with Meta Trace Adaptation
arXiv.org Artificial Intelligence
Assembling compound targets is an open problem for achieving good learning performance in Reinforcement Learning (RL). TD(λ), which weights the n-step returns by a geometric sequence controlled by a single parameter, stands out among compound update methods for its efficient incremental updates and its interesting mathematical properties. Empirical studies show that different values of λ yield different performance. Furthermore, adapting λ appropriately during learning is expected to improve both convergence speed and accuracy. The goal of this paper is to find a method that optimizes the overall target error across all states. We first derive a new state meta-objective for optimizing the bias-variance tradeoff and show that the meta-objective proposed in an existing work [1] is a special case of the newly proposed objective. We then propose a trust-region style method to tackle the difficulties of optimizing the meta-objective and prove, under appropriate assumptions, that it is equivalent to optimizing the overall target error. In experiments, the proposed method, MTA, generally achieves significantly better empirical performance than the existing method and the baselines.
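To make the geometric weighting of n-step returns concrete, the following is a minimal sketch of the standard episodic λ-return that TD(λ) targets. It is not the MTA method described above. The function names, the toy episode, and the convention that `values` carries one extra entry for the terminal state are assumptions made for illustration.

```python
def n_step_return(rewards, values, t, n, gamma):
    """n-step return from time t: up to n discounted rewards plus a bootstrapped value.

    Assumed convention: values has len(rewards) + 1 entries, and the last entry
    is the value of the terminal state (normally 0).
    """
    T = len(rewards)
    end = min(t + n, T)  # truncate at the end of the episode
    g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
    return g + gamma ** (end - t) * values[end]


def lambda_return(rewards, values, t, gamma, lam):
    """Compound target: n-step returns weighted by (1 - lam) * lam**(n - 1).

    The weight mass left over after the episode ends is assigned to the full
    (Monte Carlo) return, so the weights always sum to 1.
    """
    T = len(rewards)
    total = 0.0
    for n in range(1, T - t):
        weight = (1.0 - lam) * lam ** (n - 1)
        total += weight * n_step_return(rewards, values, t, n, gamma)
    total += lam ** (T - t - 1) * n_step_return(rewards, values, t, T - t, gamma)
    return total


# Hypothetical toy episode: three rewards, four state values (terminal value 0).
rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.4, 0.3, 0.0]
gamma = 0.9

one_step_target = lambda_return(rewards, values, 0, gamma, lam=0.0)
monte_carlo_target = lambda_return(rewards, values, 0, gamma, lam=1.0)
mixed_target = lambda_return(rewards, values, 0, gamma, lam=0.5)
```

With λ = 0 the target reduces to the one-step TD target (low variance, more bias from bootstrapping), and with λ = 1 it becomes the Monte Carlo return (no bootstrapping bias, higher variance). Intermediate values interpolate between the two, which is the bias-variance tradeoff that adapting λ aims to tune.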
Apr-25-2019