Exploring TD error as a heuristic for $\sigma$ selection in Q($\sigma$, $\lambda$)

Dec-21-2019–arXiv.org Machine Learning

In the landscape of TD algorithms, Q(σ,λ) can perform multi-step backups in an online manner while also unifying sampling with taking the expectation across all actions for a state. The value of σ can be selected based on characteristics of the current state rather than being held constant or varied with time. This project explores the viability of one such scheme, based on the TD error.

Introduction

While having different dimensions of generalizability in an algorithm can serve as a powerful tool, in most cases it comes with the burden of manually selecting values along these dimensions, commonly referred to as hyper-parameter selection. An ideal learning algorithm would be completely general, to the point that it would not need a fixed set of hyper-parameters tuned for a given problem. In the context of Q(σ,λ), the σ parameter gives us flexibility in adjusting the proportion of sampling and expectation we want in our updates. However, although σ is a hyper-parameter, De Asis, Hernandez-Garcia, Holland and Sutton (2018) found that, atypically, a constant value of σ did not give the best performance. They used a Dynamic Decay σ scheme for n-step Q(σ), in which σ was multiplied by 0.95 after every episode.
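To make the role of σ concrete, the following is a minimal sketch of a one-step Q(σ) backup target together with the per-episode decay described above. It is an illustration only, not the project's implementation: the function names `q_sigma_target` and `decay_sigma`, the toy action values, and the starting value σ = 1 are assumptions made for the example, and eligibility traces (the λ part) are left out.

```python
def q_sigma_target(reward, gamma, sigma, q_next, pi_next, next_action):
    """One-step Q(sigma) target (hypothetical helper, for illustration).

    sigma = 1 uses only the sampled next action (Sarsa-like);
    sigma = 0 uses only the expectation under the policy (Expected Sarsa-like).
    """
    sampled = q_next[next_action]
    expected = sum(p * q for p, q in zip(pi_next, q_next))
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)


def decay_sigma(sigma, factor=0.95):
    """Dynamic-decay schedule: shrink sigma by a fixed factor after each episode."""
    return sigma * factor


# Toy next-state action values and policy (assumed numbers).
q_next = [1.0, 3.0]
pi_next = [0.5, 0.5]

full_sampling = q_sigma_target(0.0, 1.0, 1.0, q_next, pi_next, next_action=1)
full_expectation = q_sigma_target(0.0, 1.0, 0.0, q_next, pi_next, next_action=1)
half_and_half = q_sigma_target(0.0, 1.0, 0.5, q_next, pi_next, next_action=1)

# Starting value of 1.0 is an assumption; decay is applied once per episode.
sigma = 1.0
for _episode in range(3):
    sigma = decay_sigma(sigma)
```

A state-dependent scheme would replace the fixed schedule in `decay_sigma` with a rule that picks σ per state; the form of that rule is not specified in the text above, so it is not sketched here.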


Country:
- North America > Canada > Alberta (0.14)

Genre:
- Research Report
  - New Finding (0.68)
  - Experimental Study (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (0.95)
  - Machine Learning > Reinforcement Learning (0.68)
