n-Step Temporal Difference Learning with Optimal n

Apr-14-2023–arXiv.org Artificial Intelligence

We consider the problem of finding the optimal value of n in the n-step temporal difference (TD) learning algorithm. We find the optimal n by resorting to a model-free optimization technique involving a one-simulation simultaneous perturbation stochastic approximation (SPSA) based procedure that we adopt to the discrete optimization setting by using a random projection approach. We prove the convergence of our proposed algorithm, SDPSA, using a differential inclusions approach and show that it finds the optimal value of n in n-step TD. Through experiments, we show that the optimal value of n is achieved with SDPSA for arbitrary initial values. I. INTRODUCTION Reinforcement learning (RL) algorithms are widely used for solving problems of sequential decisionmaking under uncertainty. An RL agent typically makes decisions based on data that it collects through interactions with the environment in order to maximize a certain long-term reward [1], [2].

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

Apr-14-2023

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > India
  - Karnataka > Bengaluru (0.04)

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found