One-Shot Averaging for Distributed TD($\lambda$) Under Markov Sampling

Haoxing Tian, Ioannis Ch. Paschalidis, Alex Olshevsky

May 31, 2024, arXiv.org Artificial Intelligence

Actor-critic methods achieve state-of-the-art performance in many domains, including robotics, game playing, and control systems (LeCun et al., 2015; Mnih et al., 2016; Silver et al., 2017). Temporal Difference (TD) learning can be viewed as a component of actor-critic methods, and improved bounds for TD learning are typically ingredients of actor-critic analyses. We consider the problem of policy evaluation in reinforcement learning: given a Markov Decision Process (MDP) and a policy, estimate the value of each state (the expected discounted sum of all future rewards) under this policy. Policy evaluation matters because it is effectively a subroutine of many other algorithms, such as policy iteration and actor-critic. Its main challenges are that we usually do not know the underlying MDP and can only interact with it, and that the number of states is typically so large that we must maintain a low-dimensional approximation of the true vector of state values.
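
The policy-evaluation setting above, where values are learned from interaction using a low-dimensional approximation, can be illustrated with a minimal sketch. It is not the authors' code. Assumptions not taken from the source: a known 3-state transition matrix P used only to simulate the chain, rewards that depend only on the current state, linear features phi, a constant step size, and accumulating eligibility traces. The helper one_shot_average is hypothetical and only illustrates the idea named in the title: each agent runs TD($\lambda$) on its own Markov-sampled trajectory, and the parameters are averaged once at the end. The sketch makes no claim about the paper's step sizes or bounds.

```python
import numpy as np


def td_lambda_linear(P, r, phi, gamma, lam, alpha, num_steps, rng, s0=0):
    """One agent's TD(lambda) with linear features along a single Markov trajectory.

    Assumptions (illustrative only): rewards depend on the current state, r[s];
    constant step size alpha; accumulating eligibility traces.
    """
    n_states, d = phi.shape
    cdf = np.cumsum(P, axis=1)  # row-wise CDFs for inverse-transform sampling
    theta = np.zeros(d)  # weights of the approximation V(s) = phi[s] @ theta
    z = np.zeros(d)  # eligibility trace
    s = s0
    for _ in range(num_steps):
        # Sample the next state from row s of P (Markov sampling, not i.i.d.).
        u = rng.random()
        s_next = min(int(np.searchsorted(cdf[s], u, side="right")), n_states - 1)
        # TD error: r(s) + gamma * V(s') - V(s)
        delta = r[s] + gamma * (phi[s_next] @ theta) - phi[s] @ theta
        z = gamma * lam * z + phi[s]
        theta = theta + alpha * delta * z
        s = s_next
    return theta


def one_shot_average(P, r, phi, gamma, lam, alpha, num_steps, num_agents, seed=0):
    """Hypothetical helper: agents run TD(lambda) independently, then average once."""
    thetas = [
        td_lambda_linear(P, r, phi, gamma, lam, alpha, num_steps,
                         np.random.default_rng(seed + i))
        for i in range(num_agents)
    ]
    return np.mean(thetas, axis=0)


if __name__ == "__main__":
    # Toy 3-state chain chosen for illustration only.
    P = np.array([[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]])
    r = np.array([1.0, 0.0, 2.0])
    phi = np.eye(3)  # tabular features, so the TD(lambda) fixed point is the true value
    gamma = 0.9
    v_true = np.linalg.solve(np.eye(3) - gamma * P, r)
    theta = one_shot_average(P, r, phi, gamma, lam=0.5, alpha=0.03,
                             num_steps=20000, num_agents=8)
    print("true values:", np.round(v_true, 2))
    print("averaged estimate:", np.round(phi @ theta, 2))
```

Tabular features (phi equal to the identity) are used in the example because the TD($\lambda$) fixed point then coincides with the true values (I - γP)^{-1} r, which makes the estimate easy to check. With fewer features than states, the iterates instead approach a projected fixed point rather than the exact values.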

machine learning, one-shot averaging, reinforcement learning