Tractable and Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation

Cho, Taehyun, Han, Seungyub, Lee, Kyungjae, Ju, Seokhun, Kim, Dohyeong, Lee, Jungwoo

Jul-30-2024–arXiv.org Machine Learning

Distributional reinforcement learning improves performance by capturing environmental stochasticity, but a comprehensive theoretical understanding of its effectiveness remains elusive. In this paper, we present a regret analysis for distributional reinforcement learning with general value function approximation in a finite episodic Markov decision process setting. First, we introduce Bellman unbiasedness, a key notion for a tractable and exactly learnable update via statistical functional dynamic programming. Our theoretical results show that, even when nonlinear statistical functionals are considered, approximating the infinite-dimensional return distribution with a finite number of moment functionals is the only way to learn the statistical information unbiasedly. Second, we propose a provably efficient algorithm, $\texttt{SF-LSVI}$, which achieves a regret bound of $\tilde{O}(d_E H^{\frac{3}{2}}\sqrt{K})$, where $H$ is the horizon, $K$ is the number of episodes, and $d_E$ is the eluder dimension of the function class.
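The intuition behind the moment-functional claim can be loosely illustrated with a small numerical sketch. It is not the paper's formal definition of Bellman unbiasedness, and the sample values and helper names (`moment`, `averaged`) are hypothetical. It only shows that a raw moment of a mixture of return distributions equals the average of the per-component moments, whereas a functional such as the median does not, which is one reason moments can be estimated without bias from sampled next states.

```python
from fractions import Fraction
from statistics import median


def moment(samples, k):
    """k-th raw moment of the empirical distribution on `samples`."""
    return sum(Fraction(x) ** k for x in samples) / len(samples)


# Hypothetical return samples reachable from two equally likely next states.
returns_a = [0, 1, 2]
returns_b = [10, 10, 40]

# Equal weights and equal sample counts, so concatenation is the mixture.
mixture = returns_a + returns_b


def averaged(functional):
    """Average the functional over the two next-state distributions."""
    return (functional(returns_a) + functional(returns_b)) / 2


def second_moment(samples):
    return moment(samples, 2)


# Gap between "average of functionals" and "functional of the mixture".
moment_gap = averaged(second_moment) - second_moment(mixture)
median_gap = averaged(median) - median(mixture)

print("second-moment gap:", moment_gap)
print("median gap:", median_gap)
```

In this toy setting the second-moment gap vanishes while the median gap does not, so averaging per-sample medians would give a biased estimate of the mixture's median.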



