A Finite Sample Complexity Bound for Distributionally Robust Q-learning

Wang, Shengbo, Si, Nian, Blanchet, Jose, Zhou, Zhengyuan

Mar-2-2023–arXiv.org Artificial Intelligence

We consider a reinforcement learning setting in which the deployment environment is different from the training environment. Applying a robust Markov decision processes formulation, we extend the distributionally robust $Q$-learning framework studied in Liu et al. [2022]. Further, we improve the design and analysis of their multi-level Monte Carlo estimator. Assuming access to a simulator, we prove that the worst-case expected sample complexity of our algorithm to learn the optimal robust $Q$-function within an $\epsilon$ error in the sup norm is upper bounded by $\tilde O(|S||A|(1-\gamma)^{-5}\epsilon^{-2}p_{\wedge}^{-6}\delta^{-4})$, where $\gamma$ is the discount rate, $p_{\wedge}$ is the non-zero minimal support probability of the transition kernels and $\delta$ is the uncertainty size. This is the first sample complexity result for the model-free robust RL problem. Simulation studies further validate our theoretical results.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

Mar-2-2023

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - New York (0.04)
  - California (0.04)
  - Pennsylvania > Philadelphia County
    - Philadelphia (0.04)
  - Massachusetts > Middlesex County
    - Belmont (0.04)
  - Illinois > Cook County
    - Chicago (0.04)

Genre:
- Research Report (0.64)

Industry:
- Leisure & Entertainment (0.45)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.34)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found