Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks

Ji, Xiang, Chen, Minshuo, Wang, Mengdi, Zhao, Tuo

Oct-3-2022–arXiv.org Artificial Intelligence

We consider the off-policy evaluation problem of reinforcement learning using deep convolutional neural networks. We analyze the deep fitted Q-evaluation method for estimating the expected cumulative reward of a target policy, when the data are generated from an unknown behavior policy. We show that, by choosing network size appropriately, one can leverage any low-dimensional manifold structure in the Markov decision process and obtain a sample-efficient estimator without suffering from the curse of high data ambient dimensionality. Specifically, we establish a sharp error bound for fitted Q-evaluation, which depends on the intrinsic dimension of the state-action space, the smoothness of Bellman operator, and a function class-restricted $\chi^2$-divergence. It is noteworthy that the restricted $\chi^2$-divergence measures the behavior and target policies' {\it mismatch in the function space}, which can be small even if the two policies are not close to each other in their tabular forms. We also develop a novel approximation result for convolutional neural networks in Q-function estimation. Numerical experiments are provided to support our theoretical analysis.

approximation, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

Oct-3-2022

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Massachusetts > Middlesex County
    - Cambridge (0.04)
  - Florida > Broward County
    - Fort Lauderdale (0.04)
  - California > San Francisco County
    - San Francisco (0.14)
- Europe > Switzerland
  - Basel-City > Basel (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report > New Finding (0.46)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Neural Networks > Deep Learning (1.00)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.34)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found