Agnostic Q-learning with Function Approximation in Deterministic Systems: Tight Bounds on Approximation Error and Sample Complexity

Du, Simon S., Lee, Jason D., Mahajan, Gaurav, Wang, Ruosong

Feb-17-2020–arXiv.org Artificial Intelligence

The current paper studies the problem of agnostic $Q$-learning with function approximation in deterministic systems where the optimal $Q$-function is approximable by a function in the class $\mathcal{F}$ with approximation error $\delta \ge 0$. We propose a novel recursion-based algorithm and show that if $\delta = O\left(\rho/\sqrt{\dim_E}\right)$, then one can find the optimal policy using $O\left(\dim_E\right)$ trajectories, where $\rho$ is the gap between the optimal $Q$-value of the best actions and that of the second-best actions and $\dim_E$ is the Eluder dimension of $\mathcal{F}$. Our result has two implications: 1) In conjunction with the lower bound in [Du et al., ICLR 2020], our upper bound suggests that the condition $\delta = \widetilde{\Theta}\left(\rho/\sqrt{\mathrm{dim}_E}\right)$ is necessary and sufficient for algorithms with polynomial sample complexity. 2) In conjunction with the lower bound in [Wen and Van Roy, NIPS 2013], our upper bound suggests that the sample complexity $\widetilde{\Theta}\left(\mathrm{dim}_E\right)$ is tight even in the agnostic setting. Therefore, we settle the open problem on agnostic $Q$-learning proposed in [Wen and Van Roy, NIPS 2013]. We further extend our algorithm to the stochastic reward setting and obtain similar results.

algorithm, linear function, sample complexity, (13 more...)

arXiv.org Artificial Intelligence

Feb-17-2020

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Pennsylvania > Allegheny County
    - Pittsburgh (0.04)
  - California > San Diego County
    - San Diego (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report > New Finding (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Reinforcement Learning (1.00)
  - Representation & Reasoning > Uncertainty
    - Fuzzy Logic (0.62)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found