Agnostic Q -learning with Function Approximation in Deterministic Systems: Near-Optimal Bounds on Approximation Error and Sample Complexity

Jan-16-2025, 22:58:40 GMT–Neural Information Processing Systems

The current paper studies the problem of agnostic Q -learning with function approximation in deterministic systems where the optimal Q -function is approximable by a function in the class \mathcal{F} with approximation error \delta \ge 0 . We propose a novel recursion-based algorithm and show that if \delta O\left(\rho/\sqrt{\dim_E}\right), then one can find the optimal policy using O(\dim_E) trajectories, where \rho is the gap between the optimal Q -value of the best actions and that of the second-best actions and \dim_E is the Eluder dimension of \mathcal{F} . Our result has two implications: \begin{enumerate} \item In conjunction with the lower bound in [Du et al., 2020], our upper bound suggests that the condition \delta \widetilde{\Theta}\left(\rho/\sqrt{\dim_E}\right) is necessary and sufficient for algorithms with polynomial sample complexity. We further extend our algorithm to the stochastic reward setting and obtain similar results.

approximation error and sample complexity, deterministic system, function approximation, (7 more...)

Neural Information Processing Systems

Jan-16-2025, 22:58:40 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Reinforcement Learning (1.00)
  - Representation & Reasoning > Uncertainty
    - Fuzzy Logic (0.64)