Variance-Reduced Conservative Policy Iteration

Naman Agarwal, Brian Bullins, Karan Singh

arXiv.org Artificial Intelligence 

We study the sample complexity of reducing reinforcement learning to a sequence of empirical risk minimization problems over the policy space. Such reduction-based algorithms exhibit local convergence in the function space, as opposed to the parameter space for policy gradient algorithms, and thus are unaffected by the possibly non-linear or discontinuous parameterization of the policy class. We propose a variance-reduced variant of Conservative Policy Iteration that improves the sample complexity of producing an $\varepsilon$-functional local optimum from $O(\varepsilon^{-4})$ to $O(\varepsilon^{-3})$. Under state-coverage and policy-completeness assumptions, the algorithm enjoys $\varepsilon$-global optimality after drawing $O(\varepsilon^{-2})$ samples, improving upon the previously established $O(\varepsilon^{-3})$ sample requirement.
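To make the mixture-update structure concrete, below is a minimal, self-contained Python sketch, not the paper's algorithm: Conservative Policy Iteration's conservative policy mixing paired with a recursive, SARAH/SPIDER-style variance-reduced advantage estimate on a toy contextual bandit. The environment, the erm_oracle, the noise model, and all names are hypothetical stand-ins chosen for illustration.

    import numpy as np

    # Toy contextual-bandit stand-in: S states, A actions, fixed rewards q.
    rng = np.random.default_rng(0)
    S, A = 4, 3
    q = rng.normal(size=(S, A))

    def adv_true(pi):
        # Advantage of each action under policy pi: q(s,a) - E_{a~pi}[q(s,a)].
        return q - (pi * q).sum(axis=1, keepdims=True)

    def sample_adv(pi, batch):
        # Stand-in for Monte Carlo advantage estimation from `batch` rollouts;
        # the noise shrinks at the usual 1/sqrt(batch) rate.
        return adv_true(pi) + rng.normal(scale=1.0 / np.sqrt(batch), size=(S, A))

    def erm_oracle(adv_est):
        # The ERM step over the policy space: here, simply the greedy
        # deterministic policy w.r.t. the estimated advantages.
        greedy = np.zeros((S, A))
        greedy[np.arange(S), adv_est.argmax(axis=1)] = 1.0
        return greedy

    alpha = 0.1                       # conservative mixing weight
    pi = np.full((S, A), 1.0 / A)     # start from the uniform policy
    adv = sample_adv(pi, batch=1024)  # occasional large-batch "anchor" estimate

    for t in range(20):
        # CPI mixture update: pi_next = (1 - alpha) * pi + alpha * greedy.
        pi_next = (1 - alpha) * pi + alpha * erm_oracle(adv)
        # Variance-reduced correction: estimate the *difference* in advantages
        # from shared samples (in the real algorithm, via importance weighting),
        # so the residual noise scales with the policy drift alpha rather than
        # requiring a fresh large batch at every iteration.
        drift_noise = alpha * rng.normal(scale=1.0 / np.sqrt(64), size=(S, A))
        adv = adv + (adv_true(pi_next) - adv_true(pi)) + drift_noise
        pi = pi_next

The key design point the sketch tries to convey: because consecutive mixture policies differ only by a factor of alpha, a cheap correction to a running advantage estimate can replace a fresh large-batch estimate, which is where the improvement in sample complexity comes from under the stated assumptions.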
