On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization
Gaur, Mudit, Aggarwal, Vaneet, Agarwal, Mridul
arXiv.org Artificial Intelligence
Deep Q-learning based algorithms have been applied successfully to many decision-making problems, but their theoretical foundations are less well understood. In this paper, we study Fitted Q-Iteration with a two-layer ReLU neural network parameterization and establish sample complexity guarantees for the algorithm. Our approach estimates the Q-function in each iteration by solving a convex optimization problem. We show that this approach achieves a sample complexity of $\tilde{\mathcal{O}}(1/\epsilon^{2})$, which is order-optimal. This result holds for countable state spaces and does not require assumptions such as a linear or low-rank structure on the MDP.
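For readers unfamiliar with the setting, the following is a minimal sketch of a generic fitted Q-iteration loop with a two-layer ReLU network: each iteration builds Bellman targets from the previous estimate and refits the network to them. Everything concrete here is an illustrative assumption and not taken from the paper: the four-state chain MDP, the one-hot encoding, the helper names (`step`, `encode`, `TwoLayerReLU`, `bellman_targets`, `fitted_q_iteration`, `zero_q`), and the hyperparameters. In particular, the regression step below uses plain gradient descent as a stand-in; the paper instead solves a convex optimization problem in each iteration, which this sketch does not reproduce.

```python
import random

GAMMA = 0.9
# Hypothetical chain MDP 0-1-2-3 with terminal goal state 3 (not from the paper).
N_STATES, N_ACTIONS, GOAL = 4, 2, 3


def step(s, a):
    """Deterministic chain: action 0 moves left, action 1 moves right."""
    s2 = max(0, s - 1) if a == 0 else s + 1
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL


def encode(s, a):
    """One-hot encoding of a (state, action) pair."""
    x = [0.0] * (N_STATES * N_ACTIONS)
    x[s * N_ACTIONS + a] = 1.0
    return x


def zero_q(x):
    """Initial estimate Q_0 = 0."""
    return 0.0


class TwoLayerReLU:
    """Q(x) = sum_j v[j] * relu(w[j] . x + b[j])."""

    def __init__(self, dim, width, rng):
        self.w = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(width)]
        self.b = [rng.uniform(0.0, 0.5) for _ in range(width)]
        self.v = [rng.uniform(-0.5, 0.5) for _ in range(width)]

    def hidden(self, x):
        return [max(0.0, sum(wi * xi for wi, xi in zip(wj, x)) + bj)
                for wj, bj in zip(self.w, self.b)]

    def __call__(self, x):
        return sum(vj * hj for vj, hj in zip(self.v, self.hidden(x)))

    def fit(self, xs, ys, steps=200, lr=0.05):
        """Full-batch gradient descent on (1/2n) * sum of squared errors.

        Assumption: this replaces the paper's convex program.
        """
        n = len(xs)
        for _ in range(steps):
            gw = [[0.0] * len(wj) for wj in self.w]
            gb = [0.0] * len(self.b)
            gv = [0.0] * len(self.v)
            for x, y in zip(xs, ys):
                h = self.hidden(x)
                err = sum(vj * hj for vj, hj in zip(self.v, h)) - y
                for j, hj in enumerate(h):
                    gv[j] += err * hj / n
                    if hj > 0.0:  # gradient flows only through active ReLUs
                        gb[j] += err * self.v[j] / n
                        for i, xi in enumerate(x):
                            gw[j][i] += err * self.v[j] * xi / n
            for j in range(len(self.v)):
                self.v[j] -= lr * gv[j]
                self.b[j] -= lr * gb[j]
                for i in range(len(self.w[j])):
                    self.w[j][i] -= lr * gw[j][i]


def bellman_targets(data, q, gamma=GAMMA):
    """y = r + gamma * max_a' q(s', a'), with no bootstrap at terminal states."""
    return [r + (0.0 if done else
                 gamma * max(q(encode(s2, a2)) for a2 in range(N_ACTIONS)))
            for (_, _, s2, r, done) in data]


def fitted_q_iteration(data, iterations=8, width=12, seed=0):
    """Repeat: build targets from the current estimate, then refit a fresh network."""
    rng = random.Random(seed)
    xs = [encode(s, a) for (s, a, _, _, _) in data]
    q = zero_q
    for _ in range(iterations):
        ys = bellman_targets(data, q)
        net = TwoLayerReLU(len(xs[0]), width, rng)
        net.fit(xs, ys)
        q = net
    return q


# One transition (s, a, s', r, done) per state-action pair of the non-terminal states.
data = [(s, a, *step(s, a)) for s in range(GOAL) for a in range(N_ACTIONS)]

q_hat = fitted_q_iteration(data)
for s in range(GOAL):
    print(s, [round(q_hat(encode(s, a)), 3) for a in range(N_ACTIONS)])
```

The loop prints the estimated Q-values for each non-terminal state. How close they come to the true values depends on the fit quality of the gradient-descent step, which carries none of the guarantees the abstract describes.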
Jan-30-2023
- Country:
  - Asia > Middle East
    - Jordan (0.04)
  - North America > United States
    - California > Santa Clara County
      - Palo Alto (0.04)
    - Indiana > Tippecanoe County
      - Lafayette (0.04)
      - West Lafayette (0.04)
- Genre:
  - Research Report (0.50)
- Industry:
  - Energy (0.67)
  - Leisure & Entertainment > Games (0.45)
- Technology: