On the role of overparameterization in off-policy Temporal Difference learning with linear function approximation
Neural Information Processing Systems
Much of the recent success of deep learning can be attributed to scaling up the size of networks to the point where they are often vastly overparameterized. Understanding the role of overparameterization is therefore of increasing importance. While predictive theories have been developed for supervised learning, little is known about the reinforcement learning case. In this work, we take a theoretical approach and study the role of overparameterization for off-policy Temporal Difference (TD) learning in the linear setting. We leverage tools from random matrix theory and random graph theory to characterize the spectrum of the TD operator, and we use this result to study the stability and optimization dynamics of TD learning as a function of the number of parameters.
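The setting the abstract describes, off-policy TD learning with linear function approximation, can be sketched in a few lines. The tiny chain environment, the random feature matrix, and the step size below are illustrative assumptions, not details from the paper; the importance ratio `rho` is held at 1.0 for brevity, which collapses the off-policy update to the on-policy case.

```python
import numpy as np

# Minimal sketch of linear TD(0) with an importance-sampling ratio,
# the kind of update whose stability the paper analyzes.
# Environment, features, and hyperparameters are illustrative assumptions.

rng = np.random.default_rng(0)

n_states, n_features = 5, 3
phi = rng.normal(size=(n_states, n_features))  # fixed random feature matrix
gamma, alpha = 0.9, 0.05                       # discount factor, step size
w = np.zeros(n_features)                       # linear value parameters


def td0_step(w, s, r, s_next, rho):
    """One off-policy linear TD(0) update with importance ratio rho."""
    td_error = r + gamma * phi[s_next] @ w - phi[s] @ w
    return w + alpha * rho * td_error * phi[s]


# Uniform-random behavior over states; rho = 1.0 for brevity.
s = 0
for _ in range(1000):
    s_next = int(rng.integers(n_states))
    r = float(s_next == n_states - 1)  # reward only on reaching the last state
    w = td0_step(w, s, r, s_next, rho=1.0)
    s = s_next

print(w.shape)
```

Note that the parameter vector lives in feature space, not state space; when `n_features` exceeds `n_states`, this same update is the overparameterized regime the paper studies.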