SPQR: Controlling Q-ensemble Independence with Spiked Random Model for Reinforcement Learning
Neural Information Processing Systems
To overcome overestimation bias, ensemble methods for Q-learning have been investigated as a way to exploit the diversity of multiple Q-functions. Because network initialization has been the predominant approach to promoting diversity among Q-functions, heuristically designed diversity injection methods have also been studied in the literature. However, previous studies have not attempted to approach guaranteed independence over an ensemble from a theoretical perspective.
Feb-17-2026, 04:23:54 GMT
- Country:
  - Asia > South Korea
  - Europe > United Kingdom
    - England > Cambridgeshire > Cambridge (0.04)
  - North America > United States
    - California > Alameda County > Berkeley (0.04)
- Technology: