SPQR: Controlling Q-ensemble Independence with Spiked Random Model for Reinforcement Learning

Neural Information Processing Systems

To overcome overestimation bias, ensemble methods for Q-learning have been investigated that exploit the diversity of multiple Q-functions. Since random network initialization has been the predominant approach to promoting diversity among Q-functions, heuristically designed diversity-injection methods have also been studied in the literature. However, previous studies have not attempted to guarantee independence across the ensemble from a theoretical perspective.
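The ensemble idea described above can be illustrated with a minimal sketch. This is not SPQR's method; it is a generic tabular Q-ensemble in which diversity comes from independent random initialization and overestimation is countered by taking a pessimistic minimum over ensemble members when forming the TD target. All names and sizes here (`N_ENSEMBLE`, `ensemble_target`, etc.) are illustrative assumptions, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes; a small tabular MDP stands in for a deep Q-network.
N_ENSEMBLE, N_STATES, N_ACTIONS = 5, 4, 2
GAMMA, LR = 0.9, 0.5

# Independent random initialization is the conventional source of diversity.
q_tables = rng.normal(scale=0.1, size=(N_ENSEMBLE, N_STATES, N_ACTIONS))

def ensemble_target(reward, next_state):
    """Pessimistic TD target: min over members of max_a Q_i(s', a)."""
    next_values = q_tables[:, next_state, :].max(axis=1)  # shape (N_ENSEMBLE,)
    return reward + GAMMA * next_values.min()

def update(state, action, reward, next_state):
    """Move every ensemble member toward the shared pessimistic target."""
    target = ensemble_target(reward, next_state)
    for i in range(N_ENSEMBLE):
        td_error = target - q_tables[i, state, action]
        q_tables[i, state, action] += LR * td_error

update(0, 1, 1.0, 2)
```

The minimum over the ensemble biases the target downward, which is why such ensembles mitigate overestimation; note, however, that because every member regresses toward the same shared target, the updates themselves erode the diversity that initialization provided, which is exactly the tension the abstract points at.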


Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning

Neural Information Processing Systems

However, existing offline RL methods tend to behave poorly during fine-tuning. In this paper, we study the fine-tuning problem in the context of conservative offline RL methods and devise an approach for learning an effective initialization from offline data that also enables fast online fine-tuning.