Goto

Collaborating Authors

 q-function


Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion

Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee

Neural Information Processing Systems

We propose stochastic ensemble value expansion (STEVE), a novel model-based technique that addresses this issue. By dynamically interpolating between model rollouts of various horizon lengths for each individual example, STEVE ensures that the model is only utilized when doing so does not introduce significant errors.






Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning

Neural Information Processing Systems

However, existing offline RL methods tend to behave poorly during fine-tuning. In this paper, we study the fine-tuning problem in the context of conservative offline RL methods and we devise an approach for learning an effective initialization from offline data that also enables fast online fine-tuning capabilities.