Uncertainty-based Offline Variational Bayesian Reinforcement Learning for Robustness under Diverse Data Corruptions

Rui Yang

Neural Information Processing Systems 

These Q-values are often the targets that offline algorithms aim to approximate.
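To make the notion of approximating Q-values from a fixed dataset concrete, the following is a minimal sketch of tabular Q-learning run over logged transitions only, with no further environment interaction. It is an illustration under stated assumptions, not the method of this paper: the function name `q_from_dataset`, the two-state toy dataset, and the values of `gamma`, `lr`, and `sweeps` are all hypothetical.

```python
def q_from_dataset(transitions, n_states, n_actions, gamma=0.9, sweeps=200, lr=0.5):
    """Approximate Q-values by repeated Bellman backups over a fixed dataset.

    Each transition is a tuple (state, action, reward, next_state, done).
    Hypothetical helper for illustration only.
    """
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        for s, a, r, s_next, done in transitions:
            # Bellman target: reward plus the discounted best next-state value
            # (no bootstrapping past a terminal transition).
            target = r + (0.0 if done else gamma * max(q[s_next]))
            # Move the current estimate a fraction lr toward the target.
            q[s][a] += lr * (target - q[s][a])
    return q


# Assumed toy dataset: in state 0, action 0 leads to state 1 with reward 0;
# in state 1, action 0 ends the episode with reward 1.
dataset = [(0, 0, 0.0, 1, False), (1, 0, 1.0, 1, True)]
q = q_from_dataset(dataset, n_states=2, n_actions=2)
```

Actions that never appear in the dataset keep their initial value of zero here, which hints at why estimates for out-of-distribution actions need special care in the offline setting.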