Diverse Randomized Value Functions: A Provably Pessimistic Approach for Offline Reinforcement Learning
