Pessimistic Value Iteration for Multi-Task Data Sharing in Offline Reinforcement Learning
