Achieving the Minimax Optimal Sample Complexity of Offline Reinforcement Learning: A DRO-Based Approach

Open in new window