Offline Minimax Soft-Q-learning Under Realizability and Partial Coverage
