Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets
