Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games
