Rethinking the Global Convergence of Softmax Policy Gradient with Linear Function Approximation
