Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy
