Natural Policy Gradient as Doubly Smoothed Policy Iteration: A Bellman-Operator Framework
