Reward-Weighted Regression Converges to a Global Optimum
