Reward-Weighted Regression Converges to a Global Optimum