Near Optimal Policy Optimization via REPS