r/MachineLearning - [R] Provably Efficient Exploration in Policy Optimization