Explaining Fast Improvement in Online Policy Optimization