Improved Policy Optimization for Online Imitation Learning