On-Policy Robot Imitation Learning from a Converging Supervisor