Optimal Completion Distillation for Sequence Learning