Imitation Learning from a Single Temporally Misaligned Video