The MSR-Video to Text Dataset with Clean Annotations