Don't Judge by the Look: Towards Motion Coherent Video Representation