Improve Temporal Reasoning in Multimodal Large Language Models via Video Contrastive Decoding
