Improving Temporal Understanding Logic Consistency in Video-Language Models via Attention Enhancement
