CORDIAL: Can Multimodal Large Language Models Effectively Understand Coherence Relationships?