Multimodal Alignment with Cross-Attentive GRUs for Fine-Grained Video Understanding
