Glance and Focus: Memory Prompting for Multi-Event Video Question Answering
Ziyi Bai
Neural Information Processing Systems
Video Question Answering (VideoQA) has emerged as a vital tool for evaluating agents' ability to understand human daily behaviors. Despite the recent success of large vision-language models in many multimodal tasks, complex situation reasoning over videos involving multiple human-object interaction events remains challenging.
Feb-13-2026, 14:35:42 GMT
- Industry:
  - Education (0.68)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning > Neural Networks (0.46)
    - Natural Language > Question Answering (0.63)
    - Vision (1.00)