Jungo Kasai, Keisuke Sakaguchi, Yoichi Takahashi, Ronan Le Bras, Akari Asai

Neural Information Processing Systems

Why was the dataset created? The QA dataset establishes a framework to benchmark question answering at the present time: answers (e.g., the number of Shohei Ohtani's home runs) change in real time. Evaluation can also cover the system's interactions with its information retrieval module. Has the dataset been used already? Yes, the QA dataset has already been used. How many instances are there?
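The core idea of benchmarking QA "at the present time" is that the gold answer depends on when the question is asked. A minimal sketch of that evaluation setup, using a hypothetical table of time-stamped gold answers (the question text, dates, and values below are illustrative, not from the dataset):

```python
from datetime import date

# Hypothetical time-stamped gold answers: in a real-time QA benchmark,
# the correct answer to the same question changes as events unfold.
gold_answers = {
    "How many home runs has Shohei Ohtani hit this season?": [
        (date(2021, 6, 1), "15"),
        (date(2021, 7, 1), "31"),
        (date(2021, 10, 1), "46"),
    ],
}

def gold_at(question: str, query_date: date) -> str:
    """Return the most recent gold answer valid on query_date."""
    valid = [ans for d, ans in gold_answers[question] if d <= query_date]
    if not valid:
        raise ValueError("no gold answer available yet")
    return valid[-1]

def exact_match(prediction: str, question: str, query_date: date) -> bool:
    """Score a prediction against the answer that was correct at query time."""
    return prediction.strip().lower() == gold_at(question, query_date).lower()

# The same prediction is scored differently depending on when it is asked.
q = "How many home runs has Shohei Ohtani hit this season?"
print(exact_match("31", q, date(2021, 7, 15)))  # True in July
print(exact_match("31", q, date(2021, 10, 2)))  # False in October
```

The point of the sketch is only the scoring convention: a static gold label is replaced by a date-indexed lookup, so a system must retrieve current information to score well.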



Glance and Focus: Memory Prompting for Multi-Event Video Question Answering Supplementary Material Ziyi Bai, Ruiping Wang, Xilin Chen ziyi.bai@vipl.ict.ac.cn, {wangruiping, xlchen}@ict.ac.cn

Neural Information Processing Systems

As mentioned in Section 4.2, our model can easily adapt to various video backbones. We use QA accuracy as the evaluation metric. As illustrated in Section 3.2, with event-level annotations, we first analyze the effects of different loss functions on model performance. The results are shown in Figure 1. When the coefficient of any loss function is set to 0, model performance decreases, which indicates their effectiveness in event memory extraction; removing any of them causes a significant drop in performance.
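The ablation described above amounts to zeroing one coefficient in a weighted sum of training losses and re-measuring QA accuracy. A minimal sketch of both pieces, with hypothetical loss names and values (the actual objectives are defined in the paper, not here):

```python
def combined_loss(losses, coefficients):
    """Weighted sum of per-objective losses. Setting any coefficient to 0
    disables that objective, mirroring the ablation in Figure 1."""
    return sum(coefficients[name] * value for name, value in losses.items())

def qa_accuracy(predictions, gold):
    """QA accuracy: fraction of questions answered exactly correctly."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Hypothetical per-objective loss values, for illustration only.
losses = {"qa": 0.8, "event_memory": 0.5}
print(combined_loss(losses, {"qa": 1.0, "event_memory": 0.5}))  # full objective
print(combined_loss(losses, {"qa": 1.0, "event_memory": 0.0}))  # ablated term
print(qa_accuracy(["a", "b", "c", "d"], ["a", "b", "x", "d"]))  # 0.75
```

In an actual training loop the loss values would be tensors from each objective's forward pass; the sketch only shows how the coefficients gate each term on or off.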


Glance and Focus: Memory Prompting for Multi-Event Video Question Answering Ziyi Bai

Neural Information Processing Systems

Video Question Answering (VideoQA) has emerged as a vital tool to evaluate agents' ability to understand human daily behaviors. Despite the recent success of large vision-language models on many multi-modal tasks, complex situation reasoning over videos involving multiple human-object interaction events remains challenging.