Glance and Focus: Memory Prompting for Multi-Event Video Question Answering
Supplementary Material

Ziyi Bai, Ruiping Wang, Xilin Chen
ziyi.bai@vipl.ict.ac.cn, {wangruiping, xlchen}@ict.ac.cn

Neural Information Processing Systems 

As mentioned in Section 4.2, our model can easily adapt to various video backbones. We use QA accuracy as the evaluation metric. As illustrated in Section 3.2, with event-level annotations, we [...]

First, we analyze the effects of the different loss functions on model performance. The results are shown in Figure 1. When the coefficient of any loss function is set to 0, the performance of the model decreases, which indicates that each loss contributes to event memory extraction. Without it, there is a significant decrease in model performance.
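The coefficient ablation described above can be illustrated with a minimal sketch. It assumes the training objective is a weighted sum of loss terms, which the text suggests but does not state explicitly. The term names `loss_a` and `loss_b`, the helper names, and all numeric values are hypothetical placeholders, not taken from the paper.

```python
from typing import Dict, List


def weighted_total(losses: Dict[str, float], coeffs: Dict[str, float]) -> float:
    """Weighted sum of loss values; a term with no coefficient gets weight 1.0."""
    return sum(coeffs.get(name, 1.0) * value for name, value in losses.items())


def ablation_settings(names: List[str], default: float = 1.0) -> List[Dict[str, float]]:
    """Build one coefficient setting per loss term, with that term's coefficient set to 0."""
    return [
        {name: (0.0 if name == dropped else default) for name in names}
        for dropped in names
    ]


def qa_accuracy(predictions: List[int], answers: List[int]) -> float:
    """Fraction of questions whose predicted answer index equals the ground truth."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)


# Hypothetical loss values for a single training step.
step_losses = {"loss_a": 2.0, "loss_b": 3.0}
for coeffs in ablation_settings(list(step_losses)):
    print(coeffs, weighted_total(step_losses, coeffs))
```

In an actual ablation, each coefficient setting would be used to train a separate model, and `qa_accuracy` would then be computed on the evaluation set for each one to compare against the full objective.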
