Glance and Focus: Memory Prompting for Multi-Event Video Question Answering
Supplementary Material

Ziyi Bai, Ruiping Wang, Xilin Chen
ziyi.bai@vipl.ict.ac.cn, {wangruiping, xlchen}@ict.ac.cn

Feb-13-2026, 14:35:46 GMT–Neural Information Processing Systems

As mentioned in Section 4.2, our model can easily adapt to various video backbones. We use QA accuracy as the evaluation metric. As illustrated in Section 3.2, with event-level annotations, we [...] First, we analyze the effects of different loss functions on model performance. The results are shown in Figure 1. When the coefficient of any loss function is set to 0, the performance of the model decreases, which indicates their effectiveness in event memory extraction. Without it, there is a significant decrease in model performance.

machine learning, natural language, question answering


Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Vision (0.97)
  - Natural Language > Question Answering (0.43)
