Appendix for QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries
Neural Information Processing Systems
In Table 2, we show the effect of using different numbers of moment queries. As can be seen from the table, this hyper-parameter has a large impact on the moment retrieval task, where a reasonably small value (e.g., 10) gives better performance.

As described in Equation 3 of the main text, Moment-DETR's saliency loss consists of two terms; in Table 3, we study the effect of each term. We show more correct predictions and failure cases from our Moment-DETR model in Figure 1 and Figure 2.

In Table 4, we show the distribution of annotated saliency scores. We noticed that 94.41% of the annotated clips are rated by two or more users as 'Fair' or better (i.e., >=3). To ensure data quality, we require workers to pass our qualification test before participating in our annotation task.
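The two-term saliency loss studied in Table 3 can be illustrated with a minimal sketch. This is an assumption-based illustration of a margin (hinge) ranking loss over clip saliency scores, not a reproduction of the exact Equation 3; the function and index names (`hinge_pair_loss`, `high_idx`, `low_idx`, `neg_idx`) are hypothetical.

```python
def hinge_pair_loss(s_high, s_low, margin=0.2):
    # Margin ranking objective: the higher-saliency clip should score
    # above the lower-saliency clip by at least `margin`.
    return max(0.0, margin + s_low - s_high)

def saliency_loss(scores, high_idx, low_idx, neg_idx, margin=0.2):
    # Hypothetical sketch of a two-term saliency ranking loss.
    # Term 1: within the ground-truth moment, a clip with a higher
    # annotated saliency rating vs. one with a lower rating.
    term1 = hinge_pair_loss(scores[high_idx], scores[low_idx], margin)
    # Term 2: a clip inside the ground-truth moment vs. a clip outside it.
    term2 = hinge_pair_loss(scores[low_idx], scores[neg_idx], margin)
    return term1 + term2

# Example: well-ordered scores incur zero loss.
scores = [0.9, 0.5, 0.1]
print(saliency_loss(scores, high_idx=0, low_idx=1, neg_idx=2))  # 0.0
```

Ablating either term (Table 3) then amounts to dropping `term1` or `term2` from the sum.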