Appendix for QVHIGHLIGHTS: Detecting Moments and Highlights in Videos via Natural Language Queries
Neural Information Processing Systems
In Table 2, we show the effect of using different numbers of moment queries. As can be seen from the table, this hyper-parameter has a large impact on the moment retrieval task, where a reasonably small value (e.g., 10) gives better performance.

As described in Equation 3 of the main text, Moment-DETR's saliency loss consists of two terms; in Table 3, we study the effect of each term. We show more correct predictions and failure cases from our Moment-DETR model in Figure 1 and Figure 2.

In Table 4, we show the distribution of annotated saliency scores. We noticed that 94.41% of the annotated clips are rated by two or more users as 'Fair' or better (i.e., >=3). To ensure data quality, we require workers to pass our qualification test before participating in our annotation task.
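The two-term saliency loss studied in Table 3 can be illustrated with a minimal sketch. This is an assumption-based illustration of a margin (hinge) ranking loss over clip saliency scores, not a reproduction of the exact Equation 3; the function and index names (`hinge_pair_loss`, `high_idx`, `low_idx`, `neg_idx`) are hypothetical.

```python
def hinge_pair_loss(s_high, s_low, margin=0.2):
    # Margin ranking objective: the higher-saliency clip should score
    # above the lower-saliency clip by at least `margin`.
    return max(0.0, margin + s_low - s_high)

def saliency_loss(scores, high_idx, low_idx, neg_idx, margin=0.2):
    # Hypothetical sketch of a two-term saliency ranking loss.
    # Term 1: within the ground-truth moment, a clip with a higher
    # annotated saliency rating vs. one with a lower rating.
    term1 = hinge_pair_loss(scores[high_idx], scores[low_idx], margin)
    # Term 2: a clip inside the ground-truth moment vs. a clip outside it.
    term2 = hinge_pair_loss(scores[low_idx], scores[neg_idx], margin)
    return term1 + term2

# Example: well-ordered scores incur zero loss.
scores = [0.9, 0.5, 0.1]
print(saliency_loss(scores, high_idx=0, low_idx=1, neg_idx=2))  # 0.0
```

Ablating either term (Table 3) then amounts to dropping `term1` or `term2` from the sum.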