Supplementary Material for "Embracing Consistency: A One-Stage Approach for Spatio-Temporal Video Grounding" Yang Jin

Aug-18-2025, 08:07:02 GMT–Neural Information Processing Systems

Additional implementation details are provided in Section 2. Section 3 presents further ablation results on model design and hyper-parameter settings. The detailed computation pipeline of the proposed query-guided decoding is shown in Figure 1, which depicts the architecture of the proposed query-guided decoder and prediction head. The proposed model is trained on 32 NVIDIA A100 GPUs with 1 video per GPU. The detailed results are shown in Table 1 and Table 2. Finally, we provide the detailed ablation results of the temporal interaction layer on the HC-STVG benchmark in Table 3b.



