Appendix: Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
Zhenhailong Wang
Neural Information Processing Systems
For VaTeX captioning and retrieval, we use the latest version, v1.1. The statistics can be found in Table 1. We show the impact of the temporal-aware prompt on capturing temporal dynamics in videos.
Sep-24-2025, 21:53:06 GMT