Appendix: Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
Zhenhailong Wang

Neural Information Processing Systems 

For VaTeX captioning and retrieval, we use the latest version, v1.1. The statistics can be found in Table 1. We show the impact of the temporal-aware prompt on capturing temporal dynamics in videos.
