Vript: A Video Is Worth Thousands of Words

Neural Information Processing Systems 

Compared to image-text pairs [11, 12], video-text pairs are harder to obtain and annotate.
