Egocentric Video-Language Pretraining

Kevin Qinghong Lin

Neural Information Processing Systems 

The best-performing works rely on large-scale, third-person video-text datasets such as HowTo100M.
