OmniVL: One Foundation Model for Image-Language and Video-Language Tasks

Neural Information Processing Systems 

This paper presents OmniVL, a new foundation model that supports both image-language and video-language tasks using one universal architecture.
