OmniVL: One Foundation Model for Image-Language and Video-Language Tasks
Neural Information Processing Systems
This paper presents OmniVL, a new foundation model that supports both image-language and video-language tasks using one universal architecture.
Feb-7-2026, 23:04:40 GMT
- Technology: