One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
Zechen Bai 1, Tong He 2, Haiyang Mei 1, Pichao Wang 2
Neural Information Processing Systems
We introduce VideoLISA, a video-based multimodal large language model designed to tackle the problem of language-instructed reasoning segmentation in videos. Leveraging the reasoning capabilities and world knowledge of large language models, and augmented by the Segment Anything Model, VideoLISA generates temporally consistent segmentation masks in videos based on language instructions.
Oct-9-2025, 18:27:56 GMT
- Country:
  - Asia > Singapore (0.04)
  - Oceania > Australia > Western Australia > Perth (0.04)
  - Europe > Netherlands > North Holland > Amsterdam (0.04)
- Genre:
  - Research Report > New Finding (1.00)
  - Research Report > Experimental Study (0.93)
- Industry:
  - Information Technology (0.67)
- Technology: