One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

Zechen Bai¹, Tong He², Haiyang Mei¹, Pichao Wang²

Oct-9-2025, 18:27:56 GMT, Neural Information Processing Systems

We introduce VideoLISA, a video-based multimodal large language model designed to tackle the problem of language-instructed reasoning segmentation in videos. Leveraging the reasoning capabilities and world knowledge of large language models, and augmented by the Segment Anything Model, VideoLISA generates temporally consistent segmentation masks in videos based on language instructions.

Keywords: benchmark, segmentation, video

Country:
- Asia > Singapore (0.04)
- Oceania > Australia
  - Western Australia > Perth (0.04)
- Europe > Netherlands
  - North Holland > Amsterdam (0.04)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (0.93)

Industry:
- Information Technology (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.67)