One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

Neural Information Processing Systems 

We introduce VideoLISA, a video-based multimodal large language model designed to tackle the problem of language-instructed reasoning segmentation in videos.
