From ViT Features to Training-free Video Object Segmentation via Streaming-data Mixture Models
Neural Information Processing Systems
In the task of semi-supervised video object segmentation, the input is a video together with a binary mask of an object in its first frame, and the desired output consists of the corresponding masks of that object in the subsequent frames.
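To make this input/output contract concrete, here is a minimal Python sketch. It assumes each frame has already been encoded into a grid of per-pixel feature vectors (for instance, ViT patch features resized to mask resolution). The function name `propagate_masks` and its nearest-neighbour labelling rule are hypothetical illustrations of a training-free baseline, not the paper's streaming mixture-model method.

```python
from typing import List, Sequence

Feature = Sequence[float]
FeatureMap = List[List[Feature]]  # H x W grid of per-pixel feature vectors
Mask = List[List[int]]            # H x W grid of binary labels (0 or 1)


def _sq_dist(a: Feature, b: Feature) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def propagate_masks(features: List[FeatureMap], first_mask: Mask) -> List[Mask]:
    """Return one mask per frame after the first, given frame 0's mask.

    Hypothetical nearest-neighbour baseline (not the paper's method):
    every pixel in a later frame copies the label of the first-frame
    pixel whose feature vector is closest in squared Euclidean distance.
    """
    # Reference set: (feature, label) pairs from the annotated first frame.
    ref = [(feat, first_mask[i][j])
           for i, row in enumerate(features[0])
           for j, feat in enumerate(row)]
    outputs: List[Mask] = []
    for frame in features[1:]:
        mask = []
        for row in frame:
            # Label each pixel by its nearest reference feature.
            mask.append([min(ref, key=lambda r: _sq_dist(r[0], f))[1] for f in row])
        outputs.append(mask)
    return outputs
```

For example, with 2x2 frames and one-dimensional features, calling `propagate_masks([[[[0.0], [0.0]], [[1.0], [1.0]]], [[[0.9], [0.1]], [[0.2], [1.1]]]], [[0, 0], [1, 1]])` labels each pixel of the second frame by whichever first-frame feature it is closer to, yielding `[[[1, 0], [0, 1]]]`. The output list is one frame shorter than the input, since the first-frame mask is given rather than predicted.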
Feb-9-2026, 00:08:22 GMT