Associating Objects with Transformers for Video Object Segmentation

Dec-23-2025, 19:22:38 GMT–Neural Information Processing Systems

This paper investigates how to achieve better and more efficient embedding learning for semi-supervised video object segmentation in challenging multi-object scenarios. State-of-the-art methods learn to decode features with a single positive object. In multi-object scenarios they must therefore match and segment each target separately, which consumes a multiple of the computing resources needed for a single object. To solve this problem, we propose an Associating Objects with Transformers (AOT) approach that matches and decodes multiple objects uniformly. Specifically, AOT employs an identification mechanism that associates multiple targets with the same high-dimensional embedding space. As a result, matching and segmentation decoding for multiple objects can be processed simultaneously, as efficiently as for a single object.
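The abstract does not spell out how the identification mechanism is implemented. The sketch below illustrates only the general idea, under stated assumptions. Each target gets a distinct vector from a hypothetical identity bank. Every pixel of a multi-object label map is then mapped to its target's identity vector, so all targets share one embedding space and are recovered together in a single pass. The names `make_identity_bank`, `assign_identities`, `id_embedding`, and `decode_labels` are hypothetical. The random (not learned) bank and the nearest-identity decoding are stand-ins, not the paper's transformer-based matching and decoding.

```python
import random
from typing import List, Sequence

Vector = List[float]


def make_identity_bank(num_ids: int, dim: int, seed: int = 0) -> List[Vector]:
    """Hypothetical identity bank: num_ids random vectors of length dim.

    Assumption: vectors are random here; a trained model would learn them.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_ids)]


def assign_identities(num_objects: int, num_ids: int, seed: int = 0) -> List[int]:
    """Give each object (label 0..num_objects-1) a distinct, randomly chosen bank index."""
    if num_objects > num_ids:
        raise ValueError("need at least as many identities as objects")
    return random.Random(seed).sample(range(num_ids), num_objects)


def id_embedding(label_map: Sequence[int], bank: Sequence[Vector],
                 assignment: Sequence[int]) -> List[Vector]:
    """Map every pixel to the identity vector of the object it belongs to.

    All objects share one embedding space, so one embedding map encodes them all.
    """
    return [list(bank[assignment[label]]) for label in label_map]


def decode_labels(embeddings: Sequence[Vector], bank: Sequence[Vector],
                  assignment: Sequence[int]) -> List[int]:
    """Recover all objects' labels in one pass via the nearest assigned identity.

    Assumption: nearest-neighbour decoding stands in for a learned decoder.
    """
    labels = []
    for emb in embeddings:
        # Squared Euclidean distance from this pixel's embedding to each assigned identity.
        dists = [sum((a - b) ** 2 for a, b in zip(emb, bank[idx]))
                 for idx in assignment]
        labels.append(min(range(len(dists)), key=dists.__getitem__))
    return labels


if __name__ == "__main__":
    # Toy 1-D "frame" of 8 pixels: background (0) plus two targets (1 and 2).
    frame_labels = [0, 0, 1, 1, 2, 2, 0, 1]
    bank = make_identity_bank(num_ids=10, dim=16)
    assignment = assign_identities(num_objects=3, num_ids=10)
    emb = id_embedding(frame_labels, bank, assignment)
    print(decode_labels(emb, bank, assignment) == frame_labels)  # True
```

Because every target lives in the same embedding space, adding more objects in this toy changes only the label map and the assignment, not the number of passes. This mirrors, in a highly simplified form, the efficiency argument made in the abstract.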

associating object, transformer, video object segmentation

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence > Vision (0.65)