AITopics | associating object

Associating Objects with Transformers for Video Object Segmentation

Neural Information Processing Systems | Apr-24-2026, 19:36:00 GMT

In: ECCV (2020) [34] Yang, Z., Wei, Y., Yang, Y.: Collaborative video object segmentation by multi-scale foreground-background integration.

artificial intelligence, machine learning, segmentation, (18 more...)


Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)


Associating Objects with Transformers for Video Object Segmentation

Neural Information Processing Systems | Apr-24-2026, 19:35:56 GMT

This paper investigates how to realize better and more efficient embedding learning to tackle semi-supervised video object segmentation under challenging multi-object scenarios. The state-of-the-art methods learn to decode features with a single positive object and thus have to match and segment each target separately under multi-object scenarios, consuming several times the computing resources. To solve the problem, we propose an Associating Objects with Transformers (AOT) approach to match and decode multiple objects uniformly. In detail, AOT employs an identification mechanism to associate multiple targets into the same high-dimensional embedding space. Thus, we can simultaneously process multiple objects' matching and segmentation decoding as efficiently as processing a single object.
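The idea of placing several targets in one embedding space can be illustrated with a small sketch. This is not the authors' implementation: the identity bank, the function names (`make_id_bank`, `embed_labels`), the random initialisation, and the embedding size are all assumptions made for illustration. Each pixel's object label selects one vector from a shared bank, so a single embedding map carries every object at once instead of one map per object.

```python
import random

def make_id_bank(num_ids, dim, seed=0):
    """Hypothetical bank of identity vectors: one row per object slot (row 0 = background)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_ids)]

def embed_labels(label_map, id_bank):
    """Map an H x W grid of integer object labels to an H x W x dim identity embedding.

    All objects share the same output map, so later matching and decoding
    can run once for the whole frame rather than once per object.
    """
    return [[list(id_bank[label]) for label in row] for row in label_map]

# Toy frame: 0 = background, 1 and 2 are two different target objects.
label_map = [
    [0, 1, 2],
    [0, 1, 2],
]
bank = make_id_bank(num_ids=3, dim=4)
id_map = embed_labels(label_map, bank)

print(len(id_map), len(id_map[0]), len(id_map[0][0]))  # height, width, embedding size
```

In this sketch the cost of building the embedding depends on the number of pixels, not on the number of objects, which is the property the abstract emphasises.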

artificial intelligence, machine learning, segmentation, (16 more...)


Genre: Research Report (0.88)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)


Associating Objects with Transformers for Video Object Segmentation. Zongxin Yang 1,2, Yunchao Wei 3,4, Yi Yang 1. 1 CCAI, College of Computer Science and Technology, Zhejiang University 2

Neural Information Processing Systems | Feb-7-2026, 14:23:59 GMT

In short, the visual analysis further proves the necessity and effectiveness of our hierarchical LSTT. The hierarchical matching is not simply a combination of multiple matching processes.

artificial intelligence, machine learning, segmentation, (15 more...)


Country: Asia > China > Beijing > Beijing (0.05)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)


Associating Objects and Their Effects in Video through Coordination Games

Neural Information Processing Systems | Dec-24-2025, 23:42:20 GMT

We explore a feed-forward approach for decomposing a video into layers, where each layer contains an object of interest along with its associated shadows, reflections, and other visual effects. This problem is challenging since associated effects vary widely with the 3D geometry and lighting conditions in the scene, and ground-truth labels for visual effects are difficult (and in some cases impractical) to collect. We take a self-supervised approach and train a neural network to produce a foreground image and alpha matte from a rough object segmentation mask under a reconstruction and sparsity loss. Under reconstruction loss, the layer decomposition problem is underdetermined: many combinations of layers may reconstruct the input video. Inspired by the game theory concept of focal points (or Schelling points), we pose the problem as a coordination game, where each player (network) predicts the effects for a single object without knowledge of the other players' choices. The players learn to converge on the "natural" layer decomposition in order to maximize the likelihood of their choices aligning with the other players'. We train the network to play this game with itself, and show how to design the rules of this game so that the focal point lies at the correct layer decomposition. We demonstrate feed-forward results on a challenging synthetic dataset, then show that pretraining on this dataset significantly reduces optimization time for real videos.
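The claim that reconstruction loss alone leaves the decomposition underdetermined can be seen in a toy example. The sketch below is an illustration under stated assumptions, not the paper's method: it uses a single foreground layer over a known background, four grayscale pixels, L1 losses, and a hypothetical sparsity weight `lam`. It does not implement the coordination game.

```python
def composite(alpha, fg, bg):
    """Alpha-blend one foreground layer over the background, pixel by pixel."""
    return [a * f + (1.0 - a) * b for a, f, b in zip(alpha, fg, bg)]

def l1(x, y):
    """Mean absolute difference between two pixel lists."""
    return sum(abs(p - q) for p, q in zip(x, y)) / len(x)

def layer_loss(alpha, fg, bg, frame, lam=0.1):
    """Return (total, reconstruction, sparsity); lam is an assumed weight."""
    recon = l1(composite(alpha, fg, bg), frame)
    sparsity = sum(abs(a) for a in alpha) / len(alpha)
    return recon + lam * sparsity, recon, sparsity

bg = [0.2, 0.2, 0.2, 0.2]      # known background
frame = [0.2, 0.8, 0.2, 0.2]   # observed frame: one bright object pixel

# Decomposition A: the layer covers only the object pixel.
alpha_a, fg_a = [0.0, 1.0, 0.0, 0.0], [0.0, 0.8, 0.0, 0.0]
# Decomposition B: the layer covers everything and simply copies the frame.
alpha_b, fg_b = [1.0, 1.0, 1.0, 1.0], list(frame)

total_a, recon_a, sparse_a = layer_loss(alpha_a, fg_a, bg, frame)
total_b, recon_b, sparse_b = layer_loss(alpha_b, fg_b, bg, frame)
print(recon_a, recon_b)   # both decompositions reconstruct the frame
print(total_a, total_b)   # the sparsity term separates them
```

Both decompositions reproduce the frame, so the reconstruction term cannot tell them apart; in this toy case the sparsity term favours the layer that covers only the object pixel.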

associating object, coordination game, name change, (4 more...)


Industry: Leisure & Entertainment > Games (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.59)


Associating Objects with Transformers for Video Object Segmentation

Neural Information Processing Systems | Dec-23-2025, 19:22:38 GMT

This paper investigates how to realize better and more efficient embedding learning to tackle semi-supervised video object segmentation under challenging multi-object scenarios. The state-of-the-art methods learn to decode features with a single positive object and thus have to match and segment each target separately under multi-object scenarios, consuming several times the computing resources. To solve the problem, we propose an Associating Objects with Transformers (AOT) approach to match and decode multiple objects uniformly. In detail, AOT employs an identification mechanism to associate multiple targets into the same high-dimensional embedding space. Thus, we can simultaneously process multiple objects' matching and segmentation decoding as efficiently as processing a single object.

associating object, transformer, video object segmentation, (5 more...)


Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Vision (0.65)


Associating Objects and Their Effects in Video through Coordination Games

Neural Information Processing Systems | Jan-18-2025, 13:07:54 GMT

We explore a feed-forward approach for decomposing a video into layers, where each layer contains an object of interest along with its associated shadows, reflections, and other visual effects. This problem is challenging since associated effects vary widely with the 3D geometry and lighting conditions in the scene, and ground-truth labels for visual effects are difficult (and in some cases impractical) to collect. We take a self-supervised approach and train a neural network to produce a foreground image and alpha matte from a rough object segmentation mask under a reconstruction and sparsity loss. Under reconstruction loss, the layer decomposition problem is underdetermined: many combinations of layers may reconstruct the input video. Inspired by the game theory concept of focal points (or Schelling points), we pose the problem as a coordination game, where each player (network) predicts the effects for a single object without knowledge of the other players' choices. The players learn to converge on the "natural" layer decomposition in order to maximize the likelihood of their choices aligning with the other players'.

associating object, coordination game, layer decomposition, (2 more...)


Industry: Leisure & Entertainment > Games (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.41)


Associating Objects with Transformers for Video Object Segmentation

Neural Information Processing Systems | Oct-9-2024, 14:25:38 GMT

This paper investigates how to realize better and more efficient embedding learning to tackle semi-supervised video object segmentation under challenging multi-object scenarios. The state-of-the-art methods learn to decode features with a single positive object and thus have to match and segment each target separately under multi-object scenarios, consuming several times the computing resources. To solve the problem, we propose an Associating Objects with Transformers (AOT) approach to match and decode multiple objects uniformly. In detail, AOT employs an identification mechanism to associate multiple targets into the same high-dimensional embedding space. Thus, we can simultaneously process multiple objects' matching and segmentation decoding as efficiently as processing a single object.

associating object, transformer, video object segmentation, (3 more...)


Genre: Research Report (0.86)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)
