AITopics | hierarchical propagation

eb890c36af87e4ca82e8ef7bcba6a284-Paper-Conference.pdf

Neural Information Processing Systems | Feb-12-2026, 16:11:37 GMT

hierarchical propagation, propagation, segmentation, (14 more...)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Decoupling Features in Hierarchical Propagation for Video Object Segmentation

Neural Information Processing Systems | Dec-25-2025, 15:36:07 GMT

This paper focuses on developing a more effective method of hierarchical propagation for semi-supervised Video Object Segmentation (VOS). Based on vision transformers, the recently developed Associating Objects with Transformers (AOT) approach introduces hierarchical propagation into VOS and has shown promising results. Hierarchical propagation gradually propagates information from past frames to the current frame and transforms the current-frame features from object-agnostic to object-specific. However, the increase in object-specific information inevitably leads to a loss of object-agnostic visual information in deep propagation layers. To solve this problem and further facilitate the learning of visual embeddings, this paper proposes a Decoupling Features in Hierarchical Propagation (DeAOT) approach.

decoupling feature, hierarchical propagation, video object segmentation, (5 more...)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Supplementary Materials of Decoupling Features in Hierarchical Propagation for Video Object Segmentation

Neural Information Processing Systems | Aug-19-2025, 16:41:53 GMT

The optimization strategies and related hyper-parameters are also the same as in AOT. The loss function is a 0.5:0.5 combination of BCE loss [...] Such a process is necessary to keep enough long-term information and to avoid running out of memory during inference on long videos. The longest video in VOT 2020 contains 1,500 frames. We compare DeAOT with more VOS methods in Tables 1 and 2. [...] VOS cases, including similar objects, occlusion, fast motion, motion blur, etc. A.4 Broader Impact and Limitations: The proposed DeAOT framework significantly improves the performance and robustness of VOS. As for limitations, scenarios with multiple similar objects and severe occlusions remain very challenging for DeAOT and other VOS solutions.

artificial intelligence, object-oriented architecture, segmentation, (15 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.55)

Decoupling Features in Hierarchical Propagation for Video Object Segmentation

Neural Information Processing Systems | Jan-19-2025, 05:31:32 GMT

This paper focuses on developing a more effective method of hierarchical propagation for semi-supervised Video Object Segmentation (VOS). Based on vision transformers, the recently developed Associating Objects with Transformers (AOT) approach introduces hierarchical propagation into VOS and has shown promising results. Hierarchical propagation gradually propagates information from past frames to the current frame and transforms the current-frame features from object-agnostic to object-specific. However, the increase in object-specific information inevitably leads to a loss of object-agnostic visual information in deep propagation layers. To solve this problem and further facilitate the learning of visual embeddings, this paper proposes a Decoupling Features in Hierarchical Propagation (DeAOT) approach. [...] Secondly, to compensate for the additional computation from dual-branch propagation, we propose an efficient module for constructing hierarchical propagation, i.e., the Gated Propagation Module, which is carefully designed with single-head attention.
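
As a rough illustration of what "gated" single-head attention can look like, the sketch below computes plain scaled dot-product attention with one head and multiplies the result element-wise by a sigmoid gate. It is not the DeAOT Gated Propagation Module: the function names, the sigmoid gate, and the point at which the gate is applied are all assumptions made for illustration, since the abstract gives no such details.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def single_head_attention(queries, keys, values):
    """Scaled dot-product attention with a single head.

    queries: list of query vectors (e.g. current-frame features)
    keys, values: lists of key/value vectors (e.g. past-frame memory)
    """
    dim = len(keys[0])
    width = len(values[0])
    output = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        output.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)]
        )
    return output


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_attention(queries, keys, values, gate_logits):
    """Hypothetical gating: scale each attended feature by sigmoid(gate)."""
    attended = single_head_attention(queries, keys, values)
    return [
        [sigmoid(g) * a for g, a in zip(gate_row, att_row)]
        for gate_row, att_row in zip(gate_logits, attended)
    ]


if __name__ == "__main__":
    # One query, one memory entry: attention returns the stored value,
    # and a gate logit of 0 halves it.
    print(gated_attention([[1.0, 0.0]], [[1.0, 0.0]], [[2.0, 4.0]], [[0.0, 0.0]]))
```

A single head avoids splitting the feature dimension and concatenating per-head outputs, which is one plausible reading of why the abstract describes the module as efficient.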

decoupling feature, hierarchical propagation, video object segmentation, (4 more...)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Lester: rotoscope animation through video object segmentation and tracking

Tous, Ruben

arXiv.org Artificial Intelligence | Feb-15-2024

This article introduces Lester, a novel method to automatically synthesize retro-style 2D animations from videos. The method approaches the challenge mainly as an object segmentation and tracking problem. Video frames are processed with the Segment Anything Model (SAM), and the resulting masks are tracked through subsequent frames with DeAOT, a method of hierarchical propagation for semi-supervised video object segmentation. The geometry of the masks' contours is simplified with the Douglas-Peucker algorithm. Finally, facial traits, pixelation and a basic shadow effect can optionally be added. The results show that the method exhibits excellent temporal consistency and can correctly process videos with different poses and appearances, dynamic shots, partial shots and diverse backgrounds. The proposed method provides a simpler and more deterministic approach than video-to-video translation pipelines based on diffusion models, which suffer from temporal consistency problems and do not cope well with pixelated and schematic outputs. The method is also much more practical than techniques based on 3D human pose estimation, which require custom handcrafted 3D models and are very limited with respect to the type of scenes they can process.
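
For readers unfamiliar with the contour simplification step, here is a minimal sketch of the Douglas-Peucker algorithm on an open polyline. It is not Lester's code: the function names and the `epsilon` tolerance parameter are illustrative assumptions, and a real pipeline working on closed mask contours would more likely call an existing library routine.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from `point` to the infinite line through `start` and `end`."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # Degenerate segment: fall back to point-to-point distance.
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def douglas_peucker(points, epsilon):
    """Simplify a polyline, dropping points closer than `epsilon` to the chord."""
    if len(points) < 3:
        return list(points)

    # Find the interior point farthest from the chord joining the endpoints.
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        dist = perpendicular_distance(points[i], points[0], points[-1])
        if dist > max_dist:
            max_dist, index = dist, i

    if max_dist > epsilon:
        # Keep that point and simplify both halves recursively.
        left = douglas_peucker(points[: index + 1], epsilon)
        right = douglas_peucker(points[index:], epsilon)
        return left[:-1] + right  # avoid duplicating the split point

    # Everything is within tolerance: keep only the endpoints.
    return [points[0], points[-1]]


if __name__ == "__main__":
    contour = [(0, 0), (1, 0.1), (2, 0), (3, 3), (4, 0)]
    print(douglas_peucker(contour, 0.5))
```

A larger `epsilon` removes more points and gives a more schematic outline, while a smaller one stays closer to the original mask boundary.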

animation, contour, video, (12 more...)

2402.09883

Country: