AITopics | weakly-supervised audio-visual video parsing

Collaborating Authors

weakly-supervised audio-visual video parsing

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Multi-modal Grouping Network for Weakly-Supervised Audio-Visual Video Parsing

Neural Information Processing SystemsDec-25-2025, 12:46:22 GMT

The audio-visual video parsing task aims to parse a video into modality-and category-aware temporal segments. Previous work mainly focuses on weakly-supervised approaches, which learn from video-level event labels. During training, they do not know which modality perceives and meanwhile which temporal segment contains the video event. Since there is no explicit grouping in the existing frameworks, the modality and temporal uncertainties make these methods suffer from false predictions. For instance, segments in the same category could be predicted in different event classes. Learning compact and discriminative multi-modal subspaces is essential for mitigating the issue. To this end, in this paper, we propose a novel Multi-modal Grouping Network, namely MGN, for explicitly semantic-aware grouping.

multi-modal grouping network, name change, weakly-supervised audio-visual video parsing, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.46)

Add feedback

Exploring Cross-Video and Cross-Modality Signals for Weakly-Supervised Audio-Visual Video Parsing

Neural Information Processing SystemsDec-24-2025, 04:52:02 GMT

The audio-visual video parsing task aims to temporally parse a video into audio or visual event categories. However, it is labor intensive to temporally annotate audio and visual events and thus hampers the learning of a parsing model. To this end, we propose to explore additional cross-video and cross-modality supervisory signals to facilitate weakly-supervised audio-visual video parsing. The proposed method exploits both the common and diverse event semantics across videos to identify audio or visual events. In addition, our method explores event co-occurrence across audio, visual, and audio-visual streams.

cross-modality signal, name change, weakly-supervised audio-visual video parsing, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.95)

Add feedback

Multi-modal Grouping Network for Weakly-Supervised Audio-Visual Video Parsing

Neural Information Processing SystemsJan-19-2025, 02:51:12 GMT

The audio-visual video parsing task aims to parse a video into modality- and category-aware temporal segments. Previous work mainly focuses on weakly-supervised approaches, which learn from video-level event labels. During training, they do not know which modality perceives and meanwhile which temporal segment contains the video event. Since there is no explicit grouping in the existing frameworks, the modality and temporal uncertainties make these methods suffer from false predictions. For instance, segments in the same category could be predicted in different event classes.

multi-modal grouping network, temporal segment, weakly-supervised audio-visual video parsing, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.69)

Add feedback

Exploring Cross-Video and Cross-Modality Signals for Weakly-Supervised Audio-Visual Video Parsing

Neural Information Processing SystemsOct-10-2024, 16:40:54 GMT

cross-modality signal, supervisory signal, weakly-supervised audio-visual video parsing

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Add feedback