Multi-modal Grouping Network for Weakly-Supervised Audio-Visual Video Parsing
Shentong Mo (Carnegie Mellon University), Yapeng Tian (University of Texas at Dallas)
–Neural Information Processing Systems
The audio-visual video parsing task aims to parse a video into modality- and category-aware temporal segments. Previous work mainly focuses on weakly-supervised approaches, which learn from video-level event labels.
Aug-19-2025, 12:49:07 GMT
- Country:
- North America > United States > Texas (0.40)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (1.00)
- Natural Language > Grammars & Parsing (0.64)
- Vision (1.00)