MAViL: Masked Audio-Video Learners Po-Yao Huang
–Neural Information Processing Systems
Empirically, MAViL achieves state-of-the-art audio-video classification performance on AudioSet (53.3 mAP) and VGGSound (67.1% accuracy), surpassing recent self-supervised models and supervised models that utilize external labeled data.
Oct-8-2025, 13:18:18 GMT
- Country:
- Europe
- North America
- Canada > Quebec
- Montreal (0.04)
- United States > California
- Alameda County > Berkeley (0.04)
- San Diego County > San Diego (0.04)
- Genre:
- Research Report > New Finding (0.67)
- Industry:
- Leisure & Entertainment (0.46)
- Technology: