Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models
Neural Information Processing Systems
AF3 introduces: (i) AF-Whisper, a unified audio encoder trained using a novel strategy for joint representation learning across all 3 modalities of speech, sound, and music; (ii) flexible, on-demand thinking, allowing the model to do chain-of-thought-type reasoning before answering; (iii) multi-turn, multi-audio chat; (iv) long audio understanding and reasoning (including speech) up to 10 minutes; and (v) voice-to-voice interaction. To enable these capabilities, ...
Jun-16-2026, 13:52:43 GMT
- Country:
  - North America > United States (1.00)
  - Europe (1.00)
- Genre:
  - Research Report > Experimental Study (1.00)
- Technology:
  - Information Technology
    - Data Science (0.92)
    - Artificial Intelligence
      - Vision (1.00)
      - Speech > Speech Recognition (1.00)
      - Representation & Reasoning (1.00)
      - Cognitive Science (0.92)
      - Natural Language
        - Large Language Model (1.00)
        - Chatbot (1.00)
      - Machine Learning > Neural Networks
        - Deep Learning (0.93)