CoLLA T: On Adding Fine-grained Audio Understanding to Language Models using Token-Level Locked-Language Tuning
–Neural Information Processing Systems
Humans can easily understand various audio concepts, but conventional audio classification models fail due to their inability to predict unseen classes during training.
Neural Information Processing Systems
Oct-9-2025, 07:05:15 GMT