TokenVerse++: Towards Flexible Multitask Learning with Dynamic Task Activation
Kumar, Shashi, Madikeri, Srikanth, Villatoro-Tello, Esaú, Burdisso, Sergio, Rangappa, Pradeep, Carofilis, Andrés, Motlicek, Petr, Pandia, Karthik, Venkatesan, Shankar, Hacioğlu, Kadri, Stolcke, Andreas
arXiv.org Artificial Intelligence
Token-based multitasking frameworks like TokenVerse require all training utterances to have labels for all tasks, hindering their ability to leverage partially annotated datasets and scale effectively. We propose TokenVerse++, which introduces learnable vectors in the acoustic embedding space of the XLSR-Transducer ASR model for dynamic task activation. This core mechanism enables training with utterances labeled for only a subset of tasks, a key advantage over TokenVerse. We demonstrate this by successfully integrating a dataset with partial labels, specifically for ASR and an additional task, language identification, improving overall performance. TokenVerse++ achieves results on par with or exceeding TokenVerse across multiple tasks, establishing it as a more practical multitask alternative without sacrificing ASR performance.

Index Terms: multitask training, speech recognition, speaker change detection, named entity recognition, language identification, XLSR-Transducer.

Multitask learning enhances automatic speech recognition (ASR) by enabling multiple tasks in a single inference step, improving efficiency and functionality.
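To make the idea of dynamic task activation concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes one vector per task and assumes that the vectors of the tasks labeled for an utterance are summed and added to every acoustic frame embedding; the abstract does not state how the vectors are combined with the embeddings. The task names, `DIM`, `task_vectors`, and `activate_tasks` are hypothetical, and in a real model the vectors would be trainable parameters rather than fixed random values.

```python
import random

# Hypothetical task names, taken from the index terms of the abstract.
TASKS = ["asr", "scd", "ner", "lid"]
DIM = 4  # toy embedding size, chosen only for illustration

random.seed(0)
# One vector per task. In a real model these would be learned during training.
task_vectors = {t: [random.gauss(0.0, 0.1) for _ in range(DIM)] for t in TASKS}


def activate_tasks(frames, active_tasks):
    """Add the summed vectors of the active tasks to every acoustic frame.

    Summation is an assumed combination rule, used here for illustration only.
    """
    unknown = set(active_tasks) - set(TASKS)
    if unknown:
        raise ValueError(f"unknown tasks: {sorted(unknown)}")
    offset = [sum(task_vectors[t][i] for t in active_tasks) for i in range(DIM)]
    return [[x + o for x, o in zip(frame, offset)] for frame in frames]


# Three dummy acoustic frames standing in for encoder outputs.
frames = [[0.0] * DIM for _ in range(3)]

# An utterance labeled only for ASR and language identification.
partial = activate_tasks(frames, ["asr", "lid"])

# An utterance labeled for every task.
full = activate_tasks(frames, TASKS)
```

Under these assumptions, an utterance with partial labels activates only the vectors of the tasks it is annotated for, so the same model can be trained on fully and partially labeled data.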
Aug-28-2025
- Country:
  - Asia > India (0.04)
  - Europe
    - Czechia > South Moravian Region
      - Brno (0.04)
    - Switzerland > Zürich
      - Zürich (0.14)
  - North America > United States
    - Florida > Miami-Dade County > Miami (0.04)
- Genre:
  - Research Report (0.82)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning (1.00)
    - Natural Language (1.00)
    - Speech > Speech Recognition (1.00)