TokenVerse++: Towards Flexible Multitask Learning with Dynamic Task Activation

Kumar, Shashi, Madikeri, Srikanth, Villatoro-Tello, Esaú, Burdisso, Sergio, Rangappa, Pradeep, Carofilis, Andrés, Motlicek, Petr, Pandia, Karthik, Venkatesan, Shankar, Hacioğlu, Kadri, Stolcke, Andreas

Aug-28-2025–arXiv.org Artificial Intelligence

Token-based multitasking frameworks like TokenVerse require all training utterances to have labels for all tasks, hindering their ability to leverage partially annotated datasets and scale effectively. We propose TokenVerse++, which introduces learnable vectors in the acoustic embedding space of the XLSR-Transducer ASR model for dynamic task activation. This core mechanism enables training with utterances labeled for only a subset of tasks, a key advantage over TokenVerse. We demonstrate this by successfully integrating a dataset with partial labels, specifically for ASR and an additional task, language identification, improving overall performance. TokenVerse++ achieves results on par with or exceeding TokenVerse across multiple tasks, establishing it as a more practical multitask alternative without sacrificing ASR performance.

Index Terms: multitask training, speech recognition, speaker change detection, named entity recognition, language identification, XLSR-Transducer.

Multitask learning enhances automatic speech recognition (ASR) by enabling multiple tasks in a single inference step, improving efficiency and functionality.
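The abstract states only that learnable vectors are introduced in the acoustic embedding space to activate tasks dynamically; it does not say how they are combined with the acoustic frames. The following is a minimal sketch of one possible reading, assuming one vector per task that is prepended to the frame sequence whenever that task has labels for the utterance. The task names, the function names `make_task_vectors` and `activate_tasks`, and the prepending scheme are all hypothetical, and plain Python lists stand in for parameters that a real model would update by gradient descent.

```python
import random

# Hypothetical task names, loosely based on the index terms above.
TASKS = ("asr", "scd", "ner", "lid")


def make_task_vectors(dim, seed=0):
    """Create one small random vector per task (stand-ins for learnable parameters)."""
    rng = random.Random(seed)
    return {task: [rng.gauss(0.0, 0.02) for _ in range(dim)] for task in TASKS}


def activate_tasks(frames, task_vectors, active_tasks):
    """Prepend the vectors of the active tasks to the acoustic frame embeddings.

    Prepending is an assumption made for this sketch, not a detail
    confirmed by the abstract.
    """
    unknown = sorted(set(active_tasks) - set(task_vectors))
    if unknown:
        raise KeyError(f"unknown tasks: {unknown}")
    # Keep a fixed task order so the prefix layout is deterministic.
    prefix = [list(task_vectors[t]) for t in TASKS if t in active_tasks]
    return prefix + [list(frame) for frame in frames]


dim = 4
task_vectors = make_task_vectors(dim)
frames = [[0.1 * i] * dim for i in range(5)]  # five dummy acoustic frames

# An utterance labeled only for ASR and language identification.
partial = activate_tasks(frames, task_vectors, {"asr", "lid"})
# An utterance labeled for every task.
full = activate_tasks(frames, task_vectors, set(TASKS))

print(len(partial), len(full))  # prints: 7 9
```

Under this assumption, a partially labeled utterance simply carries fewer activation vectors, so the same model can be trained on it without placeholder labels for the missing tasks.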

machine learning, natural language, tokenverse

Country:
- Europe > Switzerland (0.28)
- North America > United States (0.28)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (1.00)
  - Natural Language (1.00)
  - Machine Learning (1.00)