NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts
Yen-Ting Lin, Chao-Han Huck Yang, Zhehuai Chen, Piotr Zelasko, Xuesong Yang, Zih-Ching Chen, Krishna C Puvvada, Szu-Wei Fu, Ke Hu, Jun Wei Chiu, Jagadeesh Balam, Boris Ginsburg, Yu-Chiang Frank Wang
arXiv.org Artificial Intelligence
Construction of a general-purpose post-recognition error corrector poses a crucial question: how can we most effectively train a model on a large mixture of domain datasets? The answer lies in learning dataset-specific features and consolidating their knowledge in a single model. Previous methods achieve this by maintaining separate correction language models, resulting in a significant increase in parameters. In this work, we present Mixture-of-Experts (MoE) as a solution, highlighting that MoEs are much more than a scalability tool. We propose a Multi-Task Correction MoE, where we train the experts to become an "expert" of speech-to-text, language-to-text, and vision-to-text datasets by learning to route each dataset's tokens to its mapped expert. Experiments on the Open ASR Leaderboard show that NeKo sets a new state of the art, achieving an average relative 5.0% WER reduction and substantial BLEU-score improvements on speech and translation tasks. On zero-shot evaluation, NeKo outperforms GPT-3.5 and Claude-Opus with 15.5% to 27.6% relative WER reduction on the Hyporadise benchmark. NeKo also performs competitively on grammar and post-OCR correction as a multi-task model.
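The routing idea described in the abstract can be illustrated with a short sketch: during training, a supervision signal on the router teaches it the dataset-to-expert mapping, so tokens from each task land on their designated expert, while at inference the router assigns experts on its own. The PyTorch module below is an illustrative sketch only; the class name TaskOrientedMoE, the layer sizes, and the cross-entropy routing loss are assumptions for exposition, not the paper's released implementation.

```python
# Minimal sketch of a task-oriented MoE feed-forward block (assumed design,
# not the NeKo code): the router is supervised so tokens from task k are
# routed to expert k (e.g. 0 = speech-to-text, 1 = language-to-text, 2 = vision-to-text).
import torch
import torch.nn as nn
import torch.nn.functional as F


class TaskOrientedMoE(nn.Module):
    def __init__(self, d_model: int = 512, d_ff: int = 2048, num_experts: int = 3):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # token -> expert logits
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
             for _ in range(num_experts)]
        )

    def forward(self, x, task_id=None):
        # x: (batch, seq, d_model); task_id: (batch,) long tensor of dataset labels
        logits = self.router(x)                                   # (batch, seq, num_experts)
        if task_id is not None:
            # Supervise the router with the known dataset-to-expert mapping.
            target = task_id[:, None].expand(-1, x.size(1))       # same label for every token
            routing_loss = F.cross_entropy(logits.flatten(0, 1), target.flatten())
        else:
            routing_loss = x.new_zeros(())                        # inference: router routes on its own
        weights = logits.softmax(dim=-1)                          # mixture weights per token
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)  # (batch, seq, d_model, E)
        out = (expert_out * weights.unsqueeze(-2)).sum(dim=-1)    # weighted sum over experts
        return out, routing_loss


# Example usage with dummy inputs and hypothetical task ids:
moe = TaskOrientedMoE()
hidden = torch.randn(4, 16, 512)
task_ids = torch.tensor([0, 0, 1, 2])
out, aux_loss = moe(hidden, task_ids)
```

In such a setup, the returned routing loss would be added with some weight to the usual correction objective during training, and dropped at inference where only the mixed hidden states are needed.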
Nov-8-2024