SALM: Speech-augmented Language Model with In-context Learning for Speech Recognition and Translation

Chen, Zhehuai, Huang, He, Andrusenko, Andrei, Hrinchuk, Oleksii, Puvvada, Krishna C., Li, Jason, Ghosh, Subhankar, Balam, Jagadeesh, Ginsburg, Boris

Oct-13-2023–arXiv.org Artificial Intelligence

We present a novel Speech Augmented Language Model (SALM) with {\em multitask} and {\em in-context} learning capabilities. SALM comprises a frozen text LLM, a audio encoder, a modality adapter module, and LoRA layers to accommodate speech input and associated task instructions. The unified SALM not only achieves performance on par with task-specific Conformer baselines for Automatic Speech Recognition (ASR) and Speech Translation (AST), but also exhibits zero-shot in-context learning capabilities, demonstrated through keyword-boosting task for ASR and AST. Moreover, {\em speech supervised in-context training} is proposed to bridge the gap between LLM training and downstream speech tasks, which further boosts the in-context learning ability of speech-to-text models. Proposed model is open-sourced via NeMo toolkit.

arxiv, instruction, salm, (11 more...)

arXiv.org Artificial Intelligence

Oct-13-2023

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.04)

Genre:
- Research Report (0.50)

Industry:
- Information Technology (0.30)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (1.00)
  - Natural Language > Large Language Model (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found