Speech Translation with Large Language Models: An Industrial Practice

Huang, Zhichao; Ye, Rong; Ko, Tom; Dong, Qianqian; Cheng, Shanbo; Wang, Mingxuan; Li, Hang

arXiv.org Artificial Intelligence 

Building on the success of large language models (LLMs) across various tasks, we introduce LLM-ST, a novel and effective speech translation model built on a pre-trained LLM. By integrating the LLM with a speech encoder and applying multi-task instruction tuning, LLM-ST can produce accurate timestamped transcriptions and translations, even from long audio inputs. Our findings further indicate that Chain-of-Thought (CoT) prompting can benefit LLM-ST.