Speech Translation with Large Language Models: An Industrial Practice
Huang, Zhichao; Ye, Rong; Ko, Tom; Dong, Qianqian; Cheng, Shanbo; Wang, Mingxuan; Li, Hang
arXiv.org Artificial Intelligence
Given the great success of large language models (LLMs) across various tasks, we introduce LLM-ST, a novel and effective speech translation model built on a pre-trained LLM. By integrating the LLM with a speech encoder and employing multi-task instruction tuning, LLM-ST can produce accurate timestamped transcriptions and translations, even from long audio inputs. Furthermore, our findings indicate that Chain-of-Thought (CoT) prompting can benefit LLM-ST.
Dec-21-2023