Speech Translation with Large Language Models: An Industrial Practice

Huang, Zhichao; Ye, Rong; Ko, Tom; Dong, Qianqian; Cheng, Shanbo; Wang, Mingxuan; Li, Hang

Dec-21-2023–arXiv.org Artificial Intelligence

Given the great success of large language models (LLMs) across various tasks, we introduce LLM-ST, a novel and effective speech translation model built on a pre-trained LLM. By integrating the LLM with a speech encoder and employing multi-task instruction tuning, LLM-ST can produce accurate timestamped transcriptions and translations, even from long audio inputs. Furthermore, our findings indicate that Chain-of-Thought (CoT) prompting can benefit LLM-ST.





Country:
- Asia > China > Beijing > Beijing (0.04)

Genre:
- Research Report > New Finding (0.48)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)