LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models

Chen, Xi, Zhang, Songyang, Bai, Qibing, Chen, Kai, Nakamura, Satoshi

Jul-22-2024–arXiv.org Artificial Intelligence

We introduces LLaST, a framework for building high-performance Large Language model based Speech-to-text Translation systems. We address the limitations of end-to-end speech translation(E2E ST) models by exploring model architecture design and optimization techniques tailored for LLMs. Our approach includes LLM-based speech translation architecture design, ASR-augmented training, multilingual data augmentation, and dual-LoRA optimization. Our approach demonstrates superior performance on the CoVoST-2 benchmark and showcases exceptional scaling capabilities powered by LLMs. We believe this effective method will serve as a strong baseline for speech translation and provide insights for future improvements of the LLM-based speech translation framework. We release the data, code and models in https://github.com/openaudiolab/LLaST.

arxiv preprint arxiv, speech translation, translation, (12 more...)

arXiv.org Artificial Intelligence

Jul-22-2024

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Washington > King County
    - Seattle (0.04)
  - Louisiana > Orleans Parish
    - New Orleans (0.04)
- Europe
  - Italy > Tuscany
    - Florence (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
- Asia
  - Singapore (0.04)
  - Japan (0.04)
  - China
    - Shanghai > Shanghai (0.04)
    - Hong Kong (0.04)
    - Guangdong Province > Shenzhen (0.04)

Genre:
- Research Report > New Finding (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (1.00)
  - Natural Language
    - Machine Translation (1.00)
    - Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.30)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found