TransLLaMa: LLM-based Simultaneous Translation System

Koshkin, Roman; Sudoh, Katsuhito; Nakamura, Satoshi

Feb-7-2024, arXiv.org Artificial Intelligence

Decoder-only large language models (LLMs) have recently demonstrated impressive capabilities in text generation and reasoning. Nonetheless, they have limited applications in simultaneous machine translation (SiMT), currently dominated by encoder-decoder transformers. This study demonstrates that, after fine-tuning on a small dataset comprising causally aligned source and target sentence pairs, a pre-trained open-source LLM can control input segmentation directly by generating a special "wait" token. This obviates the need for a separate policy and enables the LLM to perform English-German and English-Russian SiMT tasks with BLEU scores that are comparable to those of specific state-of-the-art baselines. We also evaluated closed-source models such as GPT-4, which displayed encouraging results in performing the SiMT task without prior training (zero-shot), indicating a promising avenue for enhancing future SiMT systems.
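
The wait-token idea in the abstract can be illustrated with a small read/write loop. The sketch below is not the authors' implementation: the token spellings `<WAIT>` and `<EOS>`, the function `simultaneous_translate`, and the toy stand-in for the fine-tuned LLM (`make_toy_model`, which uses a word-for-word lexicon and one word of lookahead) are all assumptions made for illustration. It only shows how a model's own output could decide when to read more source text, so that no separate policy is needed.

```python
from typing import Callable, Dict, List, Tuple

WAIT = "<WAIT>"  # hypothetical spelling of the special "wait" token
EOS = "<EOS>"    # hypothetical end-of-translation marker

# A model is any callable: (visible source words, target so far) -> next token.
NextToken = Callable[[List[str], List[str]], str]


def simultaneous_translate(
    source: List[str], next_token: NextToken, max_steps: int = 1000
) -> Tuple[List[str], List[str]]:
    """Run a READ/WRITE loop in which the model's output decides when to read."""
    read = 0                  # number of source words revealed so far
    target: List[str] = []    # translation emitted so far
    actions: List[str] = []   # trace of READ/WRITE decisions
    for _ in range(max_steps):
        token = next_token(source[:read], target)
        if token == EOS:
            break
        if token == WAIT:
            if read == len(source):
                # Source exhausted: this sketch simply stops. A real system
                # could instead prevent the wait token from being generated.
                break
            read += 1
            actions.append("READ")
        else:
            target.append(token)
            actions.append("WRITE")
    return target, actions


def make_toy_model(full_source: List[str], lexicon: Dict[str, str]) -> NextToken:
    """Toy stand-in for the LLM: word-for-word lookup with one word of lookahead."""
    def next_token(visible: List[str], target: List[str]) -> str:
        i = len(target)
        if i >= len(full_source):
            return EOS
        # Ask for more input until one word beyond position i is visible,
        # or until the whole source is visible.
        needed = min(i + 2, len(full_source))
        if len(visible) < needed:
            return WAIT
        return lexicon[full_source[i]]
    return next_token


source = ["good", "morning", "everyone"]
lexicon = {"good": "guten", "morning": "Morgen", "everyone": "allerseits"}
translation, actions = simultaneous_translate(source, make_toy_model(source, lexicon))
print(" ".join(translation))
print(actions)
```

In this sketch the caller never inspects the source to decide when to translate; it only reacts to the token the model returns, which is the sense in which the segmentation policy lives inside the model.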

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)