SimulSense: Sense-Driven Interpreting for Efficient Simultaneous Speech Translation

Tan, Haotian, Ouchi, Hiroki, Sakti, Sakriani

Sep-29-2025–arXiv.org Artificial Intelligence

ABSTRACT How can simultaneous speech translation (SimulST) systems make human-interpreter-like read/write decisions? Current state-of-the-art systems formulate SimulST as a multi-turn dialogue task, which requires specialized interleaved training data and relies on computationally expensive large language model (LLM) inference for decision-making. In this paper, we propose SimulSense, a novel framework for SimulST that mimics human interpreters by continuously reading input speech and triggering a write decision to produce translation whenever a new sense unit is perceived. Experiments against two state-of-the-art baseline systems demonstrate that our proposed method achieves a superior quality-latency tradeoff and substantially improved real-time efficiency, with decision-making up to 9.6× faster than the baselines.

Index Terms: simultaneous speech translation, LLM-based speech translation, decision policy, continuous integrate-and-fire

1. INTRODUCTION Simultaneous speech translation (SimulST) is the challenging task of translating speech in real time with low latency while maintaining high translation quality.

artificial intelligence, natural language, translation

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (1.00)
  - Natural Language > Machine Translation (1.00)
