Redefining Machine Simultaneous Interpretation: From Incremental Translation to Human-Like Strategies
Zhang, Qianen, Nakamura, Satoshi
arXiv.org Artificial Intelligence
Simultaneous Machine Translation (SiMT) requires high-quality translations under strict real-time constraints, which traditional encoder-decoder policies with only READ/WRITE actions cannot fully address. We extend the action space of SiMT with four adaptive actions: SENTENCE_CUT, DROP, PARTIAL_SUMMARIZATION, and PRONOMINALIZATION, which enable real-time restructuring, omission, and simplification while preserving semantic fidelity. We implement these actions in a decoder-only large language model (LLM) framework and construct training references through action-aware prompting. To evaluate both quality and latency, we further develop a latency-aware TTS pipeline that maps textual outputs to speech with realistic timing. Experiments on the ACL60/60 English-Chinese and English-German benchmarks show that our framework consistently improves semantic metrics (e.g., COMET-KIWI) and achieves lower delay (measured by Average Lagging) compared to reference translations and salami-based baselines. Notably, combining DROP and SENTENCE_CUT yields the best overall balance between fluency and latency. These results demonstrate that enriching the action space of LLM-based SiMT provides a promising direction for bridging the gap between human and machine interpretation.
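For readers unfamiliar with the latency metric named in the abstract, the sketch below computes Average Lagging in its common text-level form, AL = (1/τ) Σ_{t=1..τ} [g(t) − (t−1)/γ], where g(t) is the number of source tokens read before target token t is written, γ is the target-to-source length ratio, and τ is the first target position written after the full source has been read. This is an illustration only: the abstract does not state which variant the authors use, and their latency-aware TTS pipeline may measure delay on speech timing rather than token counts. The function name and its arguments are hypothetical.

```python
from typing import Sequence


def average_lagging(g: Sequence[int], src_len: int) -> float:
    """Text-level Average Lagging (illustrative sketch, not the paper's code).

    g[t-1] is the number of source tokens read before target token t is written.
    src_len is the total number of source tokens.
    """
    if not g or src_len <= 0:
        raise ValueError("need at least one target token and a non-empty source")

    tgt_len = len(g)
    gamma = tgt_len / src_len  # target-to-source length ratio

    # tau: first (1-based) target position written after the whole source was read.
    # Falls back to the last target position if the source is never fully read.
    tau = next((t for t, read in enumerate(g, start=1) if read >= src_len), tgt_len)

    # Sum the lag of each target token relative to an ideal zero-latency system.
    total = sum(g[t - 1] - (t - 1) / gamma for t in range(1, tau + 1))
    return total / tau


# A wait-3 style schedule on a 6-token source and 6-token target.
wait3 = [3, 4, 5, 6, 6, 6]
# An offline system that reads the entire source before writing anything.
offline = [6, 6, 6, 6, 6, 6]

al_wait3 = average_lagging(wait3, src_len=6)
al_offline = average_lagging(offline, src_len=6)
```

Under this definition a policy that writes earlier gets a lower score, which is the sense in which the abstract reports "lower delay": actions such as DROP or SENTENCE_CUT can let the system emit output after reading fewer source tokens.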
Sep-29-2025