AITopics | spoken language translation

Collaborating Authors

spoken language translation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SimulSense: Sense-Driven Interpreting for Efficient Simultaneous Speech Translation

Tan, Haotian, Ouchi, Hiroki, Sakti, Sakriani

arXiv.org Artificial IntelligenceSep-29-2025

ABSTRACT How to make human-interpreter-like read/write decisions for simultaneous speech translation (SimulST) systems? Current state-of-the-art systems formulate SimulST as a multi-turn dialogue task, requiring specialized interleaved training data and relying on computationally expensive large language model (LLM) inference for decision-making. In this paper, we propose SimulSense, a novel framework for SimulST that mimics human interpreters by continuously reading input speech and triggering write decisions to produce translation when a new sense unit is perceived. Experiments against two state-of-the-art baseline systems demonstrate that our proposed method achieves a superior quality-latency tradeoff and substantially improved real-time efficiency, where its decision-making is up to 9.6 faster than the baselines. Index T erms-- simultaneous speech translation, LLM-based speech translation, decision policy, continuous integrate-and-fire 1. INTRODUCTION Simultaneous speech translation (SimulST) is a challenging task to perform translation in real-time with low latency while maintaining high translation quality.

artificial intelligence, natural language, translation, (16 more...)

arXiv.org Artificial Intelligence

2509.21932

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

Bemba Speech Translation: Exploring a Low-Resource African Language

Farouq, Muhammad Hazim Al, Wassie, Aman Kassahun, Moslem, Yasmin

arXiv.org Artificial IntelligenceAug-14-2025

This paper describes our system submission to the International Conference on Spoken Language Translation (IWSLT 2025), low-resource languages track, namely for Bemba-to-English speech translation. We built cascaded speech translation systems based on Whisper and NLLB-200, and employed data augmentation techniques, such as back-translation. We investigate the effect of using synthetic data and discuss our experimental setup.

artificial intelligence, computational linguistic, natural language, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2025.iwslt-1.37

2505.02518

Country:

Europe (1.00)
North America > United States (0.29)
Asia > Indonesia (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

MLLP-VRAIN UPV system for the IWSLT 2025 Simultaneous Speech Translation Translation task

Iranzo-Sánchez, Jorge, Iranzo-Sánchez, Javier, Giménez, Adrià, Civera, Jorge, Juan, Alfons

arXiv.org Artificial IntelligenceJun-24-2025

This work describes the participation of the MLLP-VRAIN research group in the shared task of the IWSLT 2025 Simultaneous Speech Translation track. Our submission addresses the unique challenges of real-time translation of long-form speech by developing a modular cascade system that adapts strong pre-trained models to streaming scenarios. We combine Whisper Large-V3-Turbo for ASR with the multilingual NLLB-3.3B model for MT, implementing lightweight adaptation techniques rather than training new end-to-end models from scratch. Our approach employs document-level adaptation with prefix training to enhance the MT model's ability to handle incomplete inputs, while incorporating adaptive emission policies including a wait-$k$ strategy and RALCP for managing the translation stream. Specialized buffer management techniques and segmentation strategies ensure coherent translations across long audio sequences. Experimental results on the ACL60/60 dataset demonstrate that our system achieves a favorable balance between translation quality and latency, with a BLEU score of 31.96 and non-computational-aware StreamLAAL latency of 2.94 seconds. Our final model achieves a preliminary score on the official test set (IWSLT25Instruct) of 29.8 BLEU. Our work demonstrates that carefully adapted pre-trained components can create effective simultaneous translation systems for long-form content without requiring extensive in-domain parallel data or specialized end-to-end training.

artificial intelligence, computational linguistic, natural language, (16 more...)

arXiv.org Artificial Intelligence

2506.18828

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Pennsylvania (0.28)

Genre: Research Report (0.50)

Industry: Government (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

End-to-End Speech Translation for Low-Resource Languages Using Weakly Labeled Data

Pothula, Aishwarya, Akkiraju, Bhavana, Bandarupalli, Srihari, D, Charan, Kesiraju, Santosh, Vuppala, Anil Kumar

arXiv.org Artificial IntelligenceJun-23-2025

The scarcity of high-quality annotated data presents a significant challenge in developing effective end-to-end speech-to-text translation (ST) systems, particularly for low-resource languages. This paper explores the hypothesis that weakly labeled data can be used to build ST models for low-resource language pairs. We constructed speech-to-text translation datasets with the help of bitext mining using state-of-the-art sentence encoders. We mined the multilingual Shrutilipi corpus to build Shrutilipi-anuvaad, a dataset comprising ST data for language pairs Bengali-Hindi, Malayalam-Hindi, Odia-Hindi, and Telugu-Hindi. We created multiple versions of training data with varying degrees of quality and quantity to investigate the effect of quality versus quantity of weakly labeled data on ST model performance. Results demonstrate that ST systems can be built using weakly labeled data, with performance comparable to massive multi-modal multilingual baselines such as SONAR and SeamlessM4T.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2506.16251

Country:

Asia > India (0.28)
North America > United States > Minnesota (0.28)
Europe > Czechia (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

CMU's IWSLT 2025 Simultaneous Speech Translation System

Ouyang, Siqi, Xu, Xi, Li, Lei

arXiv.org Artificial IntelligenceJun-17-2025

This paper presents CMU's submission to the IWSLT 2025 Simultaneous Speech Translation (SST) task for translating unsegmented English speech into Chinese and German text in a streaming manner. Our end-to-end speech-to-text system integrates a chunkwise causal Wav2Vec 2.0 speech encoder, an adapter, and the Qwen2.5-7B-Instruct as the decoder. We use a two-stage simultaneous training procedure on robust speech segments curated from LibriSpeech, CommonVoice, and VoxPopuli datasets, utilizing standard cross-entropy loss. Our model supports adjustable latency through a configurable latency multiplier. Experimental results demonstrate that our system achieves 44.3 BLEU for English-to-Chinese and 25.1 BLEU for English-to-German translations on the ACL60/60 development set, with computation-aware latencies of 2.7 seconds and 2.3 seconds, and theoretical latencies of 2.2 and 1.7 seconds, respectively.

computational linguistic, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2506.13143

Country: North America > United States > Pennsylvania (0.28)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Add feedback

Prompting LLMs: Length Control for Isometric Machine Translation

Javorský, Dávid, Bojar, Ondřej, Yvon, François

arXiv.org Artificial IntelligenceJun-6-2025

In this study, we explore the effectiveness of isometric machine translation across multiple language pairs (En$\to$De, En$\to$Fr, and En$\to$Es) under the conditions of the IWSLT Isometric Shared Task 2022. Using eight open-source large language models (LLMs) of varying sizes, we investigate how different prompting strategies, varying numbers of few-shot examples, and demonstration selection influence translation quality and length control. We discover that the phrasing of instructions, when aligned with the properties of the provided demonstrations, plays a crucial role in controlling the output length. Our experiments show that LLMs tend to produce shorter translations only when presented with extreme examples, while isometric demonstrations often lead to the models disregarding length constraints. While few-shot prompting generally enhances translation quality, further improvements are marginal across 5, 10, and 20-shot settings. Finally, considering multiple outputs allows to notably improve overall tradeoff between the length and quality, yielding state-of-the-art performance for some language pairs.

large language model, machine learning, translation, (19 more...)

arXiv.org Artificial Intelligence

2506.04855

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

IIITH-BUT system for IWSLT 2025 low-resource Bhojpuri to Hindi speech translation

Akkiraju, Bhavana, Pothula, Aishwarya, Kesiraju, Santosh, Vuppala, Anil Kumar

arXiv.org Artificial IntelligenceJun-6-2025

This paper presents the submission of IIITH-BUT to the IWSLT 2025 shared task on speech translation for the low-resource Bhojpuri-Hindi language pair. We explored the impact of hyperparameter optimisation and data augmentation techniques on the performance of the SeamlessM4T model fine-tuned for this specific task. We systematically investigated a range of hyperparameters including learning rate schedules, number of update steps, warm-up steps, label smoothing, and batch sizes; and report their effect on translation quality. To address data scarcity, we applied speed perturbation and SpecAugment and studied their effect on translation quality. We also examined the use of cross-lingual signal through joint training with Marathi and Bhojpuri speech data. Our experiments reveal that careful selection of hyperparameters and the application of simple yet effective augmentation techniques significantly improve performance in low-resource settings. We also analysed the translation hypotheses to understand various kinds of errors that impacted the translation quality in terms of BLEU.

artificial intelligence, natural language, translation, (17 more...)

arXiv.org Artificial Intelligence

2506.04714

Country:

Asia (0.94)
North America > United States (0.28)
Europe > Czechia (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

HENT-SRT: Hierarchical Efficient Neural Transducer with Self-Distillation for Joint Speech Recognition and Translation

Hussein, Amir, Xiao, Cihan, Wiesner, Matthew, Povey, Dan, Garcia, Leibny Paola, Khudanpur, Sanjeev

arXiv.org Artificial IntelligenceJun-4-2025

Neural transducers (NT) provide an effective framework for speech streaming, demonstrating strong performance in automatic speech recognition (ASR). However, the application of NT to speech translation (ST) remains challenging, as existing approaches struggle with word reordering and performance degradation when jointly modeling ASR and ST, resulting in a gap with attention-based encoder-decoder (AED) models. Existing NT-based ST approaches also suffer from high computational training costs. To address these issues, we propose HENT-SRT (Hierarchical Efficient Neural Transducer for Speech Recognition and Translation), a novel framework that factorizes ASR and translation tasks to better handle reordering. To ensure robust ST while preserving ASR performance, we use self-distillation with CTC consistency regularization. Moreover, we improve computational efficiency by incorporating best practices from ASR transducers, including a down-sampled hierarchical encoder, a stateless predictor, and a pruned transducer loss to reduce training complexity. Finally, we introduce a blank penalty during decoding, reducing deletions and improving translation quality. Our approach is evaluated on three conversational datasets Arabic, Spanish, and Mandarin achieving new state-of-the-art performance among NT models and substantially narrowing the gap with AED-based systems.

artificial intelligence, natural language, translation, (17 more...)

arXiv.org Artificial Intelligence

2506.02157

Country:

Europe (0.68)
North America > Canada (0.28)
Asia > Japan (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?

Papi, Sara, Polak, Peter, Bojar, Ondřej, Macháček, Dominik

arXiv.org Artificial IntelligenceDec-24-2024

Simultaneous speech-to-text translation (SimulST) translates source-language speech into target-language text concurrently with the speaker's speech, ensuring low latency for better user comprehension. Despite its intended application to unbounded speech, most research has focused on human pre-segmented speech, simplifying the task and overlooking significant challenges. This narrow focus, coupled with widespread terminological inconsistencies, is limiting the applicability of research outcomes to real-world applications, ultimately hindering progress in the field. Our extensive literature review of 110 papers not only reveals these critical issues in current research but also serves as the foundation for our key contributions. We 1) define the steps and core components of a SimulST system, proposing a standardized terminology and taxonomy; 2) conduct a thorough analysis of community trends, and 3) offer concrete recommendations and future directions to bridge the gaps in existing literature, from evaluation frameworks to system architectures, for advancing the field towards more realistic and effective SimulST solutions.

machine learning, natural language, translation, (17 more...)

arXiv.org Artificial Intelligence

2412.18495

Country:

Asia > Thailand > Bangkok > Bangkok (0.05)
North America > Canada > Ontario > Toronto (0.05)
Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
(36 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Findings of the IWSLT 2024 Evaluation Campaign

Ahmad, Ibrahim Said, Anastasopoulos, Antonios, Bojar, Ondřej, Borg, Claudia, Carpuat, Marine, Cattoni, Roldano, Cettolo, Mauro, Chen, William, Dong, Qianqian, Federico, Marcello, Haddow, Barry, Javorský, Dávid, Krubiński, Mateusz, Lam, Tsz Kin, Ma, Xutai, Mathur, Prashant, Matusov, Evgeny, Maurya, Chandresh, McCrae, John, Murray, Kenton, Nakamura, Satoshi, Negri, Matteo, Niehues, Jan, Niu, Xing, Ojha, Atul Kr., Ortega, John, Papi, Sara, Polák, Peter, Pospíšil, Adam, Pecina, Pavel, Salesky, Elizabeth, Sethiya, Nivedita, Sarkar, Balaram, Shi, Jiatong, Sikasote, Claytone, Sperber, Matthias, Stüker, Sebastian, Sudoh, Katsuhito, Thompson, Brian, Turchi, Marco, Waibel, Alex, Watanabe, Shinji, Wilken, Patrick, Zemánek, Petr, Zevallos, Rodolfo

arXiv.org Artificial IntelligenceNov-7-2024

This paper reports on the shared tasks organized by the 21st IWSLT Conference. The shared tasks address 7 scientific challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, speech-to-speech translation, dialect and low-resource speech translation, and Indic languages. The shared tasks attracted 18 teams whose submissions are documented in 26 system papers. The growing interest towards spoken language translation is also witnessed by the constantly increasing number of shared task organizers and contributors to the overview paper, almost evenly distributed across industry and academia.

machine learning, natural language, translation, (18 more...)

arXiv.org Artificial Intelligence

2411.05088

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Czechia (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
(50 more...)

Genre: Research Report > Experimental Study (0.92)

Industry:

Leisure & Entertainment (0.94)
Education (0.68)
Media > Television (0.47)
Government > Regional Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback