AITopics | Machine Translation

Collaborating Authors

Machine Translation

"Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains."
– Definition from the European Association for Machine Translation (EAMT).

You can translate text of your choice by using free translators such as: CAPITA, Google Translate, SDL International, SYSTRAN.

News Overviews Instructional Materials AI-Alerts Classics

American Sign Language to Text Translation using Transformer and Seq2Seq with LSTM

Putra, Gregorius Guntur Sunardi, D'Layla, Adifa Widyadhani Chanda, Wahono, Dimas, Sarno, Riyanarto, Haryono, Agus Tri

arXiv.org Artificial IntelligenceSep-17-2024

Sign language translation is one of the important issues in communication between deaf and hearing people, as it expresses words through hand, body, and mouth movements. American Sign Language is one of the sign languages used, one of which is the alphabetic sign. The development of neural machine translation technology is moving towards sign language translation. Transformer became the state-of-the-art in natural language processing. This study compares the Transformer with the Sequence-to-Sequence (Seq2Seq) model in translating sign language to text. In addition, an experiment was conducted by adding Residual Long Short-Term Memory (ResidualLSTM) in the Transformer. The addition of ResidualLSTM to the Transformer reduces the performance of the Transformer model by 23.37% based on the BLEU Score value. In comparison, the Transformer itself increases the BLEU Score value by 28.14 compared to the Seq2Seq model.

sign language, transformer, translation, (13 more...)

arXiv.org Artificial Intelligence

2409.10874

Country:

North America > United States (0.15)
Asia > Indonesia > Java > East Java > Surabaya (0.05)
Europe > Switzerland (0.04)

Genre: Research Report > New Finding (0.69)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs

Mousi, Basel, Durrani, Nadir, Ahmad, Fatema, Hasan, Md. Arid, Hasanain, Maram, Kabbani, Tameem, Dalvi, Fahim, Chowdhury, Shammur Absar, Alam, Firoj

arXiv.org Artificial IntelligenceSep-17-2024

Arabic, with its rich diversity of dialects, remains significantly underrepresented in Large Language Models, particularly in dialectal variations. We address this gap by introducing seven synthetic datasets in dialects alongside Modern Standard Arabic (MSA), created using Machine Translation (MT) combined with human post-editing. We present AraDiCE, a benchmark for Arabic Dialect and Cultural Evaluation. We evaluate LLMs on dialect comprehension and generation, focusing specifically on low-resource Arabic dialects. Additionally, we introduce the first-ever fine-grained benchmark designed to evaluate cultural awareness across the Gulf, Egypt, and Levant regions, providing a novel dimension to LLM evaluation. Our findings demonstrate that while Arabic-specific models like Jais and AceGPT outperform multilingual models on dialectal tasks, significant challenges persist in dialect identification, generation, and translation. This work contributes ~45K post-edited samples, a cultural benchmark, and highlights the importance of tailored training to improve LLM performance in capturing the nuances of diverse Arabic dialects and cultural contexts. We will release the dialectal translation models and benchmarks curated in this study.

computational linguistic, dataset, dialect, (15 more...)

arXiv.org Artificial Intelligence

2409.11404

Country:

Africa > Middle East > Egypt (0.34)
Asia > Middle East > Yemen (0.14)
Asia > Middle East > Saudi Arabia (0.14)
(29 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

GOSt-MT: A Knowledge Graph for Occupation-related Gender Biases in Machine Translation

Mastromichalakis, Orfeas Menis, Filandrianos, Giorgos, Tsouparopoulou, Eva, Parsanoglou, Dimitris, Symeonaki, Maria, Stamou, Giorgos

arXiv.org Artificial IntelligenceSep-17-2024

Gender bias in machine translation (MT) systems poses significant challenges that often result in the reinforcement of harmful stereotypes. Especially in the labour domain where frequently occupations are inaccurately associated with specific genders, such biases perpetuate traditional gender stereotypes with a significant impact on society. Addressing these issues is crucial for ensuring equitable and accurate MT systems. This paper introduces a novel approach to studying occupation-related gender bias through the creation of the GOSt-MT (Gender and Occupation Statistics for Machine Translation) Knowledge Graph. GOSt-MT integrates comprehensive gender statistics from real-world labour data and textual corpora used in MT training. This Knowledge Graph allows for a detailed analysis of gender bias across English, French, and Greek, facilitating the identification of persistent stereotypes and areas requiring intervention. By providing a structured framework for understanding how occupations are gendered in both labour markets and MT systems, GOSt-MT contributes to efforts aimed at making MT systems more equitable and reducing gender biases in automated translations.

knowledge graph, occupation, statistics, (16 more...)

arXiv.org Artificial Intelligence

2409.10989

Country:

Europe > United Kingdom (0.68)
Europe > France (0.05)
Europe > Greece > Attica > Athens (0.04)
(5 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Health Care Providers & Services (0.68)
Government > Regional Government > Europe Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Evaluation of Google Translate for Mandarin Chinese translation using sentiment and semantic analysis

Wang, Xuechun, Beard, Rodney, Chandra, Rohitash

arXiv.org Artificial IntelligenceSep-16-2024

Machine translation using large language models (LLMs) is having a significant global impact, making communication easier. Mandarin Chinese is the official language used for communication by the government and media in China. In this study, we provide an automated assessment of translation quality of Google Translate with human experts using sentiment and semantic analysis. In order to demonstrate our framework, we select the classic early twentieth-century novel 'The True Story of Ah Q' with selected Mandarin Chinese to English translations. We use Google Translate to translate the given text into English and then conduct a chapter-wise sentiment analysis and semantic analysis to compare the extracted sentiments across the different translations. Our results indicate that the precision of Google Translate differs both in terms of semantic and sentiment analysis when compared to human expert translations. We find that Google Translate is unable to translate some of the specific words or phrases in Chinese, such as Chinese traditional allusions. The mistranslations may be due to lack of contextual significance and historical knowledge of China.

google translate, mandarin chinese translation, sentiment and semantic analysis, (1 more...)

arXiv.org Artificial Intelligence

2409.04964

Country: Asia > China (0.44)

Genre: Research Report > New Finding (0.53)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback

ELMI: Interactive and Intelligent Sign Language Translation of Lyrics for Song Signing

Yoo, Suhyeon, Truong, Khai N., Kim, Young-Ho

arXiv.org Artificial IntelligenceSep-15-2024

d/Deaf and hearing song-signers become prevalent on video-sharing platforms, but translating songs into sign language remains cumbersome and inaccessible. Our formative study revealed the challenges song-signers face, including semantic, syntactic, expressive, and rhythmic considerations in translations. We present ELMI, an accessible song-signing tool that assists in translating lyrics into sign language. ELMI enables users to edit glosses line-by-line, with real-time synced lyric highlighting and music video snippets. Users can also chat with a large language model-driven AI to discuss meaning, glossing, emoting, and timing. Through an exploratory study with 13 song-signers, we examined how ELMI facilitates their workflows and how song-signers leverage and receive an LLM-driven chat for translation. Participants successfully adopted ELMI to song-signing, with active discussions on the fly. They also reported improved confidence and independence in their translations, finding ELMI encouraging, constructive, and informative. We discuss design implications for leveraging LLMs in culturally sensitive song-signing translations.

elmi, participant, translation, (13 more...)

arXiv.org Artificial Intelligence

2409.0976

Country:

North America > Canada > Ontario > Toronto (0.46)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > District of Columbia > Washington (0.05)
(5 more...)

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (1.00)
Personal > Interview (0.46)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Education > Curriculum > Subject-Specific Education (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Thesis proposal: Are We Losing Textual Diversity to Natural Language Processing?

Jon, Josef

arXiv.org Artificial IntelligenceSep-14-2024

This thesis argues that the currently widely used Natural Language Processing algorithms possibly have various limitations related to the properties of the texts they handle and produce. With the wide adoption of these tools in rapid progress, we must ask what these limitations are and what are the possible implications of integrating such tools even more deeply into our daily lives. As a testbed, we have chosen the task of Neural Machine Translation (NMT). Nevertheless, we aim for general insights and outcomes, applicable even to current Large Language Models (LLMs). We ask whether the algorithms used in NMT have inherent inductive biases that are beneficial for most types of inputs but might harm the processing of untypical texts. To explore this hypothesis, we define a set of measures to quantify text diversity based on its statistical properties, like uniformity or rhythmicity of word-level surprisal, on multiple scales (sentence, discourse, language). We then conduct a series of experiments to investigate whether NMT systems struggle with maintaining the diversity of such texts, potentially reducing the richness of the language generated by these systems, compared to human translators. We search for potential causes of these limitations rooted in training objectives and decoding algorithms. Our ultimate goal is to develop alternatives that do not enforce uniformity in the distribution of statistical properties in the output and that allow for better global planning of the translation, taking into account the intrinsic ambiguity of the translation task.

losing textual diversity, natural language processing, thesis proposal

arXiv.org Artificial Intelligence

2409.09568

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.87)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)

Add feedback

What does it take to get state of the art in simultaneous speech-to-speech translation?

Wilmet, Vincent, Du, Johnson

arXiv.org Artificial IntelligenceSep-14-2024

This paper presents an in-depth analysis of the latency characteristics observed in simultaneous speech-to-speech model's performance, particularly focusing on hallucination-induced latency spikes. By systematically experimenting with various input parameters and conditions, we propose methods to minimize latency spikes and improve overall performance. The findings suggest that a combination of careful input management and strategic parameter adjustments can significantly enhance speech-to-speech model's latency behavior.

hallucination, latency, translation, (12 more...)

arXiv.org Artificial Intelligence

2409.00965

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Improving Statistical Significance in Human Evaluation of Automatic Metrics via Soft Pairwise Accuracy

Thompson, Brian, Mathur, Nitika, Deutsch, Daniel, Khayrallah, Huda

arXiv.org Artificial IntelligenceSep-14-2024

Selecting an automatic metric that best emulates human judgments is often non-trivial, because there is no clear definition of "best emulates." A meta-metric is required to compare the human judgments to the automatic metric judgments, and metric rankings depend on the choice of meta-metric. We propose Soft Pairwise Accuracy (SPA), a new meta-metric that builds on Pairwise Accuracy (PA) but incorporates the statistical significance of both the human judgments and the metric judgments. SPA allows for more fine-grained comparisons between systems than a simplistic binary win/loss, and addresses a number of shortcomings with PA: it is more stable with respect to both the number of systems and segments used for evaluation, it mitigates the issue of metric ties due to quantization, and it produces more statistically significant results. SPA was selected as the official system-level metric for the 2024 WMT metric shared task.

computational linguistic, machine translation, metric, (12 more...)

arXiv.org Artificial Intelligence

2409.09598

Country:

Asia > Singapore (0.05)
Europe > Portugal > Lisbon > Lisbon (0.04)
Europe > Bulgaria > Sofia City Province > Sofia (0.04)
(20 more...)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.50)

Add feedback

Towards Precision Characterization of Communication Disorders using Models of Perceived Pragmatic Similarity

Ward, Nigel G., Segura, Andres, Bugarini, Georgina, Lehnert-LeHouillier, Heike, Liu, Dancheng, Xiong, Jinjun, Fuentes, Olac

arXiv.org Artificial IntelligenceSep-13-2024

The diagnosis and treatment of individuals with communication disorders offers many opportunities for the application of speech technology, but research so far has not adequately considered: the diversity of conditions, the role of pragmatic deficits, and the challenges of limited data. This paper explores how a general-purpose model of perceived pragmatic similarity may overcome these limitations. It explains how it might support several use cases for clinicians and clients, and presents evidence that a simple model can provide value, and in particular can capture utterance aspects that are relevant to diagnoses of autism and specific language impairment.

arXiv.org Artificial Intelligence

2409.0917

Country:

North America > United States > Texas (0.04)
North America > United States > Oregon (0.04)
North America > United States > New Mexico (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Neurology > Autism (0.90)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)

Add feedback

TeXBLEU: Automatic Metric for Evaluate LaTeX Format

Jung, Kyudan, Kim, Nam-Joon, Ryu, Hyongon, Hyeon, Sieun, Lee, Seung-jun, Lee, Hyeok-jae

arXiv.org Artificial IntelligenceSep-13-2024

LaTeX is suitable for creating specially formatted documents in science, technology, mathematics, and computer science. Although the use of mathematical expressions in LaTeX format along with language models is increasing, there are no proper evaluation matrices to evaluate them. In this study, we propose TeXBLEU, a metric for evaluating mathematical expressions in the LaTeX format built on the n-gram-based BLEU metric widely used in translation tasks. The proposed TeXBLEU consists of a predefined tokenizer trained on the arXiv paper dataset and a fine-tuned embedding model with positional encoding. The TeXBLEU score was calculated by replacing BLUE's modified precision score with the similarity of n-gram-based tokens. TeXBLEU showed improvements of 86\%, 121\%, and 610\% over traditional evaluation metrics, such as BLEU, sacreBLEU, and Rouge, respectively, on the MathBridge dataset with 1,000 data points. The code is available at https://github.com/KyuDan1/TeXBLEU.

evaluation, expression, texbleu, (13 more...)

arXiv.org Artificial Intelligence

2409.06639

Country:

Asia > South Korea > Seoul > Seoul (0.05)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
North America > United States > Michigan (0.04)
(2 more...)

Genre: Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.48)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Add feedback