AITopics | google translate

google translate

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Google Translate is now better at translating slang terms and idioms using AI

Engadget | Dec-12-2025, 17:34:28 GMT

Google has also introduced a new speech-to-speech translation feature for headphones. Google is rolling out new Gemini-assisted functionality to Search and its Translate app. It says its AI can now provide more natural and accurate text translations for phrases that have more nuanced meanings. Translate will now take slang terms and colloquial expressions into consideration rather than provide sometimes unhelpful direct translations. The latest update to its text translation feature is rolling out first in the US and India, translating between English and just under 20 other languages, including German, Spanish, Chinese and Arabic.

artificial intelligence, natural language, translation, (12 more...)

Engadget

Country:

North America > United States (0.27)
Asia > India (0.27)
Europe > Sweden (0.06)
Europe > Germany (0.06)

Industry: Education (0.33)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.90)
Information Technology > Communications > Mobile (0.81)

Measuring the Effect of Disfluency in Multilingual Knowledge Probing Benchmarks

Semenov, Kirill, Sennrich, Rico

arXiv.org Artificial Intelligence | Oct-20-2025

For multilingual factual knowledge assessment of LLMs, benchmarks such as MLAMA use template translations that do not take into account the grammatical and semantic information of the named entities inserted in the sentence. This leads to numerous instances of ungrammaticality or wrong wording of the final prompts, which complicates the interpretation of scores, especially for languages that have a rich morphological inventory. In this work, we sample 4 Slavic languages from the MLAMA dataset and compare the knowledge retrieval scores between the initial (templated) MLAMA dataset and its sentence-level translations made by Google Translate and ChatGPT. We observe a significant increase in knowledge retrieval scores, and provide a qualitative analysis for possible reasons behind it. We also make an additional analysis of 5 more languages from different families and see similar patterns. Therefore, we encourage the community to control the grammaticality of highly multilingual datasets for higher and more interpretable results, which is well approximated by whole sentence translation with neural MT or LLM systems. The dataset and all related code are published at the GitHub repository: https://github.com/ZurichNLP/Fluent-mLAMA.
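The disfluency the abstract describes can be illustrated with a minimal sketch. The Russian template, the entity names, and the `fill_template` helper below are hypothetical illustrations, not items or code from MLAMA or the Fluent-mLAMA repository; they only show how verbatim entity insertion can break gender agreement and case marking.

```python
# Hypothetical template in the "[X] ... [Y]" placeholder style; not taken from MLAMA.
RU_TEMPLATE = "[X] родился в [Y]."  # "[X] was born in [Y]." (verb in masculine form)


def fill_template(template: str, subject: str, obj: str) -> str:
    """Insert entity names verbatim, with no inflection or agreement."""
    return template.replace("[X]", subject).replace("[Y]", obj)


templated = fill_template(RU_TEMPLATE, "Мария Кюри", "Варшава")
print(templated)  # Мария Кюри родился в Варшава.

# A fluent sentence needs a feminine verb form and the prepositional case
# of the city name, which a whole-sentence translation would normally produce:
fluent = "Мария Кюри родилась в Варшаве."
print(fluent)
```

The templated prompt keeps the masculine verb and the nominative city name, so a model is probed with an ungrammatical sentence even though the underlying fact is the same.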

computational linguistic, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2510.15115

Country:

Europe (1.00)
Asia (1.00)
North America > United States (0.67)

Genre: Research Report > New Finding (1.00)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

OpenWHO: A Document-Level Parallel Corpus for Health Translation in Low-Resource Languages

Merx, Raphaël, Suominen, Hanna, Cohn, Trevor, Vylomova, Ekaterina

arXiv.org Artificial Intelligence | Oct-7-2025

In machine translation (MT), health is a high-stakes domain characterised by widespread deployment and domain-specific vocabulary. However, there is a lack of MT evaluation datasets for low-resource languages in this domain. To address this gap, we introduce OpenWHO, a document-level parallel corpus of 2,978 documents and 26,824 sentences from the World Health Organization's e-learning platform. Sourced from expert-authored, professionally translated materials shielded from web-crawling, OpenWHO spans a diverse range of over 20 languages, of which nine are low-resource. Leveraging this new resource, we evaluate modern large language models (LLMs) against traditional MT models. Our findings reveal that LLMs consistently outperform traditional MT models, with Gemini 2.5 Flash achieving a +4.79 ChrF point improvement over NLLB-54B on our low-resource test set. Further, we investigate how LLM context utilisation affects accuracy, finding that the benefits of document-level translation are most pronounced in specialised domains like health. We release the OpenWHO corpus to encourage further research into low-resource MT in the health domain.
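For readers unfamiliar with the ChrF metric cited above, the following is a simplified sketch of a character n-gram F-score. It assumes whitespace is ignored, n-gram orders 1 to 6, and beta = 2 (recall weighted more heavily than precision); the function names are illustrative, and the result is not guaranteed to match the scores reported in the paper or produced by standard evaluation toolkits.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring spaces (a simplifying assumption)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified chrF on a 0-100 scale for one hypothesis/reference pair."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # skip orders longer than either string
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)


print(chrf("wash your hands", "wash hands often"))
```

Under this sketch an identical hypothesis and reference score 100 and strings with no shared characters score 0, so a gain of a few points reflects a modest increase in character-level overlap with the reference translations.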

large language model, machine learning, translation, (19 more...)

arXiv.org Artificial Intelligence

2508.16048

Country:

Europe (1.00)
North America > United States (0.46)
North America > Mexico (0.28)

Genre:

Research Report > New Finding (1.00)
Instructional Material (1.00)

Industry:

Health & Medicine > Public Health (0.93)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.68)
Education > Educational Setting > Online (0.48)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

How AI and Wikipedia have sent vulnerable languages into a doom spiral

MIT Technology Review | Sep-25-2025, 09:00:00 GMT

Machine translators have made it easier than ever to create error-plagued Wikipedia articles in obscure languages. What happens when AI models get trained on junk pages? When Kenneth Wehr started managing the Greenlandic-language version of Wikipedia four years ago, his first act was to delete almost everything. It had to go, he thought, if it had any chance of surviving. Wehr, who's 26, isn't from Greenland (he grew up in Germany), but he had become obsessed with the island, an autonomous Danish territory, after visiting as a teenager. He'd spent years writing obscure Wikipedia articles in his native tongue on virtually everything to do with it. He even ended up moving to Copenhagen to study Greenlandic, a language spoken by some 57,000 mostly Indigenous Inuit people scattered across dozens of far-flung Arctic villages. The Greenlandic-language edition was added to Wikipedia around 2003, just a few years after the site launched in English. By the time Wehr took its helm nearly 20 years later, hundreds of Wikipedians had contributed to it and had collectively written some 1,500 articles totaling tens of thousands of words.

google translate, translation, wikipedia, (15 more...)

MIT Technology Review

Country:

North America > Greenland (0.24)
Europe > Germany (0.24)
Europe > Denmark > Capital Region > Copenhagen (0.24)
(8 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Specification-Aware Machine Translation and Evaluation for Purpose Alignment

Kayano, Yoko, Sugawara, Saku

arXiv.org Artificial Intelligence | Sep-23-2025

In professional settings, translation is guided by communicative goals and client needs, often formalized as specifications. While existing evaluation frameworks acknowledge the importance of such specifications, these specifications are often treated only implicitly in machine translation (MT) research. Drawing on translation studies, we provide a theoretical rationale for why specifications matter in professional translation, as well as a practical guide to implementing specification-aware MT and evaluation. Building on this foundation, we apply our framework to the translation of investor relations texts from 33 publicly listed companies. In our experiment, we compare five translation types, including official human translations and prompt-based outputs from large language models (LLMs), using expert error analysis, user preference rankings, and an automatic metric. The results show that LLM translations guided by specifications consistently outperformed official human translations in human evaluations, highlighting a gap between perceived and expected quality. These findings demonstrate that integrating specifications into MT workflows, with human oversight, can improve translation quality in ways aligned with professional practice.

large language model, machine learning, translation, (18 more...)

arXiv.org Artificial Intelligence

2509.17559

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East > UAE (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation (0.93)
Banking & Finance > Trading (0.93)
Automobiles & Trucks (0.92)
Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

PDFMathTranslate: Scientific Document Translation Preserving Layouts

Ouyang, Rongxin, Chu, Chang, Xin, Zhikuang, Ma, Xiangyao

arXiv.org Artificial Intelligence | Sep-23-2025

Language barriers in scientific documents hinder the diffusion and development of science and technologies. However, prior efforts in translating such documents largely overlooked the information in layouts. To bridge the gap, we introduce PDFMathTranslate, the world's first open-source software for translating scientific documents while preserving layouts. Leveraging the most recent advances in large language models and precise layout detection, we contribute to the community with key improvements in precision, flexibility, and efficiency. The work has been open-sourced at https://github.com/byaidu/pdfmathtranslate with more than 222k downloads.

large language model, machine learning, translation, (18 more...)

arXiv.org Artificial Intelligence

2507.03009

Country: Asia > China (0.29)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Gender Bias in English-to-Greek Machine Translation

Gkovedarou, Eleni, Daems, Joke, De Bruyne, Luna

arXiv.org Artificial Intelligence | Aug-22-2025

As the demand for inclusive language increases, concern has grown over the susceptibility of machine translation (MT) systems to reinforce gender stereotypes. This study investigates gender bias in two commercial MT systems, Google Translate and DeepL, focusing on the understudied English-to-Greek language pair. We address three aspects of gender bias: i) male bias, ii) occupational stereotyping, and iii) errors in anti-stereotypical translations. Additionally, we explore the potential of prompted GPT-4o as a bias mitigation tool that provides both gender-explicit and gender-neutral alternatives when necessary. To achieve this, we introduce GendEL, a manually crafted bilingual dataset of 240 gender-ambiguous and unambiguous sentences that feature stereotypical occupational nouns and adjectives. We find persistent gender bias in translations by both MT systems; while they perform well in cases where gender is explicitly defined, with DeepL outperforming both Google Translate and GPT-4o in feminine gender-unambiguous sentences, they are far from producing gender-inclusive or neutral translations when the gender is unspecified. GPT-4o shows promise, generating appropriate gendered and neutral alternatives for most ambiguous cases, though residual biases remain evident.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2506.09558

Country:

Europe (1.00)
North America > United States > Massachusetts (0.28)

Genre: Research Report > New Finding (0.93)

Industry: Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

ArzEn-MultiGenre: An aligned parallel dataset of Egyptian Arabic song lyrics, novels, and subtitles, with English translations

Al-Sabbagh, Rania

arXiv.org Artificial Intelligence | Aug-5-2025

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

R. Al-Sabbagh / Data in Brief 54 (2024) 110271

Subject: Computer Science, Social Sciences

Specific subject area: Natural Language Processing, machine translation, large-language models, translation studies, cross-linguistic analysis, lexical semantics

Data format: Translated and aligned

Type of data: Texts (Bilingual tables in Microsoft Excel files)

Data collection: The ArzEn-MultiGenre dataset consists of three genres: song lyrics, novels, and subtitles. The data was gathered from various sources using different methods. A website was crawled for song lyrics using an in-house web crawler, and professional translators manually translated the lyrics into English. For novels, hard copies were collected in English and Egyptian Arabic, then scanned and converted into text files using an Optical Character Recognizer (OCR). The OCR output was then manually reviewed and aligned.

artificial intelligence, machine translation, natural language, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.dib.2024.110271

2508.01411

Country: Asia > Middle East > UAE (0.14)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment (1.00)
Media > Music (0.93)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Mitigating Stylistic Biases of Machine Translation Systems via Monolingual Corpora Only

Gao, Xuanqi, Jiang, Weipeng, Zhai, Juan, Ma, Shiqing, Xie, Siyi, Yin, Xinyang, Shen, Chao

arXiv.org Artificial Intelligence | Jul-21-2025

The advent of neural machine translation (NMT) has revolutionized cross-lingual communication, yet preserving stylistic nuances remains a significant challenge. While existing approaches often require parallel corpora for style preservation, we introduce Babel, a novel framework that enhances stylistic fidelity in NMT using only monolingual corpora. Babel employs two key components: (1) a style detector based on contextual embeddings that identifies stylistic disparities between source and target texts, and (2) a diffusion-based style applicator that rectifies stylistic inconsistencies while maintaining semantic integrity. Our framework integrates with existing NMT systems as a post-processing module, enabling style-aware translation without requiring architectural modifications or parallel stylistic data. Extensive experiments on five diverse domains (law, literature, scientific writing, medicine, and educational content) demonstrate Babel's effectiveness: it identifies stylistic inconsistencies with 88.21% precision and improves stylistic preservation by 150% while maintaining a high semantic similarity score of 0.92. Human evaluation confirms that translations refined by Babel better preserve source text style while maintaining fluency and adequacy.

machine learning, natural language, translation, (18 more...)

arXiv.org Artificial Intelligence

2507.13395

Country: Asia > China (0.28)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry:

Information Technology > Security & Privacy (0.93)
Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Evaluating Machine Translation Models for English-Hindi Language Pairs: A Comparative Analysis

Shetty, Ahan Prasannakumar

arXiv.org Artificial Intelligence | May-27-2025

Machine translation has become a critical tool in bridging linguistic gaps, especially between languages as diverse as English and Hindi. This paper comprehensively evaluates various machine translation models for translating between English and Hindi. We assess the performance of these models using a diverse set of automatic evaluation metrics, both lexical and machine learning-based metrics. The study aims to provide insights into the effectiveness of different machine translation approaches in handling both general and specialized language domains. Results indicate varying performance levels across different metrics, highlighting strengths and areas for improvement in current translation systems.

artificial intelligence, natural language, translation, (13 more...)

arXiv.org Artificial Intelligence

2505.19604

Country: Asia > India (0.14)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.47)
Government (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)