AITopics | word translation

Collaborating Authors

word translation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Accessing Higher Dimensions for Unsupervised Word Translation Sida I. Wang FAIR, Meta

Neural Information Processing SystemsFeb-17-2026, 10:56:19 GMT

With coocmap, 10-40MB of text data and a few minutes of CPU time is sufficient to achieve unsupervised word translation if the training corpora are in the same domain (e.g. both on

dimension, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
Asia > Middle East > Jordan (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

ChiKhaPo: A Large-Scale Multilingual Benchmark for Evaluating Lexical Comprehension and Generation in Large Language Models

Chang, Emily, Bafna, Niyati

arXiv.org Artificial IntelligenceDec-1-2025

Existing benchmarks for large language models (LLMs) are largely restricted to high- or mid-resource languages, and often evaluate performance on higher-order tasks in reasoning and generation. However, plenty of evidence points to the fact that LLMs lack basic linguistic competence in the vast majority of the world's 3800+ written languages. We introduce ChiKhaPo, consisting of 8 subtasks of varying difficulty designed to evaluate the lexical comprehension and generation abilities of generative models. ChiKhaPo draws on existing lexicons, monolingual data, and bitext, and provides coverage for 2700+ languages for 2 subtasks, surpassing any existing benchmark in terms of language coverage. We further show that 6 SOTA models struggle on our benchmark, and discuss the factors contributing to performance scores, including language family, language resourcedness, task, and comprehension versus generation directions. With ChiKhaPo, we hope to enable and encourage the massively multilingual benchmarking of LLMs.

computational linguistic, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2510.16928

Country:

North America > United States (1.00)
Europe (1.00)
Asia (0.67)

Genre: Research Report (0.63)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Accessing Higher Dimensions for Unsupervised Word Translation

Neural Information Processing SystemsOct-9-2025, 09:01:31 GMT

With coocmap, 10-40MB of text data and a few minutes of CPU time is sufficient to achieve unsupervised word translation if the training corpora are in the same domain (e.g. both on

dimension, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
Asia > Middle East > Jordan (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Data Augmentation for End-to-end Code-switching Speech Recognition

Du, Chenpeng, Li, Hao, Lu, Yizhou, Wang, Lan, Qian, Yanmin

arXiv.org Artificial IntelligenceNov-3-2024

Training a code-switching end-to-end automatic speech recognition (ASR) model normally requires a large amount of data, while code-switching data is often limited. In this paper, three novel approaches are proposed for code-switching data augmentation. Specifically, they are audio splicing with the existing code-switching data, and TTS with new code-switching texts generated by word translation or word insertion. Our experiments on 200 hours Mandarin-English code-switching dataset show that all the three proposed approaches yield significant improvements on code-switching ASR individually. Moreover, all the proposed approaches can be combined with recent popular SpecAugment, and an addition gain can be obtained. WER is significantly reduced by relative 24.0% compared to the system without any data augmentation, and still relative 13.0% gain compared to the system with only SpecAugment

augmentation, speech recognition, utterance, (12 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/slt48900.2021.9383620

2011.0216

Country:

Asia > China > Shanghai > Shanghai (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Asia > Nepal > Bagmati Province > Kathmandu District > Kathmandu (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Learning Translations via Matrix Completion

Wijaya, Derry, Callahan, Brendan, Hewitt, John, Gao, Jie, Ling, Xiao, Apidianaki, Marianna, Callison-Burch, Chris

arXiv.org Artificial IntelligenceJun-19-2024

Bilingual Lexicon Induction is the task of learning word translations without bilingual parallel corpora. We model this task as a matrix completion problem, and present an effective and extendable framework for completing the matrix. This method harnesses diverse bilingual and monolingual signals, each of which may be incomplete or noisy. Our model achieves state-of-the-art performance for both high and low resource languages.

computational linguistic, proceedings, translation, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/D17-1152

2406.13195

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > Oregon > Multnomah County > Portland (0.04)
Europe > Middle East > Malta > Port Region > Southern Harbour District > Valletta (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

ProMap: Effective Bilingual Lexicon Induction via Language Model Prompting

Mekki, Abdellah El, Abdul-Mageed, Muhammad, Nagoudi, ElMoatez Billah, Berrada, Ismail, Khoumsi, Ahmed

arXiv.org Artificial IntelligenceOct-28-2023

Bilingual Lexicon Induction (BLI), where words are translated between two languages, is an important NLP task. While noticeable progress on BLI in rich resource languages using static word embeddings has been achieved. The word translation performance can be further improved by incorporating information from contextualized word embeddings. In this paper, we introduce ProMap, a novel approach for BLI that leverages the power of prompting pretrained multilingual and multidialectal language models to address these challenges. To overcome the employment of subword tokens in these models, ProMap relies on an effective padded prompting of language models with a seed dictionary that achieves good performance when used independently. We also demonstrate the effectiveness of ProMap in re-ranking results from other BLI methods such as with aligned static word embeddings. When evaluated on both rich-resource and low-resource languages, ProMap consistently achieves state-of-the-art results. Furthermore, ProMap enables strong performance in few-shot scenarios (even with less than 10 training examples), making it a valuable tool for low-resource language translation. Overall, we believe our method offers both exciting and promising direction for BLI in general and low-resource languages in particular. ProMap code and data are available at \url{https://github.com/4mekki4/promap}.

computational linguistic, translation, word translation, (14 more...)

arXiv.org Artificial Intelligence

2310.18778

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(11 more...)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.55)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.34)

Add feedback

Accessing Higher Dimensions for Unsupervised Word Translation

Wang, Sida I.

arXiv.org Artificial IntelligenceMay-23-2023

The striking ability of unsupervised word translation has been demonstrated with the help of word vectors / pretraining; however, they require large amounts of data and usually fails if the data come from different domains. We propose coocmap, a method that can use either high-dimensional co-occurrence counts or their lower-dimensional approximations. Freed from the limits of low dimensions, we show that relying on low-dimensional vectors and their incidental properties miss out on better denoising methods and useful world knowledge in high dimensions, thus stunting the potential of the data. Our results show that unsupervised translation can be achieved more easily and robustly than previously thought -- less than 80MB and minutes of CPU time is required to achieve over 50\% accuracy for English to Finnish, Hungarian, and Chinese translations when trained on similar data; even under domain mismatch, we show coocmap still works fully unsupervised on English NewsCrawl to Chinese Wikipedia and English Europarl to Spanish Wikipedia, among others. These results challenge prevailing assumptions on the necessity and superiority of low-dimensional vectors, and suggest that similarly processed co-occurrences can outperform dense vectors on other tasks too.

dimension, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2305.142

Country:

North America > United States > Oregon > Multnomah County > Portland (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > South Korea > Gangwon-do > Pyeongchang (0.04)
(10 more...)

Genre: Research Report > New Finding (0.86)

Industry: Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

TransDocs: Optical Character Recognition with word to word translation

Bamotra, Abhishek, Uppala, Phani Krishna

arXiv.org Artificial IntelligenceApr-15-2023

While OCR has been used in various applications, its output is not always accurate, leading to misfit words. This research work focuses on improving the optical character recognition (OCR) with ML techniques with integration of OCR with long short-term memory (LSTM) based sequence to sequence deep learning models to perform document translation. This work is based on ANKI dataset for English to Spanish translation. In this work, I have shown comparative study for pre-trained OCR while using deep learning model using LSTM-based seq2seq architecture with attention for machine translation. End-to-end performance of the model has been expressed in BLEU-4 score. This research paper is aimed at researchers and practitioners interested in OCR and its applications in document translation.

artificial intelligence, machine learning, translation, (18 more...)

arXiv.org Artificial Intelligence

2304.07637

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Utilizing Language-Image Pretraining for Efficient and Robust Bilingual Word Alignment

Dinh, Tuan, Sohn, Jy-yong, Rajput, Shashank, Ossowski, Timothy, Ming, Yifei, Hu, Junjie, Papailiopoulos, Dimitris, Lee, Kangwook

arXiv.org Artificial IntelligenceNov-7-2022

Word translation without parallel corpora has become feasible, rivaling the performance of supervised methods. Recent findings have shown that the accuracy and robustness of unsupervised word translation (UWT) can be improved by making use of visual observations, which are universal representations across languages. In this work, we investigate the potential of using not only visual observations but also pretrained language-image models for enabling a more efficient and robust UWT. Specifically, we develop a novel UWT method dubbed Word Alignment using Language-Image Pretraining (WALIP), which leverages visual observations via the shared embedding space of images and texts provided by CLIP models (Radford et al., 2021). WALIP has a two-step procedure. First, we retrieve word pairs with high confidences of similarity, computed using our proposed image-based fingerprints, which define the initial pivot for the word alignment. Second, we apply our robust Procrustes algorithm to estimate the linear mapping between two embedding spaces, which iteratively corrects and refines the estimated alignment. Our extensive experiments show that WALIP improves upon the state-of-the-art performance of bilingual word alignment for a few language pairs across different word embeddings and displays great robustness to the dissimilarity of language pairs or training corpora for two word embeddings.

machine learning, natural language, walip, (20 more...)

arXiv.org Artificial Intelligence

2205.11616

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
Europe > Denmark > Capital Region > Copenhagen (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Google Translate Reveals Cultural Bias

#artificialintelligenceNov-27-2021, 22:10:12 GMT

Let's be honest, all language learners have turned to Google Translate to brush up on vocabulary, verify their work, or complete a class assignment. We probably lean a little too much on the application, at least according to many language teachers, considering the inherent faults and bias can be found in the translated phrases. Countless videos and articles have been uploaded to the internet showing how a few simple English sentences were mangled after running them through the translator like the worlds most convoluted game of telephone. Yet, the convenience of Google's online translator never fails to draw us back. One source of faults between language translations arise from a globally common history of male-dominated society and is further exacerbated by the recent movement toward more inclusive language for gender nonconforming individuals.

google, google translate, translation, (12 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Add feedback