AITopics | different language

Collaborating Authors

different language

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

EMMA-X: An EM-like Multilingual Pre-training Algorithm for Cross-lingual Representation Learning

Neural Information Processing SystemsApr-25-2026, 18:37:27 GMT

Expressing universal semantics common to all languages is helpful in understanding the meanings of complex and culture-specific sentences. The research theme underlying this scenario focuses on learning universal representations across languages with the usage of massive parallel corpora. However, due to the sparsity and scarcity of parallel data, there is still a big challenge in learning authentic "universals" for any two languages. In this paper, we propose EMMA-X: an EM-like Multilingual pre-training Algorithm, to learn (X)Cross-lingual universals with the aid of excessive multilingual non-parallel data.

computational linguistic, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > Minnesota (0.28)

Industry:

Government (0.67)
Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

8eb88844dafefa92a26aaec9f3acad93-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsFeb-16-2026, 14:15:33 GMT

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Asia > South Korea (0.14)
North America > United States (0.14)
Asia > China (0.05)
(16 more...)

Genre: Collection (0.40)

Industry:

Leisure & Entertainment (0.68)
Education > Health & Safety > School Nutrition (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

8eb88844dafefa92a26aaec9f3acad93-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsFeb-16-2026, 14:15:29 GMT

Ideally,languagemodelswould reflect the cultural norms of various regions around the world and generate culturally appropriate content when responding inlocallanguages oftheregions, unless otherwise specified.

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country:

Asia > South Korea (0.14)
Asia > North Korea (0.04)
North America > Mexico (0.04)
(20 more...)

Genre: Research Report (0.68)

Industry:

Education (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)

Add feedback

201408406e0c5cf7626c4baeae6eaadd-Paper-Conference.pdf

Neural Information Processing SystemsFeb-8-2026, 18:53:10 GMT

computational linguistic, proceedings, representation, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(21 more...)

Industry:

Government (0.67)
Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Appendix

Neural Information Processing SystemsFeb-8-2026, 08:06:05 GMT

We limit the target languages for this augmentation process to Arabic, Finnish, Japanese, Korean, Russian, Spanish, Swedish, Hebrew, Thai,Danish,French,Italian,Dutch,Polish,andPortuguese. Interestingly,justaddingthislanguage code effectively changes the outputs as shown in Table 7. We further subsample 50% of the synthetically generated questions. During inference, we first retrieve top 15 passages using mDPR, and then feed the questions andconcatenated passages intothemGEN model, withlanguage tags. The gray dots concentrated in the lower right part in the first figure represent encoded Thai embeddings.

artificial intelligence, state-of-the-art model, trans, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Technology: Information Technology > Artificial Intelligence (0.69)

Add feedback

A Novel Two-Step Method for Cross Language Representation Learning

Neural Information Processing SystemsDec-27-2025, 15:28:43 GMT

Cross language text classification is an important learning task in natural language processing. A critical challenge of cross language learning lies in that words of different languages are in disjoint feature spaces. In this paper, we propose a two-step representation learning method to bridge the feature spaces of different languages by exploiting a set of parallel bilingual documents. Specifically, we first formulate a matrix completion problem to produce a complete parallel document-term matrix for all documents in two languages, and then induce a cross-lingual document representation by applying latent semantic indexing on the obtained matrix. We use a projected gradient descent algorithm to solve the formulated matrix completion problem with convergence guarantees. The proposed approach is evaluated by conducting a set of experiments with cross language sentiment classification tasks on Amazon product reviews. The experimental results demonstrate that the proposed learning approach outperforms a number of comparison cross language representation learning methods, especially when the number of parallel bilingual documents is small.

cross language representation learning, name change, novel two-step method, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.61)

Add feedback

Language Model Tokenizers Introduce Unfairness Between Languages

Neural Information Processing SystemsDec-26-2025, 02:11:49 GMT

Recent language models have shown impressive multilingual performance, even when not explicitly trained for it.Despite this, there are concerns about the quality of their outputs across different languages.In this paper, we show how disparity in the treatment of different languages arises at the tokenization stage, well before a model is even invoked.The same text translated into different languages can have drastically different tokenization lengths, with differences up to 15 times in some cases.These disparities persist even for tokenizers that are intentionally trained for multilingual support.Character-level and byte-level models also exhibit over 4 times the difference in the encoding length for some language pairs.This induces unfair treatment for some language communities in regard to the cost of accessing commercial language services, the processing time and latency, as well as the amount of content that can be provided as context to the models.Therefore, we make the case that we should train future language models using multilingually fair subword tokenizers.

electronic proceedings, language model tokenizer introduce unfairness, name change, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.68)

Add feedback

On the Consistency of Multilingual Context Utilization in Retrieval-Augmented Generation

Qi, Jirui, Fernández, Raquel, Bisazza, Arianna

arXiv.org Artificial IntelligenceDec-12-2025

Retrieval-augmented generation (RAG) with large language models (LLMs) has demonstrated strong performance in multilingual question-answering (QA) tasks by leveraging relevant passages retrieved from corpora. In multilingual RAG (mRAG), the retrieved passages can be written in languages other than that of the query entered by the user, making it challenging for LLMs to effectively utilize the provided information. Recent research suggests that retrieving passages from multilingual corpora can improve RAG performance, particularly for low-resource languages. However, the extent to which LLMs can leverage different kinds of multilingual contexts to generate accurate answers, *independently from retrieval quality*, remains understudied. In this paper, we conduct an extensive assessment of LLMs' ability to (i) make consistent use of a relevant passage regardless of its language, (ii) respond in the expected language, and (iii) focus on the relevant passage even when multiple `distracting' passages in different languages are provided in the context. Our experiments with four LLMs across three QA datasets covering a total of 48 languages reveal a surprising ability of LLMs to extract the relevant information from passages in a different language than the query, but a much weaker ability to formulate a full answer in the correct language. Our analysis, based on both accuracy and feature attribution techniques, further shows that distracting passages negatively impact answer quality regardless of their language. However, distractors in the query language exert a slightly stronger influence. Taken together, our findings deepen the understanding of how LLMs utilize context in mRAG systems, providing directions for future improvements.

computational linguistic, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2025.mrl-main.15

2504.00597

Country:

North America > United States (1.00)
Europe (1.00)
Asia (0.93)

Genre: Research Report > New Finding (1.00)

Industry:

Education > Curriculum > Subject-Specific Education (1.00)
Leisure & Entertainment > Sports > Football (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)

Add feedback

Multilingual Pretraining for Pixel Language Models

Kesen, Ilker, Lotz, Jonas F., Ziegler, Ingo, Rust, Phillip, Elliott, Desmond

arXiv.org Artificial IntelligenceDec-3-2025

Pixel language models operate directly on images of rendered text, eliminating the need for a fixed vocabulary. While these models have demonstrated strong capabilities for downstream cross-lingual transfer, multilingual pretraining remains underexplored. We introduce PIXEL-M4, a model pretrained on four visually and linguistically diverse languages: English, Hindi, Ukrainian, and Simplified Chinese. Multilingual evaluations on semantic and syntactic tasks show that PIXEL-M4 outperforms an English-only counterpart on non-Latin scripts. Word-level probing analyses confirm that PIXEL-M4 captures rich linguistic features, even in languages not seen during pretraining. Furthermore, an analysis of its hidden representations shows that multilingual pretraining yields a semantic embedding space closely aligned across the languages used for pretraining. This work demonstrates that multilingual pretraining substantially enhances the capability of pixel language models to effectively support a diverse set of languages.

artificial intelligence, natural language, pixel, (15 more...)

arXiv.org Artificial Intelligence

2505.21265

Country: