AITopics | lang

lang

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

New mpox strain identified in England

BBC News, Dec-8-2025, 14:13:27 GMT

A new strain of mpox, previously called monkeypox, has been detected in a person in England, say UK health officials. The virus is a mix of two major types of the mpox virus, and was found in someone who recently returned from travelling in Asia. Officials say they are still assessing the significance of the new strain. The UK Health Security Agency (UKHSA) says it is normal for viruses to evolve. Getting vaccinated remains the best way to protect against severe disease, although an mpox infection is mild for many.

mpox, new mpox strain, virus, (14 more...)

Country:

North America > United States (0.16)
North America > Central America (0.15)
Oceania > Australia (0.06)
(15 more...)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)

Technology: Information Technology > Artificial Intelligence (0.49)

Unsupervised Speech Dereverberation Using a Hybrid Model

Bahrman, Louis, Fontaine, Mathieu, Richard, Gaël

arXiv.org Artificial Intelligence, Oct-13-2025

This paper introduces a new training strategy to improve speech dereverberation systems in an unsupervised manner using only reverberant speech. Most existing algorithms rely on paired dry/reverberant data, which is difficult to obtain. Our approach uses limited acoustic information, like the reverberation time (RT60), to train a dereverberation system. Experimental results demonstrate that our method achieves more consistent performance across various objective metrics than the state-of-the-art.

artificial intelligence, machine learning, ration, (19 more...)

2510.09025

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

93a27b0bd99bac3e68a440b48aa421ab-Supplemental.pdf

Neural Information Processing Systems, Aug-16-2025, 01:52:23 GMT

artificial intelligence, derivative, machine learning, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.47)

FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation

Zhu, Shaolin, Dong, Tianyu, Li, Bo, Xiong, Deyi

arXiv.org Artificial Intelligence, May-21-2025

In this paper, we present FuxiMT, a novel Chinese-centric multilingual machine translation model powered by a sparsified large language model (LLM). We adopt a two-stage strategy to train FuxiMT. We first pre-train the model on a massive Chinese corpus and then conduct multilingual fine-tuning on a large parallel dataset encompassing 65 languages. FuxiMT incorporates Mixture-of-Experts (MoEs) and employs a curriculum learning strategy for robust performance across various resource levels. Experimental results demonstrate that FuxiMT significantly outperforms strong baselines, including state-of-the-art LLMs and machine translation models, particularly under low-resource scenarios. Furthermore, FuxiMT exhibits remarkable zero-shot translation capabilities for unseen language pairs, indicating its potential to bridge communication gaps where parallel data are scarce or unavailable.
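
The abstract does not say how FuxiMT's Mixture-of-Experts layer routes tokens, so the following is only a generic sketch of sparse top-k gating, a common way to build a sparsified MoE layer. The function names `top_k_gate` and `moe_forward`, the choice of `k`, and the toy experts are all assumptions made for illustration, not details from the paper.

```python
import math


def top_k_gate(logits, k):
    """Keep the k largest gate logits and softmax over them only."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, experts, gate_logits, k=2):
    """Mix the outputs of the k selected experts; the others are never run."""
    weights = top_k_gate(gate_logits, k)
    mixed = None
    for idx, weight in weights.items():
        y = experts[idx](x)
        if mixed is None:
            mixed = [weight * v for v in y]
        else:
            mixed = [m + weight * v for m, v in zip(mixed, y)]
    return mixed


# Hypothetical toy experts: each one just scales its input.
experts = [
    lambda x: [1.0 * v for v in x],
    lambda x: [2.0 * v for v in x],
    lambda x: [100.0 * v for v in x],
]
output = moe_forward([2.0], experts, gate_logits=[0.0, 0.0, -10.0], k=2)
```

Only the selected experts are evaluated, which is where the compute saving of a sparse layer comes from in this sketch.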

artificial intelligence, large language model, natural language, (16 more...)

2505.14256

Country:

Europe (1.00)
Asia > China (0.67)
North America > United States > California (0.28)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts

Li, Zhongyang, Li, Ziyue, Zhou, Tianyi

arXiv.org Artificial Intelligence, Feb-28-2025

In large multimodal models (LMMs), the perception of non-language modalities (e.g., visual representations) is usually not on par with the powerful reasoning capabilities of large language models (LLMs), which hinders LMMs' performance on challenging downstream tasks. This weakness has recently been mitigated by replacing the vision encoder with a mixture-of-experts (MoE), which provides the rich, multi-granularity, and diverse representations required by diverse downstream tasks. The performance of a multimodal MoE largely depends on its router, which reweights and mixes the representations of different experts for each input. However, we find that the end-to-end trained router does not always produce the optimal routing weights for every test sample. To bridge the gap, we propose a novel and efficient method, "Re-Routing in Test-Time (R2-T2)", that locally optimizes the vector of routing weights at test time by moving it toward the vectors of the correctly predicted samples in a neighborhood of the test sample. We propose three R2-T2 strategies with different optimization objectives and neighbor-search spaces. R2-T2 consistently and greatly improves state-of-the-art LMMs' performance on challenging benchmarks of diverse tasks, without training any base-model parameters.
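
The idea of moving a test sample's routing weights toward those of nearby, correctly predicted samples can be sketched roughly as follows. This is one simple reading (a k-nearest-neighbor average followed by an interpolation step) and is not claimed to match any of the paper's three strategies. The function name `reroute`, the Euclidean neighbor search, and the parameters `k` and `step` are assumptions for illustration.

```python
import math


def reroute(test_embedding, test_weights, reference, k=3, step=0.5):
    """Nudge routing weights toward those of nearby correct reference samples.

    reference: list of (embedding, routing_weights, was_correct) tuples.
    """
    # Only reference samples the model predicted correctly are used.
    correct = [(emb, w) for emb, w, ok in reference if ok]
    nearest = sorted(correct, key=lambda ew: math.dist(test_embedding, ew[0]))[:k]
    if not nearest:
        return list(test_weights)  # nothing to move toward

    # Average the neighbors' routing weights, expert by expert.
    n_experts = len(test_weights)
    target = [sum(w[i] for _, w in nearest) / len(nearest) for i in range(n_experts)]

    # Interpolate toward the neighborhood average, then renormalize.
    moved = [(1 - step) * tw + step * t for tw, t in zip(test_weights, target)]
    total = sum(moved)
    return [m / total for m in moved]


# Hypothetical reference set: two correct neighbors and one incorrect sample.
reference = [
    ([0.0, 0.0], [0.9, 0.1], True),
    ([0.1, 0.0], [0.9, 0.1], True),
    ([0.0, 0.1], [0.1, 0.9], False),
]
new_weights = reroute([0.0, 0.05], [0.5, 0.5], reference, k=2, step=0.5)
```

No model parameter is updated here; only the per-sample routing vector changes, which mirrors the abstract's point that the base model is left untrained.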

arxiv preprint arxiv, benchmark, re-routing, (14 more...)

2502.20395

Country:

North America > United States > California (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Maryland > Prince George's County > College Park (0.04)
(2 more...)

Genre: Research Report > New Finding (0.93)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)
(2 more...)

A Hybrid Model for Weakly-Supervised Speech Dereverberation

Bahrman, Louis, Fontaine, Mathieu, Richard, Gael

arXiv.org Artificial Intelligence, Feb-6-2025

This paper introduces a new training strategy to improve speech dereverberation systems using minimal acoustic information and reverberant (wet) speech. Most existing algorithms rely on paired dry/wet data, which is difficult to obtain, or on target metrics that may not adequately capture reverberation characteristics and can lead to poor results on non-target metrics. Our approach uses limited acoustic information, like the reverberation time (RT60), to train a dereverberation system. The system's output is resynthesized using a generated room impulse response and compared with the original reverberant speech, providing a novel reverberation matching loss replacing the standard target metrics. During inference, only the trained dereverberation model is used. Experimental results demonstrate that our method achieves more consistent performance across various objective metrics used in speech dereverberation than the state-of-the-art.
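
A reverberation matching loss of the kind described can be sketched as: generate a room impulse response (RIR) from the RT60, convolve the dereverberated estimate with it, and compare the result with the original wet signal. The abstract does not say how the RIR is generated, so the exponentially decaying noise model below is an assumption, as are the function names, the mean-squared-error comparison, and the toy signal values.

```python
import math
import random


def decay_envelope(t, rt60):
    """Amplitude envelope that has dropped by 60 dB at t = rt60 seconds."""
    return 10.0 ** (-3.0 * t / rt60)


def synth_rir(rt60, sample_rate, length, seed=0):
    """Assumed RIR model: Gaussian noise shaped by an exponential decay."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) * decay_envelope(n / sample_rate, rt60)
            for n in range(length)]


def convolve(signal, kernel):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(kernel):
            out[i + j] += s * h
    return out


def reverberation_matching_loss(dry_estimate, wet, rir):
    """Mean squared error between the resynthesized and the observed wet speech."""
    resynth = convolve(dry_estimate, rir)
    n = min(len(resynth), len(wet))
    return sum((resynth[i] - wet[i]) ** 2 for i in range(n)) / n


# Toy check: a perfect dry estimate reproduces the wet signal exactly.
rir = synth_rir(rt60=0.3, sample_rate=8000, length=64)
dry = [1.0, 0.0, -0.5, 0.25]
wet = convolve(dry, rir)
perfect_loss = reverberation_matching_loss(dry, wet, rir)
silent_loss = reverberation_matching_loss([0.0] * 4, wet, rir)
```

The loss needs only the wet recording and an RT60 value, never the paired dry signal, which is the property the abstract emphasizes.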

artificial intelligence, machine learning, supervision, (14 more...)

2502.06839

Country:

Asia (0.14)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Speech (0.71)

MIT-10M: A Large Scale Parallel Corpus of Multilingual Image Translation

Li, Bo, Zhu, Shaolin, Wen, Lijie

arXiv.org Artificial Intelligence, Dec-16-2024

Image Translation (IT) holds immense potential across diverse domains, enabling the translation of textual content within images into various languages. However, existing datasets often suffer from limitations in scale, diversity, and quality, hindering the development and evaluation of IT models. To address this issue, we introduce MIT-10M, a large-scale parallel corpus of multilingual image translation with over 10M image-text pairs derived from real-world data, which has undergone extensive data cleaning and multilingual translation validation. It contains 840K images in three sizes, 28 categories, tasks with three levels of difficulty, and image-text pairs in 14 languages, which is a considerable improvement on existing datasets. We conduct extensive experiments to evaluate and train models on MIT-10M. The experimental results clearly indicate that our dataset has higher adaptability when it comes to evaluating the performance of models on challenging and complex real-world image translation tasks. Moreover, the performance of the model fine-tuned with MIT-10M has tripled compared to the baseline model, further confirming its superiority.

dataset, mit-10m, translation, (15 more...)

2412.07147

Country:

Asia > Thailand > Bangkok > Bangkok (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)
Asia > China > Beijing > Beijing (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry: Information Technology (0.67)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
(5 more...)

Crowdsourcing Lexical Diversity

Khalilia, Hadi, Otterbacher, Jahna, Bella, Gabor, Noortyani, Rusma, Darma, Shandy, Giunchiglia, Fausto

arXiv.org Artificial Intelligence, Oct-30-2024

Lexical-semantic resources (LSRs), such as online lexicons or wordnets, are fundamental for natural language processing applications. In many languages, however, such resources suffer from quality issues: incorrect entries, incompleteness, and the rarely addressed issue of bias towards the English language and Anglo-Saxon culture. Such bias manifests itself in the absence of concepts specific to the language or culture at hand, the presence of foreign (Anglo-Saxon) concepts, as well as in the lack of an explicit indication of untranslatability, also known as cross-lingual lexical gaps, when a term has no equivalent in another language. This paper proposes a novel crowdsourcing methodology for reducing bias in LSRs. Crowd workers compare lexemes from two languages, focusing on domains rich in lexical diversity, such as kinship or food. Our LingoGap crowdsourcing tool facilitates comparisons through microtasks identifying equivalent terms, language-specific terms, and lexical gaps across languages. We validated our method by applying it to two case studies focused on food-related terminology: (1) English and Arabic, and (2) Standard Indonesian and Banjarese. These experiments identified 2,140 lexical gaps in the first case study and 951 in the second. The success of these experiments confirmed the usability of our method and tool for future large-scale lexicon enrichment tasks.

asian low-resour, experiment, lexical gap, (14 more...)

2410.23133

Country:

Europe > United Kingdom > UK North Sea (0.05)
Atlantic Ocean > North Atlantic Ocean > North Sea > UK North Sea (0.05)
Europe > Italy > Trentino-Alto Adige/Südtirol > Trentino Province > Trento (0.04)
(30 more...)

Genre: Research Report > New Finding (0.66)

Industry: Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
(2 more...)

Multilingual Hallucination Gaps in Large Language Models

Chataigner, Cléa, Taïk, Afaf, Farnadi, Golnoosh

arXiv.org Artificial Intelligence, Oct-23-2024

Large language models (LLMs) are increasingly used as alternatives to traditional search engines given their capacity to generate text that resembles human language. However, this shift is concerning, as LLMs often generate hallucinations, misleading or false information that appears highly credible. In this study, we explore the phenomenon of hallucinations across multiple languages in freeform text generation, focusing on what we call multilingual hallucination gaps. These gaps reflect differences in the frequency of hallucinated answers depending on the prompt and language used. To quantify such hallucinations, we used the FactScore metric and extended its framework to a multilingual setting. We conducted experiments using LLMs from the LLaMA, Qwen, and Aya families, generating biographies in 19 languages and comparing the results to Wikipedia pages. Our results reveal variations in hallucination rates, especially between high and low resource languages, raising important questions about LLM multilingual performance and the challenges in evaluating hallucinations in multilingual freeform text generation.
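
A hallucination gap between languages can be illustrated with a FactScore-style ratio, that is, the fraction of atomic facts in a generation that the reference source supports. The sketch below assumes the per-fact support judgements already exist; the function names, the language codes, and the toy judgements are hypothetical and are not data from the study.

```python
def fact_score(supported):
    """Fraction of atomic facts judged as supported by the reference source."""
    if not supported:
        raise ValueError("need at least one atomic fact")
    return sum(1 for s in supported if s) / len(supported)


def hallucination_gaps(judgements_by_language, reference_language):
    """Score difference of each language relative to a reference language."""
    scores = {lang: fact_score(flags)
              for lang, flags in judgements_by_language.items()}
    ref = scores[reference_language]
    return {lang: ref - score for lang, score in scores.items()}


# Hypothetical per-fact judgements for one biography in two languages.
judgements = {
    "en": [True, True, True, False],
    "sw": [True, False],
}
gaps = hallucination_gaps(judgements, reference_language="en")
```

A positive gap means the language has a lower share of supported facts than the reference language in this toy setup.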

artificial intelligence, large language model, natural language, (17 more...)

2410.18270

Country:

North America > United States (0.14)
North America > Canada > Quebec > Montreal (0.14)
Asia > Singapore (0.04)
(18 more...)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Mitigating Semantic Leakage in Cross-lingual Embeddings via Orthogonality Constraint

Ki, Dayeon, Park, Cheonbok, Kim, Hyunjoong

arXiv.org Artificial Intelligence, Sep-23-2024

Accurately aligning contextual representations in cross-lingual sentence embeddings is key for effective parallel data mining. A common strategy for achieving this alignment involves disentangling semantics and language in sentence embeddings derived from multilingual pre-trained models. However, we discover that current disentangled representation learning methods suffer from semantic leakage, a term we introduce to describe when a substantial amount of language-specific information is unintentionally leaked into semantic representations. This hinders the effective disentanglement of semantic and language representations, making it difficult to retrieve embeddings that distinctively represent the meaning of the sentence. To address this challenge, we propose a novel training objective, ORthogonAlity Constraint LEarning (ORACLE), tailored to enforce orthogonality between semantic and language embeddings. ORACLE builds upon two components: intra-class clustering and inter-class separation. Through experiments on cross-lingual retrieval and semantic textual similarity tasks, we demonstrate that training with the ORACLE objective effectively reduces semantic leakage and enhances semantic alignment within the embedding space.
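
To make the notion of enforcing orthogonality between semantic and language embeddings concrete, here is a generic penalty based on squared cosine similarity. It is not the ORACLE objective itself, whose intra-class clustering and inter-class separation terms are not specified in the abstract. The function names and the toy vectors are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def orthogonality_penalty(semantic, language):
    """Mean squared cosine between paired semantic and language embeddings.

    The value is 0 when every pair is orthogonal and 1 when every pair is
    parallel, so minimizing it pushes the two embedding types apart.
    """
    pairs = list(zip(semantic, language))
    return sum(cosine(s, l) ** 2 for s, l in pairs) / len(pairs)


# Toy pairs: one perfectly orthogonal, one perfectly aligned (fully "leaked").
semantic = [[1.0, 0.0], [0.0, 2.0]]
language = [[0.0, 3.0], [0.0, 1.0]]
penalty = orthogonality_penalty(semantic, language)
```

In this toy case half of the pairs are fully aligned, so the penalty is 0.5; a representation with no leakage in this simplified sense would score 0.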

lang, oracle, representation, (14 more...)

2409.15664

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
(8 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
