AITopics

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsMar-21-2026, 17:06:22 GMT

Conjugated Semantic Pool Improves OOD Detection with Pre-trained Vision-Language Models

A straightforward pipeline for zero-shot out-of-distribution (OOD) detection involves selecting potential OOD labels from an extensive semantic pool and then leveraging a pre-trained vision-language model to perform classification on both in-distribution (ID) and OOD labels. In this paper, we theorize that enhancing performance requires expanding the semantic pool, while increasing the expected probability of selected OOD labels being activated by OOD samples, and ensuring low mutual dependence among the activations of these OOD labels. A natural expansion manner is to adopt a larger lexicon; however, the inevitable introduction of numerous synonyms and uncommon words fails to meet the above requirements, indicating that viable expansion manners move beyond merely selecting words from a lexicon. Since OOD detection aims to correctly classify input images into ID/OOD class groups, we can make up OOD label candidates which are not standard class names but beneficial for the process. Observing that the original semantic pool is comprised of unmodified specific class names, we correspondingly construct a conjugated semantic pool (CSP) consisting of modified superclass names, each serving as a cluster center for samples sharing similar properties across different categories. Consistent with our established theory, expanding OOD label candidates with the CSP satisfies the requirements and outperforms existing works by 7.89% in FPR95.

artificial intelligence, natural language, proceedings, (7 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.62)

Neural Information Processing SystemsFeb-9-2026, 15:25:44 GMT

2548fbe155ed2405488b7d5373013a64-Paper-Conference.pdf

artificial intelligence, machine learning, natural language, (19 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
South America > Uruguay > Maldonado > Maldonado (0.04)
North America > Canada > Quebec > Montreal (0.04)
(5 more...)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Neural Information Processing SystemsFeb-8-2026, 13:05:59 GMT

4b26dc4663ccf960c8538d595d0a1d3a-Supplemental.pdf

emote, experiment, explanation, (17 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsDec-24-2025, 04:51:28 GMT

AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation

Open-vocabulary semantic segmentation is a challenging task that requires segmenting novel object categories at inference time. Recent works explore vision-language pre-training to handle this task, but suffer from unrealistic assumptions in practical scenarios, i.e., low-quality textual category names.For example, this paradigm assumes that new textual categories will be accurately and completely provided, and exist in lexicons during pre-training.However, exceptions often happen when meet with ambiguity for brief or incomplete names, new words that are not present in the pre-trained lexicons, and difficult-to-describe categories for users.To address these issues, this work proposes a novel framework, AttrSeg

attribute decomposition-aggregation, category, open-vocabulary semantic segmentation, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.61)

arXiv.org Artificial IntelligenceDec-3-2025

TriLex: A Framework for Multilingual Sentiment Analysis in Low-Resource South African Languages

Nkongolo, Mike, Vorster, Hilton, Warren, Josh, Naick, Trevor, Vanmali, Deandre, Mashapha, Masana, Brand, Luke, Fernandes, Alyssa, Calitz, Janco, Makhoba, Sibusiso

Low-resource African languages remain underrepresented in sentiment analysis research, resulting in limited lexical resources and reduced model performance in multilingual applications. This gap restricts equitable access to Natural Language Processing (NLP) technologies and hinders downstream tasks such as public-health monitoring, digital governance, and financial inclusion. To address this challenge, this paper introduces TriLex, a three-stage retrieval-augmented framework that integrates corpus-based extraction, cross-lingual mapping, and Retrieval-Augmented Generation (RAG) driven lexicon refinement for scalable sentiment lexicon expansion in low-resource languages. Using an expanded lexicon, we evaluate two leading African language models (AfroXLMR and AfriBERTa) across multiple case studies. Results show that AfroXLMR consistently achieves the strongest performance, with F1-scores exceeding 80% for isiXhosa and isiZulu, aligning with previously reported ranges (71-75%), and demonstrating high multilingual stability with narrow confidence intervals. AfriBERTa, despite lacking pre-training on the target languages, attains moderate but reliable F1-scores around 64%, confirming its effectiveness under constrained computational settings. Comparative analysis shows that both models outperform traditional machine learning baselines, while ensemble evaluation combining AfroXLMR variants indicates complementary improvements in precision and overall stability. These findings confirm that the TriLex framework, together with AfroXLMR and AfriBERTa, provides a robust and scalable approach for sentiment lexicon development and multilingual sentiment analysis in low-resource South African languages.

large language model, machine learning, natural language, (19 more...)

2512.02799

Country: Africa (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Public Health (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

arXiv.org Artificial IntelligenceDec-3-2025

NeLLCom-Lex: A Neural-agent Framework to Study the Interplay between Lexical Systems and Language Use

Zhang, Yuqing, Ürker, Ecesu, Verhoef, Tessa, Boleda, Gemma, Bisazza, Arianna

Lexical semantic change has primarily been investigated with observational and experimental methods; however, observational methods (corpus analysis, distributional semantic modeling) cannot get at causal mechanisms, and experimental paradigms with humans are hard to apply to semantic change due to the extended diachronic processes involved. This work introduces NeLLCom-Lex, a neural-agent framework designed to simulate semantic change by first grounding agents in a real lexical system (e.g. English) and then systematically manipulating their communicative needs. Using a well-established color naming task, we simulate the evolution of a lexical system within a single generation, and study which factors lead agents to: (i) develop human-like naming behavior and lexicons, and (ii) change their behavior and lexicons according to their communicative needs. Our experiments with different supervised and reinforcement learning pipelines show that neural agents trained to 'speak' an existing language can reproduce human-like patterns in color naming to a remarkable extent, supporting the further use of NeLLCom-Lex to elucidate the mechanisms of semantic change.

artificial intelligence, machine learning, natural language, (18 more...)

doi: 10.18653/v1/2025.findings-emnlp.580

2509.22479

Country:

North America > United States (0.93)
Europe (0.68)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Chang, Emily, Bafna, Niyati

ChiKhaPo: A Large-Scale Multilingual Benchmark for Evaluating Lexical Comprehension and Generation in Large Language Models

arXiv.org Artificial IntelligenceDec-1-2025

Existing benchmarks for large language models (LLMs) are largely restricted to high- or mid-resource languages, and often evaluate performance on higher-order tasks in reasoning and generation. However, plenty of evidence points to the fact that LLMs lack basic linguistic competence in the vast majority of the world's 3800+ written languages. We introduce ChiKhaPo, consisting of 8 subtasks of varying difficulty designed to evaluate the lexical comprehension and generation abilities of generative models. ChiKhaPo draws on existing lexicons, monolingual data, and bitext, and provides coverage for 2700+ languages for 2 subtasks, surpassing any existing benchmark in terms of language coverage. We further show that 6 SOTA models struggle on our benchmark, and discuss the factors contributing to performance scores, including language family, language resourcedness, task, and comprehension versus generation directions. With ChiKhaPo, we hope to enable and encourage the massively multilingual benchmarking of LLMs.

computational linguistic, large language model, machine learning, (18 more...)

2510.16928

Country:

North America > United States (1.00)
Europe (1.00)
Asia (0.67)

Genre: Research Report (0.63)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

arXiv.org Machine LearningNov-27-2025

Zipf Distributions from Two-Stage Symbolic Processes: Stability Under Stochastic Lexical Filtering

Berman, Vladimir

Zipf's law in language lacks a definitive origin, debated across fields. This study explains Zipf-like behavior using geometric mechanisms without linguistic elements. The Full Combinatorial Word Model (FCWM) forms words from a finite alphabet, generating a geometric distribution of word lengths. Interacting exponential forces yield a power-law rank-frequency curve, determined by alphabet size and blank symbol probability. Simulations support predictions, matching English, Russian, and mixed-genre data. The symbolic model suggests Zipf-type laws arise from geometric constraints, not communicative efficiency.

corpora, power-law tail, stochastic lexical filter, (12 more...)

arXiv.org Machine Learning

2511.2106

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

arXiv.org Artificial IntelligenceNov-26-2025

Breaking Bad: Norms for Valence, Arousal, and Dominance for over 10k English Multiword Expressions

Mohammad, Saif M.

Factor analysis studies have shown that the primary dimensions of word meaning are Valence (V), Arousal (A), and Dominance (D). Existing lexicons such as the NRC VAD Lexicon, published in 2018, include VAD association ratings for words. Here, we present a complement to it, which has human ratings of valence, arousal, and dominance for 10k English Multiword Expressions (MWEs) and their constituent words. We also increase the coverage of unigrams, especially words that have become more common since 2018. In all, the new NRC VAD Lexicon v2 now has entries for 10k MWEs and 25k words, in addition to the entries in v1. We show that the associations are highly reliable. We use the lexicon to examine emotional characteristics of MWEs, including: 1. The degree to which MWEs (idioms, noun compounds, and verb particle constructions) exhibit strong emotionality; 2. The degree of emotional compositionality in MWEs. The lexicon enables a wide variety of research in NLP, Psychology, Public Health, Digital Humanities, and Social Sciences. The NRC VAD Lexicon v2 is freely available through the project webpage: http://saifmohammad.com/WebPages/nrc-vad.html

artificial intelligence, lexicon, natural language, (20 more...)

2511.19816

Country:

North America > United States (1.00)
Europe (0.93)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.68)