AITopics

Country:

Asia > China > Tianjin Province > Tianjin (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Maryland > Baltimore (0.04)

Genre: Research Report (0.46)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.46)

Neural Information Processing Systems, Aug-20-2025, 01:19:25 GMT

Interesting to see how well the proposed model would do under such zero-shot setup (i.e., ... model on any particular supervised task). We compared with GPT-2 (345M) on the Winograd Schema Challenge ... GPT-2 accuracy is taken from their paper. The BERT paper reports that BooksCorpus and Wikipedia contain 0.8B and 2.5B words, respectively. For our processed data, BooksCorpus and Wikipedia contain 0.75B and 2B words, respectively. The implementation is the same as word embedding, i.e., a lookup ... "Segment 1", and "Segment 2") and feed it to model input, which indicates the segment of input tokens.
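The last excerpt describes segment embeddings implemented the same way as word embeddings: a lookup table whose rows mark "Segment 1" and "Segment 2", fed to the model input. A minimal pure-Python sketch follows, assuming BERT-style addition of the segment vector to the token vector; the table sizes, the random initialization, and the names `make_table` and `embed` are hypothetical, not taken from the paper.

```python
import random
from typing import List

Table = List[List[float]]

def make_table(num_rows: int, dim: int, seed: int) -> Table:
    # Hypothetical small random initialization for an embedding lookup table.
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_rows)]

def embed(token_ids: List[int], segment_ids: List[int],
          token_table: Table, segment_table: Table) -> List[List[float]]:
    """Look up each token and its segment, then add the two vectors (assumed BERT-style)."""
    if len(token_ids) != len(segment_ids):
        raise ValueError("token_ids and segment_ids must have the same length")
    out = []
    for tok, seg in zip(token_ids, segment_ids):
        tok_vec = token_table[tok]    # word-embedding lookup
        seg_vec = segment_table[seg]  # segment lookup: same mechanism, only 2 rows
        out.append([t + s for t, s in zip(tok_vec, seg_vec)])
    return out

DIM = 4
token_table = make_table(num_rows=100, dim=DIM, seed=0)
segment_table = make_table(num_rows=2, dim=DIM, seed=1)  # row 0: "Segment 1", row 1: "Segment 2"

token_ids = [5, 17, 42, 8, 99]
segment_ids = [0, 0, 0, 1, 1]  # first three tokens in segment 1, last two in segment 2
x = embed(token_ids, segment_ids, token_table, segment_table)
print(len(x), len(x[0]))  # 5 4
```

Because the segment table has only two rows, it adds just 2 × DIM parameters, yet the same token id receives a different input vector depending on which segment it sits in.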

large language model, natural language, particular supervised task

Genre: Research Report (0.51)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.90)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.61)

Neural Information Processing Systems, Aug-17-2025, 20:43:14 GMT

A. Additional prompt data details

Destination will be a red barn on the right 1.

Use case: rewrite. Example: Rewrite the following text to be more light-hearted: -- {very formal text} --
Use case: chat. Example: The following is a conversation with an AI assistant.

completion, large language model, machine learning

Country:

Europe > Greece (0.04)
Asia > Southeast Asia (0.04)
Oceania > New Zealand (0.04)
North America > United States > District of Columbia > Washington (0.04)

Genre:

Questionnaire & Opinion Survey (0.93)
Personal > Obituary (0.45)

Industry:

Law (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Banking & Finance > Economy (1.00)

Technology:

Information Technology > Communications (1.00)
Information Technology > Human Computer Interaction (0.92)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)

Neural Information Processing Systems, Aug-17-2025, 12:33:15 GMT

Ron Yosef, Yuval Elovici

In this work, we introduce WinoGAViL: an online game of vision-and-language associations (e.g., between werewolves and a full moon), used as a dynamic evaluation benchmark.

computational linguistic, machine learning, natural language

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.88)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing Systems, Aug-16-2025, 05:37:45 GMT

KG: Learning From Knowledge Graph Explanations for Commonsense Reasoning

Aaron Chan

First, we propose to create coarse (Is the KG useful?) and fine (Which

explanation, machine learning, natural language

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California (0.14)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.46)

Industry:

Government > Military (0.67)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.51)

arXiv.org Artificial Intelligence, Aug-6-2025

Coherent Multimodal Reasoning with Iterative Self-Evaluation for Vision-Language Models

Luo, Wenjie, Li, Ruocheng, Zhu, Shanshan, Perry, Julian

Despite significant advancements, current large language models (LLMs) and large vision-language models (LVLMs) continue to struggle with complex, multi-step, cross-modal common sense reasoning tasks, often exhibiting a lack of "deliberative thinking." They tend to rely on superficial associations rather than deep, chained inference, particularly when integrating visual information with abstract concepts. To address this, we propose the Coherent Multimodal Reasoning Framework (CMRF), a novel approach that enhances LVLMs' common sense reasoning capabilities through an iterative, self-evaluating inference mechanism. CMRF mimics human problem-solving by decomposing complex queries, generating step-by-step inferences, and self-correcting errors. Coupled with an Adaptive Iterative Refinement strategy, CMRF systematically refines its reasoning paths. Built upon LLaVA-1.6-34B and trained on a novel Multimodal Daily Activity Reasoning (MDAR) dataset, CMRF achieves state-of-the-art performance among open-source LVLMs on challenging benchmarks such as VCR, A-OKVQA, and DailyLife-MRC. Extensive ablation studies and human evaluations confirm the critical contributions of each module and the effectiveness of iterative refinement in fostering more coherent and accurate reasoning.

The remarkable advancements in large language models (LLMs) [1], [2] and large vision-language models (LVLMs) have revolutionized various aspects of artificial intelligence, demonstrating unprecedented capabilities in understanding, generating, and processing information across modalities [3]. These models excel in tasks ranging from complex question answering to creative content generation, largely due to their extensive pre-training on vast amounts of data.
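The decompose, infer, self-evaluate, and refine cycle described in the abstract can be illustrated with a generic loop. This is a sketch under stated assumptions, not CMRF's implementation: `decompose`, `infer`, `evaluate`, and `revise` are hypothetical stand-ins for LVLM calls (toy string functions here), and the stopping rule (stop at a score threshold, or when a revision gains less than `min_gain`) is only one possible reading of "adaptive" iteration.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class ReasoningTrace:
    steps: List[str]
    score: float
    history: List[float] = field(default_factory=list)

def iterative_refine(
    query: str,
    decompose: Callable[[str], List[str]],
    infer: Callable[[str], str],
    evaluate: Callable[[str, List[str]], float],
    revise: Callable[[str, List[str]], List[str]],
    max_iters: int = 5,
    threshold: float = 0.9,
    min_gain: float = 1e-3,
) -> ReasoningTrace:
    # Decompose the query and draft one inference per sub-question.
    steps = [infer(sub) for sub in decompose(query)]
    score = evaluate(query, steps)
    history = [score]
    for _ in range(max_iters):
        if score >= threshold:
            break  # self-evaluation says the chain is good enough
        candidate = revise(query, steps)
        new_score = evaluate(query, candidate)
        if new_score - score < min_gain:
            break  # revision no longer helps; keep the current chain
        steps, score = candidate, new_score
        history.append(score)
    return ReasoningTrace(steps, score, history)

# Hypothetical toy stand-ins: a step counts as correct once it is "checked".
def toy_decompose(query: str) -> List[str]:
    return [part.strip() for part in query.split(" and ")]

def toy_infer(sub: str) -> str:
    return f"draft: {sub}"

def toy_evaluate(query: str, steps: List[str]) -> float:
    return sum(s.startswith("checked: ") for s in steps) / len(steps)

def toy_revise(query: str, steps: List[str]) -> List[str]:
    revised = list(steps)
    for i, s in enumerate(revised):
        if s.startswith("draft: "):
            revised[i] = "checked: " + s[len("draft: "):]
            break  # correct one step per iteration
    return revised

trace = iterative_refine(
    "locate the cup and check whether it is full and decide whether to refill",
    toy_decompose, toy_infer, toy_evaluate, toy_revise,
)
print(trace.score, len(trace.history))  # 1.0 4
```

With three sub-questions, each iteration corrects one step, so the score climbs from 0 to 1/3, 2/3, and 1.0 before the threshold check ends the loop.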

cmrf, large language model, natural language

2508.02886

Country:

North America > United States (0.93)
Asia > Middle East > UAE (0.28)

Genre:

Research Report (1.00)
Overview > Innovation (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

arXiv.org Artificial Intelligence, Aug-4-2025

AutoMixer: Checkpoint Artifacts as Automatic Data Mixers

Chang, Ernie, Li, Yang, Huber, Patrick, Vogeti, Vish, Kant, David, Shi, Yangyang, Chandra, Vikas

In language model training, it is desirable to equip models with capabilities from various tasks. However, it is not clear how to directly obtain the right data mixtures for these capabilities, as the relationship between data and tasks is difficult to model. In this work, we observe that checkpoint models exhibit emerging capabilities at different points in the training trajectory. Training processes often save checkpoints as artifacts, but these artifacts are under-utilized as a source of in-training data signals. We identify these artifact models based on their respective capabilities on the benchmarks and leverage them as data mixers by using their aggregated first-order influence approximation over source data. We demonstrate on eight reasoning benchmarks that the proposed framework yields significant improvements in the pretraining setting, with performance improvements of up to 1.93%. Overall, this shows the potential of checkpoint models to enhance data quality and optimize data mixtures.
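The abstract's key mechanism is an aggregated first-order influence approximation over source data. A minimal sketch of one such approximation follows, assuming a TracIn-style estimate (the dot product between a source example's loss gradient and a benchmark loss gradient at each checkpoint, summed over checkpoints) and a softmax that turns per-source scores into mixture weights. Both choices, the toy gradient vectors, the source names, and the helper names are hypothetical; the paper's exact estimator and normalization are not given here.

```python
import math
from typing import Dict, List

Vector = List[float]

def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))

def aggregate_influence(checkpoints: List[dict], lr: float = 1.0) -> Dict[str, float]:
    """Sum, over checkpoints, each source's mean first-order influence on a benchmark.

    Each checkpoint dict holds a benchmark gradient ("bench_grad") and, per data
    source, a list of per-example gradients ("source_grads").
    """
    totals: Dict[str, float] = {}
    for ckpt in checkpoints:
        g_bench = ckpt["bench_grad"]
        for source, grads in ckpt["source_grads"].items():
            # TracIn-style term: lr * <grad(source example), grad(benchmark)>.
            mean_inf = sum(lr * dot(g, g_bench) for g in grads) / len(grads)
            totals[source] = totals.get(source, 0.0) + mean_inf
    return totals

def mixture_weights(scores: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    # Softmax over aggregated scores (subtract the max for numerical stability).
    top = max(scores.values())
    exps = {k: math.exp((v - top) / temperature) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}

# Two hypothetical checkpoints with 2-D toy gradients.
checkpoints = [
    {"bench_grad": [1.0, 0.0],
     "source_grads": {"code": [[0.5, 0.5]], "web": [[-0.2, 1.0]],
                      "math": [[1.0, 0.0], [0.8, 0.2]]}},
    {"bench_grad": [0.0, 1.0],
     "source_grads": {"code": [[0.5, 0.5]], "web": [[-0.2, 1.0]],
                      "math": [[1.0, 0.5], [0.8, 0.5]]}},
]
scores = aggregate_influence(checkpoints)
weights = mixture_weights(scores)
print({k: round(v, 3) for k, v in weights.items()})
```

In this toy setup the "math" source gets the largest share because its gradients align with the benchmark gradient at both checkpoints, while "web" helps at one checkpoint and hurts at the other.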

large language model, machine learning, natural language

2506.2191

Genre: Research Report (0.82)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)

arXiv.org Artificial Intelligence, Aug-4-2025

Towards Higher Effective Rank in Parameter-efficient Fine-tuning using Khatri-Rao Product

Albert, Paul, Zhang, Frederic Z., Saratchandran, Hemanth, Hengel, Anton van den, Abbasnejad, Ehsan

Parameter-efficient fine-tuning (PEFT) has become a standard approach for adapting large pre-trained models. Amongst PEFT methods, low-rank adaptation (LoRA) has achieved notable success. However, recent studies have highlighted its limitations compared with full-rank alternatives, particularly when applied to multimodal and large language models. In this work, we present a quantitative comparison of full-rank and low-rank PEFT methods using a synthetic matrix approximation benchmark with controlled spectral properties. Our results confirm that LoRA struggles to approximate matrices with relatively flat spectra or high-frequency components, both signs of high effective rank. To this end, we introduce KRAdapter, a novel PEFT algorithm that leverages the Khatri-Rao product to produce weight updates; by construction, this product tends to have high effective rank. We demonstrate performance gains with KRAdapter on vision-language models of up to 1B parameters and on large language models of up to 8B parameters, particularly on unseen common-sense reasoning tasks. In addition, KRAdapter maintains the memory and compute efficiency of LoRA, making it a practical and robust alternative for fine-tuning billion-parameter models.
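Two ingredients of the abstract can be made concrete: the Khatri-Rao product (the column-wise Kronecker product of two matrices with the same number of columns) and effective rank. The sketch below assumes the entropy-based effective rank of Roy and Vetterli, the exponential of the Shannon entropy of the normalized singular values; whether KRAdapter measures effective rank this way, and how it arranges the Khatri-Rao factors into a weight update, are not stated here. Singular values are passed in directly to keep the example dependency-free, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]

def khatri_rao(A: Matrix, B: Matrix) -> Matrix:
    """Column-wise Khatri-Rao product of A (m x r) and B (n x r), giving (m*n) x r.

    Column j of the result is the Kronecker product of column j of A and column j of B.
    """
    r = len(A[0])
    if any(len(row) != r for row in A) or any(len(row) != r for row in B):
        raise ValueError("A and B must have the same number of columns")
    # Row i*n + k holds A[i][j] * B[k][j], matching the kron(a_j, b_j) ordering.
    return [[A[i][j] * B[k][j] for j in range(r)]
            for i in range(len(A)) for k in range(len(B))]

def effective_rank(singular_values: List[float]) -> float:
    """Entropy-based effective rank: exp(-sum p_i log p_i), with p_i = s_i / sum(s)."""
    total = sum(singular_values)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for s in singular_values:
        p = s / total
        if p > 0:  # treat 0 * log 0 as 0
            entropy -= p * math.log(p)
    return math.exp(entropy)

print(khatri_rao([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[5, 12], [7, 16], [15, 24], [21, 32]]
print(effective_rank([1.0, 1.0, 1.0, 1.0]))    # flat spectrum: about 4
print(effective_rank([10.0, 0.1, 0.01, 0.0]))  # peaked spectrum: close to 1
```

A flat spectrum maximizes this measure, which is the regime where the abstract reports LoRA struggling; a rank-r LoRA update has at most r nonzero singular values, so its effective rank under this definition can never exceed r.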

large language model, machine learning, natural language

2508.0023

Country: North America (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Jul-29-2025

A Human-in-the-loop Approach to Robot Action Replanning through LLM Common-Sense Reasoning

Merlo, Elena, Lagomarsino, Marta, Ajoudani, Arash

To facilitate the wider adoption of robotics, accessible programming tools are required for non-experts. Observational learning enables intuitive human skill transfer through hands-on demonstrations, but relying solely on visual input can be inefficient in terms of scalability and failure mitigation, especially when based on a single demonstration. This paper presents a human-in-the-loop method that enhances a robot execution plan, automatically generated from a single RGB video, with natural language input to a Large Language Model (LLM). By including user-specified goals or critical task aspects and exploiting the LLM's common-sense reasoning, the system adjusts the vision-based plan to prevent potential failures and adapts it based on the received instructions. Experiments demonstrated the framework's intuitiveness and effectiveness in correcting vision-derived errors and adapting plans without requiring additional demonstrations. Moreover, interactive plan refinement and hallucination correction contributed to system robustness.

demonstration, large language model, natural language

2507.2087

Country: Europe (0.46)

Genre:

Workflow (0.89)
Research Report (0.84)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

arXiv.org Artificial Intelligence, Jul-29-2025

Survey of NLU Benchmarks Diagnosing Linguistic Phenomena: Why not Standardize Diagnostics Benchmarks?

Jallad, Khloud AL, Ghneim, Nada, Rebdawi, Ghaida

Natural Language Understanding (NLU) is a basic task in Natural Language Processing (NLP). The evaluation of NLU capabilities has become a trending research topic in recent years, resulting in the development of numerous benchmarks. These benchmarks include various tasks and datasets for evaluating pretrained models via public leaderboards. Notably, several benchmarks contain diagnostics datasets designed for investigation and fine-grained error analysis across a wide range of linguistic phenomena. This survey provides a comprehensive review of available English, Arabic, and Multilingual NLU benchmarks, with a particular emphasis on their diagnostics datasets and the linguistic phenomena they cover. We present a detailed comparison and analysis of these benchmarks, highlighting their strengths and limitations in evaluating NLU tasks and providing in-depth error analysis. In highlighting the gaps in the state of the art, we noted that there is no naming convention for macro and micro categories, nor even a standard set of linguistic phenomena that should be covered. Consequently, we formulated a research question regarding the evaluation metrics of the evaluation diagnostics benchmarks: "Why don't we have an evaluation standard for the NLU evaluation diagnostics benchmarks?", similar to ISO standards in industry. We conducted a deep analysis and comparison of the covered linguistic phenomena in order to support experts in building a global hierarchy for linguistic phenomena in the future. We think that having evaluation metrics for diagnostics evaluation could be valuable for gaining more insight when comparing the results of the studied models on different diagnostics benchmarks.

computational linguistic, large language model, natural language

2507.20419

Country:

North America > United States (1.00)
Europe (1.00)
Asia (1.00)

Genre:

Research Report (1.00)
Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Commonsense Reasoning (0.68)