Don't Learn, Ground: A Case for Natural Language Inference with Visual Grounding
Ignatev, Daniil, Santeer, Ayman, Gatt, Albert, Paperno, Denis
We propose a zero-shot method for Natural Language Inference (NLI) that leverages multimodal representations by grounding language in visual contexts. Our approach generates visual representations of premises using text-to-image models and performs inference by comparing these representations with textual hypotheses. We evaluate two inference techniques: cosine similarity and visual question answering. Our method achieves high accuracy without task-specific fine-tuning, demonstrating robustness against textual biases and surface heuristics. Additionally, we design a controlled adversarial dataset to validate the robustness of our approach. Our findings suggest that leveraging visual modality as a meaning representation provides a promising direction for robust natural language understanding.
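The cosine-similarity variant can be illustrated with a minimal sketch, assuming an off-the-shelf text-to-image model and a CLIP-style joint image-text encoder; the model names and the decision threshold below are illustrative assumptions, not the authors' exact pipeline.

```python
# Hedged sketch: zero-shot NLI by grounding the premise in a generated image
# and scoring the hypothesis against it with a CLIP-style encoder.
# Model names and the decision threshold are illustrative assumptions.
import torch
from diffusers import StableDiffusionPipeline
from transformers import CLIPModel, CLIPProcessor

t2i = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def entails(premise: str, hypothesis: str, threshold: float = 0.25) -> bool:
    # Ground the premise: render it as an image with a text-to-image model.
    image = t2i(premise).images[0]
    inputs = proc(text=[hypothesis], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        img_emb = clip.get_image_features(pixel_values=inputs["pixel_values"])
        txt_emb = clip.get_text_features(input_ids=inputs["input_ids"],
                                         attention_mask=inputs["attention_mask"])
    # Cosine similarity between the grounded premise and the hypothesis text.
    sim = torch.nn.functional.cosine_similarity(img_emb, txt_emb).item()
    return sim >= threshold  # above threshold -> treat as entailment
```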
Online Mixture of Experts: No-Regret Learning for Optimal Collective Decision-Making
We explore the use of expert-guided bandit learning, which we refer to as online mixture-of-experts (OMoE). In this setting, given a context, a candidate committee of experts must determine how to aggregate their outputs to achieve optimal results in terms of aggregate accuracy. We propose two algorithms to address this problem. The first algorithm combines aggregate voting with UCB-driven successive elimination, efficiently pruning suboptimal exploration actions. The second algorithm employs an online weighted-majority-voting mechanism, weighting each expert's vote in proportion to its predictive power. We derive theoretical regret guarantees in the bandit setting under idealized assumptions and provide corresponding empirical results. As an application, these methods are applied to the online fine-tuning of a set of expert large language models (LLMs), where after each response, the generative LLM dynamically reweighs its set of experts and/or selects the optimal committee of experts to generate the most accurate response. Our results introduce new methodologies and no-regret guarantees for combining multiple experts to improve on the performance of the aggregate model overall.
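The weighted-majority-voting side can be pictured with the classic multiplicative-weights scheme below; this is a minimal sketch under a binary-label assumption with an illustrative penalty factor, not the paper's exact formulation.

```python
# Hedged sketch: online weighted majority voting over a committee of experts
# (multiplicative-weights update). The penalty factor and binary-label setting
# are illustrative assumptions.
import numpy as np

class WeightedMajority:
    def __init__(self, n_experts: int, eta: float = 0.5):
        self.weights = np.ones(n_experts)
        self.eta = eta  # penalty factor applied to mistaken experts

    def predict(self, votes: np.ndarray) -> int:
        # votes[i] in {0, 1}; aggregate by total weight behind each label.
        score_1 = self.weights[votes == 1].sum()
        score_0 = self.weights[votes == 0].sum()
        return int(score_1 >= score_0)

    def update(self, votes: np.ndarray, truth: int) -> None:
        # Shrink the weight of every expert that voted incorrectly.
        self.weights[votes != truth] *= (1.0 - self.eta)

# Usage: each round, collect the committee's votes, predict, then reveal the label.
wm = WeightedMajority(n_experts=5)
votes = np.array([1, 0, 1, 1, 0])
pred = wm.predict(votes)
wm.update(votes, truth=1)
```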
Explore Data Left Behind in Reinforcement Learning for Reasoning Language Models
Liu, Chenxi, Liang, Junjie, Jia, Yuqi, Cao, Bochuan, Bai, Yang, Huang, Heng, Chen, Xun
Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as an effective approach for improving the reasoning abilities of large language models (LLMs). The Group Relative Policy Optimization (GRPO) family has demonstrated strong performance in training LLMs with RLVR. However, as models train longer and scale larger, more training prompts become residual prompts: prompts with zero-variance rewards that provide no training signal. Consequently, fewer prompts contribute to training, reducing diversity and hindering effectiveness. To fully exploit these residual prompts, we propose the Explore Residual Prompts in Policy Optimization (ERPO) framework, which encourages exploration on residual prompts and reactivates their training signals. ERPO maintains a history tracker for each prompt and adaptively increases the sampling temperature for residual prompts that previously produced all correct responses. This encourages the model to generate more diverse reasoning traces, introducing incorrect responses that revive training signals. Empirical results on the Qwen2.5 series demonstrate that ERPO consistently surpasses strong baselines across multiple mathematical reasoning benchmarks.
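The core mechanism described in the abstract can be sketched as a small per-prompt tracker; the binary-reward assumption, temperature step, and cap below are illustrative choices, not the authors' exact schedule.

```python
# Hedged sketch of the ERPO idea: for prompts whose last sampled group was
# all-correct (zero reward variance, hence no GRPO signal), raise the sampling
# temperature to encourage more diverse rollouts next time.
class ResidualPromptTracker:
    def __init__(self, base_temp=1.0, temp_step=0.2, max_temp=1.6):
        self.base_temp, self.temp_step, self.max_temp = base_temp, temp_step, max_temp
        self.temps = {}  # prompt id -> current sampling temperature

    def temperature(self, prompt_id: str) -> float:
        return self.temps.get(prompt_id, self.base_temp)

    def record_group(self, prompt_id: str, rewards: list) -> None:
        # Assumes binary verifiable rewards: 1.0 = correct, 0.0 = incorrect.
        if rewards and all(r == 1.0 for r in rewards):
            # Residual prompt (all correct): explore more on the next pass.
            new_t = min(self.temperature(prompt_id) + self.temp_step, self.max_temp)
            self.temps[prompt_id] = new_t
        else:
            # The prompt yields a useful training signal again; reset.
            self.temps[prompt_id] = self.base_temp
```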
Explaining Decisions in ML Models: a Parameterized Complexity Analysis (Part I)
Ordyniak, Sebastian, Paesani, Giacomo, Rychlicki, Mateusz, Szeider, Stefan
This paper presents a comprehensive theoretical investigation into the parameterized complexity of explanation problems in various machine learning (ML) models. Contrary to the prevalent black-box perception, our study focuses on models with transparent internal mechanisms. We address two principal types of explanation problems: abductive and contrastive, both in their local and global variants. Our analysis encompasses diverse ML models, including Decision Trees, Decision Sets, Decision Lists, Boolean Circuits, and ensembles thereof, each offering unique explanatory challenges. This research fills a significant gap in explainable AI (XAI) by providing a foundational understanding of the complexities of generating explanations for these models. This work provides insights vital for further research in the domain of XAI, contributing to the broader discourse on the necessity of transparency and accountability in AI systems.
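For readers less familiar with the two query types, a standard formalization of a local abductive explanation (not necessarily the paper's exact parameterization) is: a subset $S$ of features is an abductive explanation for an instance $x$ with predicted class $c = f(x)$ if
\[
  \forall x' :\; \bigl(x'_i = x_i \ \text{for all } i \in S\bigr) \;\Rightarrow\; f(x') = c,
\]
i.e., fixing $x$ on $S$ forces the prediction. Dually, a contrastive explanation is a subset $S$ such that changing $x$ only on the features in $S$ can flip the predicted class.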
MASPRM: Multi-Agent System Process Reward Model
Yazdani, Milad, Mostajabdaveh, Mahdi, Zhou, Zirui, Xiong, Ying
Practical deployment of Multi-Agent Systems (MAS) demands strong test-time performance, motivating methods that guide inference-time search and selectively spend compute to improve quality. We present the Multi-Agent System Process Reward Model (MASPRM). It assigns per-action, per-agent values to partial inter-agent transcripts and acts as an inference-time controller. MASPRM is trained from multi-agent Monte Carlo Tree Search (MCTS) rollouts without requiring step-level human annotations, by propagating returns to local targets. At inference, MASPRM guides step-level beam search and MCTS, focusing computation on promising branches and pruning early. On GSM8K and MATH, MASPRM-guided decoding, with an outcome reward model (ORM) applied to the final answer, improves exact match (EM) over a single straight-through MAS pass by $+30.7$ and $+22.9$ points, respectively. A MASPRM trained on GSM8K transfers zero-shot to MATH without retraining, adding $8.4$ EM points at the same budget. MASPRM is a plug-in value model that estimates per-agent progress and complements verifier-style decoders, enabling more reliable, compute-aware multi-agent reasoning. Code: https://github.com/milad1378yz/MASPRM
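The controller role can be illustrated with a step-level beam search sketch; `propose_next_steps` and `prm_score` below are hypothetical stand-ins for the agents' step generator and MASPRM's per-action value estimate, and the beam sizes are illustrative.

```python
# Hedged sketch: using a process reward model (PRM) as an inference-time
# controller for step-level beam search over a multi-agent transcript.
def prm_beam_search(prompt, propose_next_steps, prm_score,
                    beam_width=4, expansions_per_state=4, max_steps=8):
    beams = [([], 0.0)]  # (partial transcript, cumulative PRM value)
    for _ in range(max_steps):
        candidates = []
        for transcript, value in beams:
            for step in propose_next_steps(prompt, transcript, k=expansions_per_state):
                new_transcript = transcript + [step]
                # Score the partial inter-agent transcript after this action.
                candidates.append((new_transcript,
                                   value + prm_score(prompt, new_transcript)))
        if not candidates:
            break
        # Keep only the most promising branches; prune the rest early.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]  # highest-value transcript under the PRM
```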
Sample Smart, Not Hard: Correctness-First Decoding for Better Reasoning in LLMs
Li, Xueyan, Su, Guinan, Sachan, Mrinmaya, Geiping, Jonas
Large Language Models (LLMs) are increasingly applied to complex tasks that require extended reasoning. In such settings, models often benefit from diverse chains-of-thought to arrive at multiple candidate solutions. This requires balancing two competing objectives: injecting enough stochasticity to explore multiple reasoning chains, and ensuring sufficient accuracy and quality in each path. Existing works pursue the first objective by increasing exploration at highly uncertain steps with higher temperature or larger candidate token sets, while others improve reliability by rejecting samples with low confidence post-generation, implying that low confidence correlates with low answer quality. These two lines of thought are in conflict, as they conflate different sources of uncertainty. To resolve this, we argue that the decoding rule should be calibrated by correctness, not confidence alone: we should sample from tokens with higher estimated correctness, and reduce sampling where expected correctness is low. We propose simple strategies that achieve this goal: Greedy-Threshold makes sampling greedy at very low confidence steps, while Calibrated-TopK and Calibrated-ε set truncation thresholds based on estimated rank-wise correctness.
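The Greedy-Threshold rule, as described in the abstract, admits a very small sketch; the threshold value below is an illustrative assumption, not the paper's calibrated setting.

```python
# Hedged sketch of the Greedy-Threshold idea: fall back to greedy decoding
# whenever the model's top-token confidence is very low, and sample with the
# usual temperature otherwise. The threshold is an illustrative assumption.
import torch

def greedy_threshold_sample(logits: torch.Tensor, temperature: float = 1.0,
                            greedy_below: float = 0.1) -> int:
    probs = torch.softmax(logits, dim=-1)
    if probs.max().item() < greedy_below:
        # Very low confidence: sampling here mostly injects noise, so be greedy.
        return int(torch.argmax(probs))
    # Otherwise sample from the temperature-scaled distribution.
    scaled = torch.softmax(logits / temperature, dim=-1)
    return int(torch.multinomial(scaled, num_samples=1))
```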
TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning
Zhang, Tunyu, Shi, Haizhou, Wang, Yibin, Wang, Hengyi, He, Xiaoxiao, Li, Zhuowei, Chen, Haoxian, Han, Ligong, Xu, Kai, Zhang, Huan, Metaxas, Dimitris, Wang, Hao
While Large Language Models (LLMs) have demonstrated impressive capabilities, their output quality remains inconsistent across various application scenarios, making it difficult to identify trustworthy responses, especially in complex tasks requiring multi-step reasoning. In this paper, we propose a Token-level Uncertainty estimation framework for Reasoning (TokUR) that enables LLMs to self-assess and self-improve their responses in mathematical reasoning. Specifically, we introduce low-rank random weight perturbation during LLM decoding to generate predictive distributions for token-level uncertainty estimation, and we aggregate these uncertainty quantities to capture the semantic uncertainty of generated responses. Experiments on mathematical reasoning datasets of varying difficulty demonstrate that TokUR exhibits a strong correlation with answer correctness and model robustness, and the uncertainty signals produced by TokUR can be leveraged to enhance the model's reasoning performance at test time. These results highlight the effectiveness of TokUR as a principled and scalable approach for improving the reliability and interpretability of LLMs in challenging reasoning tasks.
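The aggregation step can be sketched in isolation: assuming K perturbed forward passes have already produced token-level predictive distributions for one response (the low-rank weight perturbation itself is omitted), the per-token uncertainties are pooled into a single response-level score. The pooling rule below (mean predictive entropy) is an illustrative choice, not necessarily the paper's.

```python
# Hedged sketch of the aggregation step only.
import torch

def response_uncertainty(per_pass_probs: torch.Tensor) -> float:
    # per_pass_probs: [K passes, T tokens, V vocab] predictive distributions
    # for the tokens actually generated in one response.
    mean_probs = per_pass_probs.mean(dim=0)                                   # [T, V]
    token_entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(-1)  # [T]
    return token_entropy.mean().item()  # higher -> less trustworthy response
```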
Single-stream Policy Optimization
We revisit policy-gradient optimization for Large Language Models (LLMs) from a single-stream perspective. Prevailing group-based methods like GRPO reduce variance with on-the-fly baselines but suffer from critical flaws: frequent degenerate groups erase learning signals, and synchronization barriers hinder scalability. We introduce Single-stream Policy Optimization (SPO), which eliminates these issues by design. SPO replaces per-group baselines with a persistent, KL-adaptive value tracker and normalizes advantages globally across the batch, providing a stable, low-variance learning signal for every sample. Being group-free, SPO enables higher throughput and scales effectively in long-horizon or tool-integrated settings where generation times vary. Furthermore, the persistent value tracker naturally enables an adaptive curriculum via prioritized sampling. Experiments using Qwen3-8B show that SPO converges more smoothly and attains higher accuracy than GRPO, while eliminating computation wasted on degenerate groups. Ablation studies confirm that SPO's gains stem from its principled approach to baseline estimation and advantage normalization, offering a more robust and efficient path for LLM reasoning. Across five hard math benchmarks with Qwen3-8B, SPO improves the average maj@32 by +3.4 percentage points (pp) over GRPO, driven by substantial absolute point gains on challenging datasets, including +7.3 pp on BRUMO 25, +4.4 pp on AIME 25, +3.3 pp on HMMT 25, and achieves consistent relative gains in pass@$k$ across the evaluated $k$ values. SPO's success challenges the prevailing trend of adding incidental complexity to RL algorithms, highlighting a path where fundamental principles, not architectural workarounds, drive the next wave of progress in LLM reasoning.
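The single-stream baseline idea can be sketched as follows; the exponential-moving-average update stands in for the paper's KL-adaptive tracker, which is more involved, and the batch layout is an illustrative assumption.

```python
# Hedged sketch: a persistent per-prompt value tracker replaces per-group
# baselines, and advantages are normalized globally across the batch.
import numpy as np

class ValueTracker:
    def __init__(self, beta=0.9):
        self.values, self.beta = {}, beta

    def baseline(self, prompt_id):
        return self.values.get(prompt_id, 0.5)

    def update(self, prompt_id, reward):
        v = self.baseline(prompt_id)
        self.values[prompt_id] = self.beta * v + (1 - self.beta) * reward

def single_stream_advantages(batch, tracker):
    # batch: list of (prompt_id, reward) pairs, one sampled response per prompt.
    adv = np.array([r - tracker.baseline(pid) for pid, r in batch])
    for pid, r in batch:
        tracker.update(pid, r)
    # Normalize globally across the batch, not within per-prompt groups.
    return (adv - adv.mean()) / (adv.std() + 1e-8)
```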
Improving LLM Agents with Reinforcement Learning on Cryptographic CTF Challenges
Muzsai, Lajos, Imolai, David, Lukács, András
We present 'Random-Crypto', a procedurally generated cryptographic Capture The Flag (CTF) dataset designed to unlock the potential of Reinforcement Learning (RL) for LLM-based agents in security-sensitive domains. Cryptographic reasoning offers an ideal RL testbed: it combines precise validation, structured multi-step inference, and reliance on reliable computational tool use. Leveraging these properties, we fine-tune a Python tool-augmented Llama-3.1-8B via Group Relative Policy Optimization (GRPO) in a secure execution environment. The resulting agent achieves a significant improvement in Pass@8 on previously unseen challenges. Moreover, the improvements generalize to two external benchmarks: 'picoCTF', spanning both crypto and non-crypto tasks, and 'AICrypto MCQ', a multiple-choice benchmark of 135 cryptography questions. Ablation studies attribute the gains to enhanced tool usage and procedural reasoning. These findings position 'Random-Crypto' as a rich training ground for building intelligent, adaptable LLM agents capable of handling complex cybersecurity tasks.
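The "precise validation" property amounts to a verifiable reward; a minimal sketch is below, where the flag format is an illustrative assumption rather than the dataset's actual convention.

```python
# Hedged sketch of a verifiable CTF reward: reward 1.0 only if the agent's
# final answer contains the exact flag for the generated challenge.
import re

def ctf_reward(agent_response: str, true_flag: str) -> float:
    # Extract anything that looks like a submitted flag, e.g. flag{...}.
    submitted = re.findall(r"flag\{[^}]*\}", agent_response)
    return 1.0 if true_flag in submitted else 0.0
```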