Collaborating Authors: mur


MUR: Momentum Uncertainty guided Reasoning for Large Language Models

Yan, Hang, Xu, Fangzhi, Xu, Rongman, Li, Yifei, Zhang, Jian, Luo, Haoran, Wu, Xiaobao, Tuan, Luu Anh, Zhao, Haiteng, Lin, Qika, Liu, Jun

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have achieved impressive performance on reasoning-intensive tasks, yet optimizing their reasoning efficiency remains an open challenge. While Test-Time Scaling (TTS) improves reasoning quality, it often leads to overthinking, wasting tokens on redundant computations. This work investigates how to efficiently and adaptively guide LLM test-time scaling without additional training. Inspired by the concept of momentum in physics, we propose Momentum Uncertainty-guided Reasoning (MUR), which dynamically allocates thinking budgets to critical reasoning steps by tracking and aggregating stepwise uncertainty over time. To support flexible inference-time control, we introduce gamma-control, a simple mechanism that tunes the reasoning budget via a single hyperparameter. We provide in-depth theoretical proofs supporting the superiority of MUR in terms of stability and bias. MUR is comprehensively evaluated against various TTS methods across four challenging benchmarks (MATH-500, AIME24, AIME25, and GPQA-diamond) using different sizes of recent Qwen3 models (1.7B, 4B, and 8B). Results demonstrate that MUR reduces computation by over 50% on average while improving accuracy by 0.62-3.37%.
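The momentum idea described in the abstract can be sketched as an exponential moving average of per-step uncertainty, with extra test-time compute triggered when a step's uncertainty spikes above the running average. The function names, the trigger rule, and the default gamma below are illustrative assumptions, not the paper's exact formulation:

```python
def momentum_uncertainty(step_uncertainties, gamma=0.9):
    """Aggregate per-step uncertainties into a momentum term via an
    exponential moving average: m_t = gamma * m_{t-1} + (1 - gamma) * u_t.
    (Illustrative sketch; gamma plays the budget-control role here.)"""
    m = 0.0
    history = []
    for u in step_uncertainties:
        m = gamma * m + (1.0 - gamma) * u
        history.append(m)
    return history


def should_scale(step_uncertainty, momentum, threshold=1.0):
    # Spend extra test-time compute only when the current step is markedly
    # more uncertain than the accumulated momentum (hypothetical rule).
    return step_uncertainty > threshold * momentum
```

Under this sketch, raising gamma smooths the momentum term and makes scaling rarer, while lowering it makes the trigger react faster to local uncertainty spikes.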


Induction Heads as an Essential Mechanism for Pattern Matching in In-context Learning

Crosbie, J., Shutova, E.

arXiv.org Artificial Intelligence

Large language models have shown a remarkable ability to learn and perform complex tasks through in-context learning (ICL) (Brown et al., 2020; Touvron et al., 2023b). In ICL, the model receives a demonstration context and a query question as a prompt for prediction. Unlike supervised learning, ICL utilises the pretrained model's capabilities to recognise and replicate patterns within the demonstration context, thereby enabling accurate predictions for the query without the use of gradient updates. As a significant milestone in this area, Elhage et al. (2021) demonstrated the existence of induction heads in Transformer LMs. These heads scan the context for previous instances of the current token using a prefix matching mechanism, which identifies if and where a token has appeared before. If a matching token is found, the head employs a copying mechanism to increase the probability of the subsequent token, facilitating exact or approximate repetition of sequences.
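The prefix-matching and copying mechanisms described above can be illustrated with a toy lookup over a token sequence. This is a deliberate simplification of what an attention head computes, not Transformer code:

```python
def induction_predict(tokens):
    """Toy induction-head behaviour: scan backwards for the most recent
    earlier occurrence of the final token (prefix matching), then predict
    the token that immediately followed it (copying)."""
    last = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == last:
            return tokens[i + 1]
    return None  # no earlier match: nothing to copy
```

For example, `induction_predict(["A", "B", "C", "A"])` returns `"B"`, mirroring the repeated-sequence completion that induction heads support.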


New study identifies how AI fails to reproduce human vision

#artificialintelligence

When a human spots a familiar face or an oncoming vehicle, it takes the brain a mere 100 milliseconds (about one-tenth of a second) to identify it and, more importantly, place it in the right context so it can be understood and the individual can react accordingly. Unsurprisingly, computers may be able to do this faster, but are they as accurate as humans in the real world? Not always, and that's a problem, according to a study led by Western neuroimaging expert Marieke Mur. Computers can be taught to process incoming data, like observing faces and cars, using artificial intelligence known as deep neural networks or deep learning. This type of machine learning process uses interconnected nodes, or neurons, in a layered structure that resembles the human brain.


Semi-Supervised Learning with Variational Bayesian Inference and Maximum Uncertainty Regularization

Do, Kien, Tran, Truyen, Venkatesh, Svetha

arXiv.org Artificial Intelligence

We propose two generic methods for improving semi-supervised learning (SSL). The first integrates weight perturbation (WP) into existing "consistency regularization" (CR) based methods. We implement WP by leveraging variational Bayesian inference (VBI). The second method proposes a novel consistency loss called "maximum uncertainty regularization" (MUR). While most consistency losses act on perturbations in the vicinity of each data point, MUR actively searches for "virtual" points situated beyond this region that cause the most uncertain class predictions. This allows MUR to impose smoothness on a wider area in the input-output manifold. Our experiments show clear improvements in classification errors of various CR based methods when they are combined with VBI or MUR or both.
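The search for maximally uncertain "virtual" points can be sketched as gradient ascent on predictive entropy around each input. The linear-softmax model, the numerical gradient, and the search radius below are simplifying assumptions for illustration; the paper's model and search procedure are more general:

```python
import numpy as np


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))


def find_max_uncertainty_point(x, W, steps=20, lr=0.5, radius=1.0):
    """Gradient-ascend predictive entropy of a linear-softmax classifier
    to find a 'virtual' point near x with maximally uncertain predictions.
    (Illustrative sketch; the radius constraint is an assumption.)"""
    x_adv = x.copy()
    eps = 1e-4
    for _ in range(steps):
        # central-difference estimate of the entropy gradient w.r.t. x_adv
        grad = np.zeros_like(x_adv)
        for j in range(x_adv.size):
            d = np.zeros_like(x_adv)
            d[j] = eps
            grad[j] = (entropy(softmax(W @ (x_adv + d)))
                       - entropy(softmax(W @ (x_adv - d)))) / (2 * eps)
        x_adv = x_adv + lr * grad
        # project back into a ball of the given radius around x
        delta = x_adv - x
        norm = np.linalg.norm(delta)
        if norm > radius:
            x_adv = x + delta * (radius / norm)
    return x_adv
```

A consistency loss would then penalize disagreement between the model's predictions at `x` and at the returned virtual point, encouraging smoothness over a wider region than local perturbations cover.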


A Unified Framework for Sparse Non-Negative Least Squares using Multiplicative Updates and the Non-Negative Matrix Factorization Problem

Fedorov, Igor, Nalci, Alican, Giri, Ritwik, Rao, Bhaskar D., Nguyen, Truong Q., Garudadri, Harinath

arXiv.org Machine Learning

We study the sparse non-negative least squares (S-NNLS) problem. S-NNLS occurs naturally in a wide variety of applications where an unknown, non-negative quantity must be recovered from linear measurements. We present a unified framework for S-NNLS based on a rectified power exponential scale mixture prior on the sparse codes. We show that the proposed framework encompasses a large class of S-NNLS algorithms and provide a computationally efficient inference procedure based on multiplicative update rules. Such update rules are convenient for solving large sets of S-NNLS problems simultaneously, which is required in contexts like sparse non-negative matrix factorization (S-NMF). We provide theoretical justification for the proposed approach by showing that the local minima of the objective function being optimized are sparse and the S-NNLS algorithms presented are guaranteed to converge to a set of stationary points of the objective function. We then extend our framework to S-NMF, showing that our framework leads to many well known S-NMF algorithms under specific choices of prior and providing a guarantee that a popular subclass of the proposed algorithms converges to a set of stationary points of the objective function. Finally, we study the performance of the proposed approaches on synthetic and real-world data.
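For intuition, the multiplicative update rule for plain NNLS (a special case of the framework above) can be sketched as follows. This is the textbook Lee-Seung-style rule, assuming non-negative A and b, not the paper's full scale-mixture derivation:

```python
import numpy as np


def nnls_multiplicative(A, b, iters=2000, eps=1e-12):
    """Solve min_x ||Ax - b||^2 subject to x >= 0 using the classic
    multiplicative update x <- x * (A^T b) / (A^T A x).  Starting from a
    strictly positive x keeps every iterate non-negative automatically."""
    x = np.ones(A.shape[1])
    AtA = A.T @ A
    Atb = A.T @ b
    for _ in range(iters):
        x = x * (Atb / (AtA @ x + eps))
    return x
```

Because the update is elementwise, many NNLS problems (e.g. the columns of an S-NMF factor) can be solved in one vectorized sweep, which is the convenience for large problem sets that the abstract refers to.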