mera
To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models
Hedström, Anna, Amoukou, Salim I., Bewley, Tom, Mishra, Saumitra, Veloso, Manuela
We introduce Mechanistic Error Reduction with Abstention (MERA), a principled framework for steering language models (LMs) to mitigate errors through selective, adaptive interventions. Unlike existing methods that rely on fixed, manually tuned steering strengths, often resulting in under or oversteering, MERA addresses these limitations by (i) optimising the intervention direction, and (ii) calibrating when, and how much to steer, thereby provably improving performance or abstaining when no confident correction is possible. Experiments across diverse datasets, and LM families demonstrate safe, effective, non-degrading error correction, and that MERA outperforms existing baselines. Moreover, MERA can be applied on top of existing steering techniques to further enhance their performance, establishing it as a general-purpose, and efficient approach to mechanistic activation steering.
- Europe > Austria > Vienna (0.14)
- North America > Canada > British Columbia > Vancouver (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- (6 more...)
From "Aha Moments" to Controllable Thinking: Toward Meta-Cognitive Reasoning in Large Reasoning Models via Decoupled Reasoning and Control
Ha, Rui, Li, Chaozhuo, Pu, Rui, Su, Sen
Large Reasoning Models (LRMs) have demonstrated a latent capacity for complex reasoning by spontaneously exhibiting cognitive behaviors such as step-by-step reasoning, reflection, and backtracking, commonly referred to as "Aha Moments". However, such emergent behaviors remain unregulated and uncontrolled, often resulting in overthinking, where the model continues generating redundant reasoning content even after reaching reliable conclusions. This leads to excessive computational costs and increased latency, limiting the practical deployment of LRMs. The root cause lies in the absence of intrinsic regulatory mechanisms, as current models are unable to monitor and adaptively manage their reasoning process to determine when to continue, backtrack, or terminate. To address this issue, we propose the Meta-cognitive Reasoning Framework (MERA), which explicitly decouples the thinking process into distinct reasoning and control components, thereby enabling the independent optimization of control strategies. Specifically, MERA incorporates a takeover-based data construction mechanism that identifies critical decision points during reasoning and delegates the creation of control signals to auxiliary LLMs, thereby enabling the construction of high-quality reasoning-control data. Additionally, a structured reasoning-control separation is implemented via supervised fine-tuning, enabling the model to generate explicit traces and acquire initial meta-cognitive control capabilities. Finally, MERA employs Control-Segment Policy Optimization (CSPO), which combines segment-wise Group Relative Policy Optimization (GRPO) with a control-masking mechanism to optimize control behavior learning while minimizing interference from irrelevant content. Experiments on various reasoning benchmarks demonstrate that models trained with MERA enhance both reasoning efficiency and accuracy.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
Merge then Realign: Simple and Effective Modality-Incremental Continual Learning for Multimodal LLMs
Zhang, Dingkun, Qi, Shuhan, Xiao, Xinyu, Chen, Kehai, Wang, Xuan
Recent advances in Multimodal Large Language Models (MLLMs) have enhanced their versatility as they integrate a growing number of modalities. Considering the heavy cost of training MLLMs, it is necessary to reuse the existing ones and further extend them to more modalities through Modality-incremental Continual Learning (MCL). However, this often comes with a performance degradation in the previously learned modalities. In this work, we revisit the MCL and investigate a more severe issue it faces in contrast to traditional continual learning, that its degradation comes not only from catastrophic forgetting but also from the misalignment between the modality-agnostic and modality-specific components. To address this problem, we propose an elegantly simple MCL paradigm called "MErge then ReAlign" (MERA). Our method avoids introducing heavy training overhead or modifying the model architecture, hence is easy to deploy and highly reusable in the MLLM community. Extensive experiments demonstrate that, despite the simplicity of MERA, it shows impressive performance, holding up to a 99.84% Backward Relative Gain when extending to four modalities, achieving a nearly lossless MCL performance.
- Asia > China > Heilongjiang Province > Harbin (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
When Computing follows Vehicles: Decentralized Mobility-Aware Resource Allocation for Edge-to-Cloud Continuum
Nezami, Zeinab, Chaniotakis, Emmanouil, Pournaras, Evangelos
The transformation of smart mobility is unprecedented--Autonomous, shared and electric connected vehicles, along with the urgent need to meet ambitious net-zero targets by shifting to low-carbon transport modalities result in new traffic patterns and requirements for real-time computation at large-scale, for instance, augmented reality applications. The cloud computing paradigm can neither respond to such low-latency requirements nor adapt resource allocation to such dynamic spatio-temporal service requests. This paper addresses this grand challenge by introducing a novel decentralized optimization framework for mobility-aware edge-to-cloud resource allocation, service offloading, provisioning and load-balancing. In contrast to related work, this framework comes with superior efficiency and cost-effectiveness under evaluation in real-world traffic settings and mobility datasets. This breakthrough capability of 'computing follows vehicles' proves able to reduce utilization variance by more than 40 times, while preventing service deadline violations by 14%-34%.
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
- Europe > United Kingdom > England > West Yorkshire > Leeds (0.04)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- (5 more...)
- Transportation > Ground > Road (1.00)
- Telecommunications (1.00)
- Information Technology > Services (1.00)
- (3 more...)
MerA: Merging Pretrained Adapters For Few-Shot Learning
He, Shwai, Fan, Run-Ze, Ding, Liang, Shen, Li, Zhou, Tianyi, Tao, Dacheng
Adapter tuning, which updates only a few parameters, has become a mainstream method for fine-tuning pretrained language models to downstream tasks. However, it often yields subpar results in few-shot learning. AdapterFusion, which assembles pretrained adapters using composition layers tailored to specific tasks, is a possible solution but significantly increases trainable parameters and deployment costs. Despite this, our preliminary study reveals that even single adapters can outperform Adapterfusion in few-shot learning, urging us to propose \textbf{\texttt{Merging Pretrained Adapters}} (MerA) that efficiently incorporates pretrained adapters to a single model through model fusion. Extensive experiments on two PLMs demonstrate that MerA achieves substantial improvements compared to both single adapters and AdapterFusion. To further enhance the capacity of MerA, we also introduce a simple yet effective technique, referred to as the "\textit{same-track}" setting, that merges adapters from the same track of pretraining tasks. With the implementation of the "\textit{same-track}" setting, we observe even more impressive gains, surpassing the performance of both full fine-tuning and adapter tuning by a substantial margin, e.g., 3.5\% in MRPC and 5.0\% in MNLI.
Classical versus Quantum: comparing Tensor Network-based Quantum Circuits on LHC data
Araz, Jack Y., Spannowsky, Michael
Tensor Networks (TN) are approximations of high-dimensional tensors designed to represent locally entangled quantum many-body systems efficiently. This study provides a comprehensive comparison between classical TNs and TN-inspired quantum circuits in the context of Machine Learning on highly complex, simulated LHC data. We show that classical TNs require exponentially large bond dimensions and higher Hilbert-space mapping to perform comparably to their quantum counterparts. While such an expansion in the dimensionality allows better performance, we observe that, with increased dimensionality, classical TNs lead to a highly flat loss landscape, rendering the usage of gradient-based optimization methods highly challenging. Furthermore, by employing quantitative metrics, such as the Fisher information and effective dimensions, we show that classical TNs require a more extensive training sample to represent the data as efficiently as TN-inspired quantum circuits. We also engage with the idea of hybrid classical-quantum TNs and show possible architectures to employ a larger phase-space from the data. We offer our results using three main TN ansatz: Tree Tensor Networks, Matrix Product States, and Multi-scale Entanglement Renormalisation Ansatz.
- South America > Ecuador > Pichincha Province > Quito (0.04)
- South America > Colombia > Bogotá D.C. > Bogotá (0.04)
- Europe > United Kingdom (0.04)
- (2 more...)
A Multi-Scale Tensor Network Architecture for Classification and Regression
Reyes, Justin, Stoudenmire, Miles
A Multi-Scale T ensor Network Architecture for Classification and Regression Justin Reyes 1 and E. Miles Stoudenmire 2 1 Department of Physics, University of Central Florida, 4000 Central Florida Blvd, Orlando, FL 32816, USA 2 Center for Computational Quantum Physics, Flatiron Institute, 162 5th Avenue, New Y ork, NY 10010, USA (Dated: January 24, 2020) We present an algorithm for supervised learning using tensor networks, employing a step of preprocessing the data by coarse-graining through a sequence of wavelet transformations. We represent these transformations as a set of tensor network layers identical to those in a multi-scale entanglement renormalization ansatz (MERA) tensor network, and perform supervised learning and regression tasks through a model based on a matrix product state (MPS) tensor network acting on the coarse-grained data. Because the entire model consists of tensor contractions (apart from the initial nonlinear feature map), we can adaptively fine-grain the optimized MPS model backwards through the layers with essentially no loss in performance. The MPS itself is trained using an adaptive algorithm based on the density matrix renormalization group (DMRG) algorithm. We test our methods by performing a classification task on audio data and a regression task on temperature time-series data, studying the dependence of training accuracy on the number of coarse-graining layers and showing how fine-graining through the network may be used to initialize models with access to finer-scale features. I. INTRODUCTION Computational techniques developed across the machine learning and physics fields have consistently generated promising methods and applications in both areas of study. The application of well established machine learning architectures and optimization techniques has enriched the physics community with advances such as modeling and recognizing topological quantum states [1-3], optimizing quantum error correction codes [4], or classifying quantum walks [5]. Conversely, techniques known as tensor networks which model high-dimensional functions and are closely connected to physical principles have begun to be explored more in applied mathematics and machine learning [6-16].
Number-State Preserving Tensor Networks as Classifiers for Supervised Learning
We propose a restricted class of tensor network state, built from number-state preserving tensors, for supervised learning tasks. This class of tensor network is argued to be a natural choice for classifiers as (i) they map classical data to classical data, and thus preserve the interpretability of data under tensor transformations, (ii) they can be efficiently trained to maximize their scalar product against classical data sets, and (iii) they seem to be as powerful as generic (unrestricted) tensor networks in this task. Our proposal is demonstrated using a variety of benchmark classification problems, where number-state preserving versions of commonly used networks (including MPS, TTN and MERA) are trained as effective classifiers. This work opens the path for powerful tensor network methods such as MERA, which were previously computationally intractable as classifiers, to be employed for difficult tasks such as image recognition.
- North America > United States > New York (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
Making Pepper walk: Understanding Softbank's purchase of Boston Dynamics
It is unclear if Masayoshi Son, Chairman of Softbank, was one of the 17 million YouTube viewers of Boston Dynamic's Big Dog before acquiring the company for an undisclosed amount this past Thursday. What is clear is the acquisition of Boston Dynamics by Softbank is a big deal. Softbank's humanoid robot Pepper is trading up her dainty wheels for a pair of sturdy legs. In expressing his excitement for the acquisition, Masayoshi Son said, "Today, there are many issues we still cannot solve by ourselves with human capabilities. Smart robotics are going to be a key driver of the next stage of the Information Revolution, and Marc and his team at Boston Dynamics are the clear technology leaders in advanced dynamic robots. I am thrilled to welcome them to the SoftBank family and look forward to supporting them as they continue to advance the field of robotics and explore applications that can help make life easier, safer and more fulfilling."
- Asia > Japan (0.06)
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
Making Pepper walk: Understanding Softbank's purchase of Boston Dynamics
It is unclear if Masayoshi Son, Chairman of Softbank, was one of the 17 million YouTube viewers of Boston Dynamic's Big Dog before acquiring the company for an undisclosed amount this past Thursday. What is clear is the acquisition of Boston Dynamics by Softbank is a big deal. Softbank's humanoid robot Pepper is trading up her dainty wheels for a pair of sturdy legs. In expressing his excitement for the acquisition, Masayoshi Son said, "Today, there are many issues we still cannot solve by ourselves with human capabilities. Smart robotics are going to be a key driver of the next stage of the Information Revolution, and Marc and his team at Boston Dynamics are the clear technology leaders in advanced dynamic robots. I am thrilled to welcome them to the SoftBank family and look forward to supporting them as they continue to advance the field of robotics and explore applications that can help make life easier, safer and more fulfilling."
- Asia > Japan (0.06)
- North America > United States > California > Santa Clara County > Palo Alto (0.05)