AITopics | mera

mera

To Steer or Not to Steer? Mechanistic Error Reduction with Abstention for Language Models

Hedström, Anna, Amoukou, Salim I., Bewley, Tom, Mishra, Saumitra, Veloso, Manuela

arXiv.org Artificial Intelligence, Oct-16-2025

We introduce Mechanistic Error Reduction with Abstention (MERA), a principled framework for steering language models (LMs) to mitigate errors through selective, adaptive interventions. Unlike existing methods that rely on fixed, manually tuned steering strengths, often resulting in under- or oversteering, MERA (i) optimises the intervention direction and (ii) calibrates when and how much to steer, thereby provably improving performance or abstaining when no confident correction is possible. Experiments across diverse datasets and LM families demonstrate safe, effective, non-degrading error correction, and show that MERA outperforms existing baselines. Moreover, MERA can be applied on top of existing steering techniques to further enhance their performance, establishing it as a general-purpose and efficient approach to mechanistic activation steering.
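The idea of deciding when and how much to steer, and abstaining otherwise, can be sketched in a few lines. This is only an illustration under loud assumptions: the function name, the `predicted_error` estimate (imagined here as the output of some error probe), the threshold rule, and the linear strength schedule are all hypothetical and are not taken from the paper, whose actual calibration procedure is not described in this abstract.

```python
from typing import List, Sequence


def steer_or_abstain(
    hidden: Sequence[float],
    direction: Sequence[float],
    predicted_error: float,
    threshold: float,
    max_strength: float,
) -> List[float]:
    """Shift an activation along a steering direction, or leave it unchanged.

    Hypothetical rule (an assumption, not the paper's method): abstain when
    the predicted error is at or below the threshold; otherwise steer with a
    strength equal to the excess error, capped at max_strength.
    """
    if len(hidden) != len(direction):
        raise ValueError("hidden and direction must have the same length")

    if predicted_error <= threshold:
        # Abstain: no confident correction is warranted.
        return list(hidden)

    # Adaptive strength: grows with the excess error, never above the cap.
    strength = min(max_strength, predicted_error - threshold)

    # Move the activation against the (assumed) error direction.
    return [h - strength * d for h, d in zip(hidden, direction)]


if __name__ == "__main__":
    activation = [1.0, 2.0]
    error_direction = [1.0, 0.0]
    print(steer_or_abstain(activation, error_direction, 0.2, 0.5, 1.0))
    print(steer_or_abstain(activation, error_direction, 0.75, 0.5, 1.0))
```

The contrast with a fixed steering strength is that the shift here depends on the input: low-risk activations pass through untouched, and the cap bounds how far any single intervention can move the representation.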

large language model, machine learning, natural language, (17 more...)

2510.1329

Country:

North America > United States (0.67)
Asia (0.67)

Genre: Research Report > New Finding (0.93)

Industry: Banking & Finance (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

From "Aha Moments" to Controllable Thinking: Toward Meta-Cognitive Reasoning in Large Reasoning Models via Decoupled Reasoning and Control

Ha, Rui, Li, Chaozhuo, Pu, Rui, Su, Sen

arXiv.org Artificial Intelligence, Aug-7-2025

Large Reasoning Models (LRMs) have demonstrated a latent capacity for complex reasoning by spontaneously exhibiting cognitive behaviors such as step-by-step reasoning, reflection, and backtracking, commonly referred to as "Aha Moments". However, such emergent behaviors remain unregulated and uncontrolled, often resulting in overthinking, where the model continues generating redundant reasoning content even after reaching reliable conclusions. This leads to excessive computational costs and increased latency, limiting the practical deployment of LRMs. The root cause lies in the absence of intrinsic regulatory mechanisms, as current models are unable to monitor and adaptively manage their reasoning process to determine when to continue, backtrack, or terminate. To address this issue, we propose the Meta-cognitive Reasoning Framework (MERA), which explicitly decouples the thinking process into distinct reasoning and control components, thereby enabling the independent optimization of control strategies. Specifically, MERA incorporates a takeover-based data construction mechanism that identifies critical decision points during reasoning and delegates the creation of control signals to auxiliary LLMs, thereby enabling the construction of high-quality reasoning-control data. Additionally, a structured reasoning-control separation is implemented via supervised fine-tuning, enabling the model to generate explicit traces and acquire initial meta-cognitive control capabilities. Finally, MERA employs Control-Segment Policy Optimization (CSPO), which combines segment-wise Group Relative Policy Optimization (GRPO) with a control-masking mechanism to optimize control behavior learning while minimizing interference from irrelevant content. Experiments on various reasoning benchmarks demonstrate that models trained with MERA enhance both reasoning efficiency and accuracy.

large language model, machine learning, natural language, (17 more...)

2508.0446

Country: Europe (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Merge then Realign: Simple and Effective Modality-Incremental Continual Learning for Multimodal LLMs

Zhang, Dingkun, Qi, Shuhan, Xiao, Xinyu, Chen, Kehai, Wang, Xuan

arXiv.org Artificial Intelligence, Mar-8-2025

Recent advances in Multimodal Large Language Models (MLLMs) have enhanced their versatility as they integrate a growing number of modalities. Considering the heavy cost of training MLLMs, it is necessary to reuse the existing ones and further extend them to more modalities through Modality-incremental Continual Learning (MCL). However, this often comes with a performance degradation in the previously learned modalities. In this work, we revisit MCL and investigate an issue that is more severe than in traditional continual learning: its degradation comes not only from catastrophic forgetting but also from misalignment between the modality-agnostic and modality-specific components. To address this problem, we propose an elegantly simple MCL paradigm called "MErge then ReAlign" (MERA). Our method avoids introducing heavy training overhead or modifying the model architecture, and is therefore easy to deploy and highly reusable in the MLLM community. Extensive experiments demonstrate that, despite its simplicity, MERA shows impressive performance, holding up to a 99.84% Backward Relative Gain when extending to four modalities and achieving nearly lossless MCL performance.

mera, modality, relative gain, (15 more...)

2503.07663

Country:

Asia > China > Heilongjiang Province > Harbin (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

When Computing follows Vehicles: Decentralized Mobility-Aware Resource Allocation for Edge-to-Cloud Continuum

Nezami, Zeinab, Chaniotakis, Emmanouil, Pournaras, Evangelos

arXiv.org Artificial Intelligence, May-5-2024

The transformation of smart mobility is unprecedented: autonomous, shared and electric connected vehicles, along with the urgent need to meet ambitious net-zero targets by shifting to low-carbon transport modalities, result in new traffic patterns and requirements for real-time computation at large scale, for instance, augmented reality applications. The cloud computing paradigm can neither respond to such low-latency requirements nor adapt resource allocation to such dynamic spatio-temporal service requests. This paper addresses this grand challenge by introducing a novel decentralized optimization framework for mobility-aware edge-to-cloud resource allocation, service offloading, provisioning and load-balancing. In contrast to related work, this framework comes with superior efficiency and cost-effectiveness under evaluation in real-world traffic settings and mobility datasets. This breakthrough capability of 'computing follows vehicles' proves able to reduce utilization variance by more than 40 times, while preventing service deadline violations by 14%-34%.

consumption, node, vehicle, (16 more...)

2404.13179

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
Europe > United Kingdom > England > West Yorkshire > Leeds (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
(5 more...)

Genre: Research Report (0.50)

Industry:

Transportation > Ground > Road (1.00)
Telecommunications (1.00)
Information Technology > Services (1.00)
(3 more...)

Technology:

Information Technology > Internet of Things (1.00)
Information Technology > Information Management (1.00)
Information Technology > Communications > Networks (1.00)
(5 more...)

MerA: Merging Pretrained Adapters For Few-Shot Learning

He, Shwai, Fan, Run-Ze, Ding, Liang, Shen, Li, Zhou, Tianyi, Tao, Dacheng

arXiv.org Artificial Intelligence, Aug-30-2023

Adapter tuning, which updates only a few parameters, has become a mainstream method for fine-tuning pretrained language models to downstream tasks. However, it often yields subpar results in few-shot learning. AdapterFusion, which assembles pretrained adapters using composition layers tailored to specific tasks, is a possible solution but significantly increases trainable parameters and deployment costs. Despite this, our preliminary study reveals that even single adapters can outperform AdapterFusion in few-shot learning, urging us to propose Merging Pretrained Adapters (MerA), which efficiently incorporates pretrained adapters into a single model through model fusion. Extensive experiments on two PLMs demonstrate that MerA achieves substantial improvements compared to both single adapters and AdapterFusion. To further enhance the capacity of MerA, we also introduce a simple yet effective technique, referred to as the "same-track" setting, that merges adapters from the same track of pretraining tasks. With the implementation of the "same-track" setting, we observe even more impressive gains, surpassing the performance of both full fine-tuning and adapter tuning by a substantial margin, e.g., 3.5% in MRPC and 5.0% in MNLI.
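To make "merging adapters into a single model" concrete, the simplest form of model fusion is element-wise parameter averaging. The sketch below shows only that baseline, with adapters represented as hypothetical dictionaries mapping parameter names to flat lists of weights. It is a swapped-in simplification for illustration and should not be read as the fusion procedure used in the paper, which the abstract does not specify.

```python
from typing import Dict, List, Sequence

Adapter = Dict[str, List[float]]


def average_adapters(adapters: Sequence[Adapter]) -> Adapter:
    """Merge adapters by averaging each named parameter element-wise.

    Assumes every adapter has the same parameter names and shapes; this is
    plain averaging, used here as a stand-in for model fusion.
    """
    if not adapters:
        raise ValueError("need at least one adapter to merge")

    names = set(adapters[0])
    for adapter in adapters[1:]:
        if set(adapter) != names:
            raise ValueError("adapters must share the same parameter names")

    merged: Adapter = {}
    for name in adapters[0]:
        columns = [adapter[name] for adapter in adapters]
        if len({len(col) for col in columns}) != 1:
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # zip(*columns) walks the weights position by position across adapters.
        merged[name] = [sum(vals) / len(vals) for vals in zip(*columns)]
    return merged


if __name__ == "__main__":
    adapter_a = {"down_proj": [1.0, 3.0], "up_proj": [0.0, 2.0]}
    adapter_b = {"down_proj": [3.0, 5.0], "up_proj": [2.0, 2.0]}
    print(average_adapters([adapter_a, adapter_b]))
```

Unlike a composition layer trained per task, the merged result has exactly the size of one adapter, which is the deployment-cost argument made in the abstract.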

few-shot learning, mera, merging pretrained adapter

2308.15982

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.80)

Classical versus Quantum: comparing Tensor Network-based Quantum Circuits on LHC data

Araz, Jack Y., Spannowsky, Michael

arXiv.org Artificial Intelligence, Nov-2-2022

Tensor Networks (TN) are approximations of high-dimensional tensors designed to represent locally entangled quantum many-body systems efficiently. This study provides a comprehensive comparison between classical TNs and TN-inspired quantum circuits in the context of Machine Learning on highly complex, simulated LHC data. We show that classical TNs require exponentially large bond dimensions and higher Hilbert-space mapping to perform comparably to their quantum counterparts. While such an expansion in the dimensionality allows better performance, we observe that, with increased dimensionality, classical TNs lead to a highly flat loss landscape, rendering the usage of gradient-based optimization methods highly challenging. Furthermore, by employing quantitative metrics, such as the Fisher information and effective dimensions, we show that classical TNs require a more extensive training sample to represent the data as efficiently as TN-inspired quantum circuits. We also engage with the idea of hybrid classical-quantum TNs and show possible architectures to employ a larger phase-space from the data. We offer our results using three main TN ansätze: Tree Tensor Networks, Matrix Product States, and Multi-scale Entanglement Renormalisation Ansatz.

architecture, artificial intelligence, machine learning, (19 more...)

doi: 10.1103/PhysRevA.106.062423

2202.10471

Country:

South America > Ecuador > Pichincha Province > Quito (0.04)
South America > Colombia > Bogotá D.C. > Bogotá (0.04)
Europe > United Kingdom (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology (0.46)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
(2 more...)

A Multi-Scale Tensor Network Architecture for Classification and Regression

Reyes, Justin, Stoudenmire, Miles

arXiv.org Machine Learning, Jan-22-2020

A Multi-Scale Tensor Network Architecture for Classification and Regression. Justin Reyes (1) and E. Miles Stoudenmire (2). (1) Department of Physics, University of Central Florida, 4000 Central Florida Blvd, Orlando, FL 32816, USA. (2) Center for Computational Quantum Physics, Flatiron Institute, 162 5th Avenue, New York, NY 10010, USA. (Dated: January 24, 2020)

We present an algorithm for supervised learning using tensor networks, employing a step of preprocessing the data by coarse-graining through a sequence of wavelet transformations. We represent these transformations as a set of tensor network layers identical to those in a multi-scale entanglement renormalization ansatz (MERA) tensor network, and perform supervised learning and regression tasks through a model based on a matrix product state (MPS) tensor network acting on the coarse-grained data. Because the entire model consists of tensor contractions (apart from the initial nonlinear feature map), we can adaptively fine-grain the optimized MPS model backwards through the layers with essentially no loss in performance. The MPS itself is trained using an adaptive algorithm based on the density matrix renormalization group (DMRG) algorithm. We test our methods by performing a classification task on audio data and a regression task on temperature time-series data, studying the dependence of training accuracy on the number of coarse-graining layers and showing how fine-graining through the network may be used to initialize models with access to finer-scale features.

I. INTRODUCTION

Computational techniques developed across the machine learning and physics fields have consistently generated promising methods and applications in both areas of study. The application of well established machine learning architectures and optimization techniques has enriched the physics community with advances such as modeling and recognizing topological quantum states [1-3], optimizing quantum error correction codes [4], or classifying quantum walks [5]. Conversely, techniques known as tensor networks which model high-dimensional functions and are closely connected to physical principles have begun to be explored more in applied mathematics and machine learning [6-16].
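The coarse-graining step mentioned in the abstract can be illustrated with the simplest wavelet, the Haar transform, in which each layer replaces adjacent pairs of samples by their scaled sum and so halves the signal length. This is a minimal sketch: the choice of the Haar wavelet, the function names, and the plain-list representation are assumptions made for illustration, not the authors' implementation, which expresses these transformations as MERA tensor network layers.

```python
import math
from typing import List, Sequence


def haar_coarse_grain(signal: Sequence[float]) -> List[float]:
    """Apply one Haar averaging layer: (a, b) -> (a + b) / sqrt(2)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even for a Haar layer")
    scale = 1.0 / math.sqrt(2.0)
    return [
        (signal[i] + signal[i + 1]) * scale
        for i in range(0, len(signal), 2)
    ]


def coarse_grain_layers(signal: Sequence[float], num_layers: int) -> List[float]:
    """Apply num_layers Haar layers in sequence, halving the length each time."""
    out = list(signal)
    for _ in range(num_layers):
        out = haar_coarse_grain(out)
    return out


if __name__ == "__main__":
    samples = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0]
    for depth in range(4):
        print(depth, coarse_grain_layers(samples, depth))
```

Each additional layer trades resolution for a shorter input to the downstream model, which is the dependence on the number of coarse-graining layers that the abstract says the authors study.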

tensor, tensor network, transformation, (16 more...)

2001.08286

Country: North America > United States > Florida > Orange County > Orlando (0.24)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.70)

Number-State Preserving Tensor Networks as Classifiers for Supervised Learning

Evenbly, Glen

arXiv.org Machine Learning, May-15-2019

We propose a restricted class of tensor network state, built from number-state preserving tensors, for supervised learning tasks. This class of tensor network is argued to be a natural choice for classifiers as (i) they map classical data to classical data, and thus preserve the interpretability of data under tensor transformations, (ii) they can be efficiently trained to maximize their scalar product against classical data sets, and (iii) they seem to be as powerful as generic (unrestricted) tensor networks in this task. Our proposal is demonstrated using a variety of benchmark classification problems, where number-state preserving versions of commonly used networks (including MPS, TTN and MERA) are trained as effective classifiers. This work opens the path for powerful tensor network methods such as MERA, which were previously computationally intractable as classifiers, to be employed for difficult tasks such as image recognition.

artificial intelligence, machine learning, tensor, (19 more...)

1905.06352

Country:

North America > United States > New York (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Making Pepper walk: Understanding Softbank's purchase of Boston Dynamics

Robohub, Jun-12-2017, 11:40:07 GMT

It is unclear if Masayoshi Son, Chairman of Softbank, was one of the 17 million YouTube viewers of Boston Dynamics' BigDog before acquiring the company for an undisclosed amount this past Thursday. What is clear is that the acquisition of Boston Dynamics by Softbank is a big deal. Softbank's humanoid robot Pepper is trading up her dainty wheels for a pair of sturdy legs. In expressing his excitement for the acquisition, Masayoshi Son said, "Today, there are many issues we still cannot solve by ourselves with human capabilities. Smart robotics are going to be a key driver of the next stage of the Information Revolution, and Marc and his team at Boston Dynamics are the clear technology leaders in advanced dynamic robots. I am thrilled to welcome them to the SoftBank family and look forward to supporting them as they continue to advance the field of robotics and explore applications that can help make life easier, safer and more fulfilling."

artificial intelligence, boston dynamic, softbank, (16 more...)

Country: