mira
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Vision (0.93)
- Information Technology > Sensing and Signal Processing > Image Processing (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Unsupervised Visual Representation Learning via Mutual Information Regularized Assignment
This paper proposes Mutual Information Regularized Assignment (MIRA), a pseudo-labeling algorithm for unsupervised representation learning inspired by information maximization. We formulate online pseudo-labeling as an optimization problem: find pseudo-labels that maximize the mutual information between the label and the data while remaining close to a given model probability. We derive a fixed-point iteration method and prove its convergence to the optimal solution. In contrast to baselines, MIRA combined with pseudo-label prediction enables simple yet effective clustering-based representation learning without extra training techniques or artificial constraints such as sampling strategies or equipartition constraints. With relatively few training epochs, representations learned by MIRA achieve state-of-the-art performance on various downstream tasks, including linear/$k$-NN evaluation and transfer learning. In particular, with only 400 epochs, our method applied to the ImageNet dataset with a ResNet-50 architecture achieves 75.6% linear evaluation accuracy.
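A minimal sketch of the fixed-point idea, under an assumed objective of the form min_Q mean_i KL(q_i || p_i) - beta * I(Y; X): setting the gradient to zero yields a multiplicative update that rebalances the model probabilities against the current label marginal. The exact objective, constants, and exponents in the paper may differ; the function name `mira_assign` and the choice of beta are illustrative assumptions.

```python
import numpy as np

def mira_assign(p, beta=0.5, n_iter=30, eps=1e-12):
    """Hedged sketch of an MI-regularized soft assignment.

    Assumed objective:  min_Q  mean_i KL(q_i || p_i) - beta * I(Y; X),
    where rows of Q are per-sample label distributions and
    I(Y; X) = H(mean_i q_i) - mean_i H(q_i).  Setting the gradient to zero
    gives the fixed point  q_ik proportional to
    p_ik^{1/(1-beta)} * qbar_k^{-beta/(1-beta)}, which we iterate below.
    This only illustrates the fixed-point idea, not the paper's exact update.
    """
    q = p.copy()
    for _ in range(n_iter):
        qbar = q.mean(axis=0)                        # current label marginal
        logits = (np.log(p + eps) - beta * np.log(qbar + eps)) / (1.0 - beta)
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        q = np.exp(logits)
        q /= q.sum(axis=1, keepdims=True)            # renormalize each row
    return q

# toy usage: 6 samples, 3 pseudo-classes
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(3), size=6)                # model probabilities
q = mira_assign(p, beta=0.5)
print(q.round(3), q.mean(axis=0).round(3))           # marginal is pulled toward uniform
```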
Memory-Integrated Reconfigurable Adapters: A Unified Framework for Settings with Multiple Tasks
Agrawal, Susmit, Kher, Krishn Vishwas, Mittal, Saksham, Maheshwari, Swarnim, Balasubramanian, Vineeth N.
Organisms constantly pivot between tasks such as evading predators, foraging, traversing rugged terrain, and socializing, often within milliseconds. Remarkably, they preserve knowledge of once-learned environments sans catastrophic forgetting, a phenomenon neuroscientists hypothesize is due to a singular neural circuitry dynamically overlaid by neuromodulatory agents such as dopamine and acetylcholine. In parallel, deep learning research addresses analogous challenges via domain generalization (DG) and continual learning (CL), yet these methods remain siloed, despite the brain's ability to perform them seamlessly. In particular, prior work has not explored architectures involving associative memories (AMs), which are an integral part of biological systems, to jointly address these tasks. We propose Memory-Integrated Reconfigurable Adapters (MIRA), a unified framework that integrates Hopfield-style associative memory modules atop a shared backbone. Associative memory keys are learned post hoc to index and retrieve an affine combination of stored adapter updates for any given task or domain on a per-sample basis. By varying only the task-specific objectives, we demonstrate that MIRA seamlessly accommodates domain shifts and sequential task exposures under one roof. Empirical evaluations on standard benchmarks confirm that our AM-augmented architecture significantly enhances adaptability and retention: in DG, MIRA achieves state-of-the-art out-of-distribution accuracy, and in incremental learning settings, it outperforms architectures explicitly designed to handle catastrophic forgetting using generic CL algorithms. By unifying adapter-based modulation with biologically inspired associative memory, MIRA delivers rapid task switching and enduring knowledge retention in a single extensible architecture, charting a path toward more versatile and memory-augmented AI systems.
- North America > United States (0.14)
- Europe > Austria > Vienna (0.14)
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- (4 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
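A minimal sketch of the per-sample adapter retrieval described in the Memory-Integrated Reconfigurable Adapters abstract above: Hopfield-style keys are matched against a sample's backbone feature, and the resulting softmax weights form an affine combination of stored adapter updates applied on top of a frozen layer. All module and parameter names are illustrative assumptions; the paper's actual architecture and key-learning procedure may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdapterMemory(nn.Module):
    """Hedged sketch: Hopfield-style retrieval of an affine combination of
    stored adapter updates, conditioned on each sample's feature."""

    def __init__(self, feat_dim, n_slots, adapter_in, adapter_out, beta=4.0):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(n_slots, feat_dim))                   # learned post hoc
        self.deltas = nn.Parameter(torch.zeros(n_slots, adapter_out, adapter_in))  # stored adapter updates
        self.beta = beta                                                            # inverse temperature

    def forward(self, feats):
        # feats: (B, feat_dim) per-sample descriptor from the shared backbone
        sims = F.normalize(feats, dim=-1) @ F.normalize(self.keys, dim=-1).T
        w = torch.softmax(self.beta * sims, dim=-1)         # affine weights, sum to 1 per sample
        return torch.einsum('bs,soi->boi', w, self.deltas)  # per-sample adapter update (B, out, in)

class AdaptedLinear(nn.Module):
    """Frozen base linear layer modulated by a per-sample retrieved update."""
    def __init__(self, base: nn.Linear, memory: AdapterMemory):
        super().__init__()
        self.base, self.memory = base, memory

    def forward(self, x, feats):
        delta = self.memory(feats)                               # (B, out, in)
        return self.base(x) + torch.einsum('boi,bi->bo', delta, x)

# toy usage
base = nn.Linear(16, 8)
mem = AdapterMemory(feat_dim=32, n_slots=4, adapter_in=16, adapter_out=8)
layer = AdaptedLinear(base, mem)
y = layer(torch.randn(5, 16), torch.randn(5, 32))   # -> shape (5, 8)
```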
Large Language Models Have Intrinsic Meta-Cognition, but Need a Good Lens
Ma, Ziyang, Yuan, Qingyue, Wang, Zhenglin, Zhou, Deyu
Previous research has primarily focused on the cognitive error detection capabilities of Large Language Models (LLMs), often prompting them to analyze mistakes in reasoning chains. However, few studies have examined the meta-cognitive abilities of LLMs (e.g., their self-awareness of step errors), which are crucial for their reliability. While studies on LLM self-evaluation present measures such as perplexity, which can reflect answer correctness and be viewed as a lens on meta-cognition, they lack step-level analysis and adaptation. This paper studies how well current lenses evaluate LLM meta-cognition and how these lenses can be improved. Specifically, we propose AutoMeco, an Automated Meta-cognition Evaluation framework for benchmarking the existing lenses. Furthermore, we propose MIRA, a training-free Markovian Intrinsic Reward Adjustment strategy, to boost current meta-cognition lenses. Experimental results on three mathematical reasoning datasets and three LLMs demonstrate the soundness of AutoMeco by comparison with Best-of-N verification. Moreover, the meta-cognitive ability of LLMs can be better evaluated using MIRA.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
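A hedged sketch of the "lens" idea from the abstract above: a step-level confidence score computed from token log-probabilities (per-step perplexity), followed by a simple Markovian smoothing of the step scores. The smoothing recurrence, gamma, and both function names are assumptions standing in for the paper's actual MIRA adjustment rule.

```python
import math

def step_perplexity(token_logprobs, step_boundaries):
    """Per-step perplexity from token log-probabilities.

    token_logprobs: list of log p(token_t | prefix) for the generated chain.
    step_boundaries: list of (start, end) token index ranges, one per reasoning step.
    """
    scores = []
    for start, end in step_boundaries:
        lps = token_logprobs[start:end]
        scores.append(math.exp(-sum(lps) / max(len(lps), 1)))  # lower = more confident
    return scores

def markovian_adjust(step_scores, gamma=0.7):
    """Illustrative Markovian smoothing: each step's score mixes in the previous
    adjusted score, so confidence propagates along the chain.  This recurrence is
    an assumption, not the paper's exact MIRA rule."""
    adjusted, prev = [], None
    for s in step_scores:
        prev = s if prev is None else gamma * s + (1 - gamma) * prev
        adjusted.append(prev)
    return adjusted

# toy usage: 6 tokens split into two reasoning steps
logprobs = [-0.2, -0.1, -0.4, -1.5, -2.0, -0.3]
ppl = step_perplexity(logprobs, [(0, 3), (3, 6)])
print(ppl, markovian_adjust(ppl))
```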
MIRA: Towards Mitigating Reward Hacking in Inference-Time Alignment of T2I Diffusion Models
Zhai, Kevin, Singh, Utsav, Thatipelli, Anirudh, Chakraborty, Souradip, Sahu, Anit Kumar, Huang, Furong, Bedi, Amrit Singh, Shah, Mubarak
Diffusion models excel at generating images conditioned on text prompts, but the resulting images often do not satisfy user-specific criteria measured by scalar rewards such as Aesthetic Scores. This alignment typically requires fine-tuning, which is computationally demanding. Recently, inference-time alignment via noise optimization has emerged as an efficient alternative, modifying initial input noise to steer the diffusion denoising process towards generating high-reward images. However, this approach suffers from reward hacking, where the model produces images that score highly yet deviate significantly from the original prompt. We show that noise-space regularization is insufficient and that preventing reward hacking requires an explicit image-space constraint. To this end, we propose MIRA (MItigating Reward hAcking), a training-free, inference-time alignment method. MIRA introduces an image-space, score-based KL surrogate that regularizes the sampling trajectory with a frozen backbone, constraining the output distribution so reward can increase without off-distribution drift (reward hacking). We derive a tractable approximation to this KL term using diffusion scores. Across SDv1.5 and SDXL, multiple rewards (Aesthetic, HPSv2, PickScore), and public datasets (e.g., Animal-Animal, HPDv2), MIRA achieves a >60% win rate against strong baselines while preserving prompt adherence; mechanism plots show reward gains with near-zero drift, whereas DNO drifts as compute increases. We further introduce MIRA-DPO, mapping preference optimization to inference time with a frozen backbone, extending MIRA to non-differentiable rewards without fine-tuning.
- North America > United States > Maryland > Prince George's County > College Park (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > Middle East > Republic of Türkiye (0.04)
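A hedged sketch of the inference-time noise-optimization setup the abstract above describes, with an explicit slot for an image-space regularizer. The actual MIRA constraint is a score-based KL surrogate derived in the paper; the toy `image_reg` penalty below is only a stand-in showing where such a constraint enters, and `decode`, `reward_fn`, and all hyperparameters are placeholder assumptions rather than the paper's method.

```python
import torch

def align_noise(decode, reward_fn, image_reg, z0, steps=50, lr=0.05, lam=1.0):
    """Hedged sketch of inference-time alignment via noise optimization.

    decode:    frozen, differentiable sampler mapping initial noise -> image.
    reward_fn: scalar reward on the image (e.g., an aesthetic score).
    image_reg: image-space penalty keeping outputs near the frozen model's
               distribution.  MIRA uses a score-based KL surrogate in this
               role; the toy penalty below is only an illustrative stand-in.
    """
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        img = decode(z)
        loss = -reward_fn(img) + lam * image_reg(img)   # maximize reward, stay on-distribution
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()

# toy stand-ins so the sketch runs end to end
decode    = lambda z: torch.tanh(z)                      # pretend "sampler"
reward_fn = lambda img: img.mean()                       # pretend scalar reward
ref       = decode(torch.zeros(4))                       # frozen model's reference output
image_reg = lambda img: ((img - ref) ** 2).mean()        # illustrative image-space constraint

z_star = align_noise(decode, reward_fn, image_reg, z0=torch.zeros(4))
print(decode(z_star))
```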
MIRA: Empowering One-Touch AI Services on Smartphones with MLLM-based Instruction Recommendation
Bian, Zhipeng, Zhu, Jieming, Xie, Xuyang, Dai, Quanyu, Zhao, Zhou, Dong, Zhenhua
The rapid advancement of generative AI technologies is driving the integration of diverse AI-powered services into smartphones, transforming how users interact with their devices. To simplify access to predefined AI services, this paper introduces MIRA, a pioneering framework for task instruction recommendation that enables intuitive one-touch AI tasking on smartphones. With MIRA, users can long-press on images or text objects to receive contextually relevant instruction recommendations for executing AI tasks. Our work introduces three key innovations: 1) A multimodal large language model (MLLM)-based recommendation pipeline with structured reasoning to extract key entities, infer user intent, and generate precise instructions; 2) A template-augmented reasoning mechanism that integrates high-level reasoning templates, enhancing task inference accuracy; 3) A prefix-tree-based constrained decoding strategy that restricts outputs to predefined instruction candidates, ensuring coherent and intent-aligned suggestions. Through evaluation on a real-world annotated dataset and a user study, MIRA demonstrates substantial improvements in instruction recommendation accuracy. The encouraging results highlight MIRA's potential to revolutionize the way users engage with AI services on their smartphones, offering a more seamless and efficient experience.
- Asia > China > Guangdong Province > Shenzhen (0.05)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Research Report (0.82)
- Overview (0.68)
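A minimal sketch of the prefix-tree-based constrained decoding named as the third innovation in the MIRA instruction-recommendation abstract: candidate instructions are tokenized into a trie, and at each step the decoder may only emit tokens that keep the output on a trie path. Whitespace tokenization, greedy selection, and the `score_fn` stand-in for the MLLM's next-token scores are simplifying assumptions.

```python
def build_trie(candidates):
    """Trie over tokenized candidate instructions (whitespace tokens for illustration)."""
    root = {}
    for cand in candidates:
        node = root
        for tok in cand.split():
            node = node.setdefault(tok, {})
        node["<eos>"] = {}
    return root

def constrained_decode(score_fn, trie):
    """Greedy decoding restricted to trie paths: at every step only tokens that
    keep the output inside the candidate set are allowed."""
    out, node = [], trie
    while True:
        allowed = list(node.keys())
        tok = max(allowed, key=lambda t: score_fn(out, t))  # best *allowed* token
        if tok == "<eos>":
            return " ".join(out)
        out.append(tok)
        node = node[tok]

# toy usage: two predefined instructions, a dummy scorer that prefers "remove"
trie = build_trie(["remove background from image", "translate text to English"])
score = lambda prefix, tok: 1.0 if tok == "remove" else 0.0
print(constrained_decode(score, trie))   # -> "remove background from image"
```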
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization
Behrouz, Ali, Razaviyayn, Meisam, Zhong, Peilin, Mirrokni, Vahab
Designing efficient and effective architectural backbones has been at the core of research efforts to enhance the capability of foundation models. Inspired by the human cognitive phenomenon of attentional bias (the natural tendency to prioritize certain events or stimuli), we reconceptualize neural architectures, including Transformers, Titans, and modern linear recurrent neural networks, as associative memory modules that learn a mapping of keys and values using an internal objective, referred to as attentional bias. Surprisingly, we observed that most existing sequence models leverage either (1) dot-product similarity or (2) L2 regression objectives as their attentional bias. Going beyond these objectives, we present a set of alternative attentional bias configurations along with effective approximations to stabilize their training. We then reinterpret forgetting mechanisms in modern deep learning architectures as a form of retention regularization, providing a novel set of forget gates for sequence models. Building upon these insights, we present Miras, a general framework to design deep learning architectures based on four choices: (i) associative memory architecture, (ii) attentional bias objective, (iii) retention gate, and (iv) memory learning algorithm. We present three novel sequence models (Moneta, Yaad, and Memora) that go beyond the power of existing linear RNNs while maintaining a fast parallelizable training process. Our experiments show that different design choices in Miras yield models with varying strengths. For example, certain instances of Miras achieve exceptional performance on special tasks such as language modeling, commonsense reasoning, and recall-intensive tasks, even outperforming Transformers and other modern linear recurrent models.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- (2 more...)
- Education (0.46)
- Government > Regional Government (0.46)
- Health & Medicine (0.45)
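A hedged sketch of the associative-memory view the Miras abstract describes: a matrix memory updated by one gradient step on an internal L2 regression ("attentional bias") objective, with a scalar forget gate acting as retention regularization. This is the generic delta-rule recurrence underlying many linear RNNs, not the specific Moneta, Yaad, or Memora models; all names and constants are illustrative.

```python
import torch

def memory_step(M, k, v, alpha=0.95, eta=0.5):
    """One recurrence of a matrix associative memory.

    Internal objective ("attentional bias"):  L(M) = ||M @ k - v||^2 / 2.
    Update = retention gate * old memory  -  eta * grad_M L
           = alpha * M + eta * (v - M @ k) @ k.T   (the delta rule).
    alpha < 1 plays the role of the forget gate / retention regularization.
    """
    err = v - M @ k                       # prediction error for this key/value pair
    return alpha * M + eta * err @ k.T

def run_sequence(keys, values, queries, d_k, d_v):
    M = torch.zeros(d_v, d_k)
    outs = []
    for k, v, q in zip(keys, values, queries):
        M = memory_step(M, k, v)          # write: learn the (k, v) association
        outs.append(M @ q)                # read: retrieve with the query
    return torch.stack(outs)

# toy usage: sequence of 5 steps, key dim 4, value dim 3
T, d_k, d_v = 5, 4, 3
keys, values, queries = torch.randn(T, d_k, 1), torch.randn(T, d_v, 1), torch.randn(T, d_k, 1)
print(run_sequence(keys, values, queries, d_k, d_v).shape)   # -> torch.Size([5, 3, 1])
```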
Pixel-Level Reasoning Segmentation via Multi-turn Conversations
Cai, Dexian, Yang, Xiaocui, Liu, Yongkang, Wang, Daling, Feng, Shi, Zhang, Yifei, Poria, Soujanya
Existing visual perception systems focus on region-level segmentation in single-turn dialogues, relying on complex and explicit query instructions. Such systems cannot reason at the pixel level or comprehend dynamic user intent that changes over the course of an interaction. Our work tackles this issue by introducing a novel task, Pixel-level Reasoning Segmentation (Pixel-level RS) based on multi-turn conversations, tracking evolving user intent via multi-turn interactions for fine-grained segmentation. To establish a benchmark for this novel task, we build a Pixel-level ReasonIng Segmentation Dataset Based on Multi-Turn Conversations (PRIST), comprising 24k utterances from 8.3k multi-turn conversational scenarios with segmentation targets. Building on PRIST, we further propose MIRAS, a Multi-turn Interactive ReAsoning Segmentation framework that integrates pixel-level segmentation with robust multi-turn conversation understanding, generating pixel-grounded explanations aligned with user intent. The PRIST dataset and MIRAS framework fill the gap in pixel-level reasoning segmentation. Experimental results on the PRIST dataset demonstrate that our method outperforms current segmentation-specific baselines in terms of segmentation and LLM-based reasoning metrics. The code and data are available at: https://github.com/ccccai239/PixelRIST.
- Asia > Singapore (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
- Asia > China > Liaoning Province > Shenyang (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)