cama
Make LVLMs Focus: Context-Aware Attention Modulation for Better Multimodal In-Context Learning
Li, Yanshu, Yang, Jianjiang, Yang, Ziteng, Li, Bozheng, Han, Ligong, He, Hongyang, Yao, Zhengtao, Chen, Yingjie Victor, Fei, Songlin, Liu, Dongfang, Tang, Ruixiang
Multimodal in-context learning (ICL) is becoming a key capability that allows large vision-language models (LVLMs) to adapt to novel tasks without parameter updates, which expands their usefulness in many real-world applications. However, ICL performance remains unstable even when the in-context demonstrations (ICDs) are well matched, showing that LVLMs still struggle to make full use of the provided context. While existing work mainly focuses on prompt engineering or post-hoc logit calibration, we study the attention mechanisms inside LVLMs to address their inherent limitations. We identify two important weaknesses in their self-attention that hinder effective ICL. To address these weaknesses, we propose Context-Aware Modulated Attention (CAMA), a training-free and plug-and-play method that dynamically adjusts attention logits based on the input in-context sequence. CAMA uses a two-stage modulation process that strengthens attention to semantically important tokens, especially visual ones. Across four LVLMs and seven benchmarks, CAMA consistently outperforms vanilla models and baselines, showing clear effectiveness and generalization. It can also activate the intended benefits of prompt engineering methods and remains robust across different sequence configurations. Therefore, CAMA opens up new directions for improving multimodal reasoning through a deeper understanding of attention dynamics.
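As a rough illustration of what "adjusting attention logits" can mean, the sketch below adds a bias to the logits of tokens flagged as important before the softmax. This is not the two-stage procedure from the paper, which the abstract does not spell out; the function name `modulated_attention`, the `important` index set, and the `boost` value are all assumptions made for the example.

```python
import math

def modulated_attention(logits, important, boost=1.0):
    """Softmax over attention logits after boosting selected positions.

    `important` is a hypothetical set of token indices to emphasise;
    how such tokens are chosen is not covered by this sketch.
    """
    # Add the bias only to the flagged positions.
    adjusted = [x + boost if i in important else x
                for i, x in enumerate(logits)]
    # Numerically stable softmax.
    peak = max(adjusted)
    exps = [math.exp(x - peak) for x in adjusted]
    total = sum(exps)
    return [e / total for e in exps]

# Four tokens with equal logits; token 1 is treated as important.
weights = modulated_attention([0.0, 0.0, 0.0, 0.0], {1}, boost=math.log(3))
```

With equal logits and a boost of ln 3, the flagged token receives half of the attention mass and the other three share the rest equally.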
CAMA: Enhancing Mathematical Reasoning in Large Language Models with Causal Knowledge
Zan, Lei, Zhang, Keli, Cai, Ruichu, Pan, Lujia
Large Language Models (LLMs) have demonstrated strong performance across a wide range of tasks, yet they still struggle with complex mathematical reasoning, a challenge fundamentally rooted in deep structural dependencies. To address this challenge, we propose CAusal MAthematician (CAMA), a two-stage causal framework that equips LLMs with explicit, reusable mathematical structure. In the learning stage, CAMA first constructs the Mathematical Causal Graph (MCG), a high-level representation of solution strategies, by combining LLM priors with causal discovery algorithms applied to a corpus of question-solution pairs. The resulting MCG encodes essential knowledge points and their causal dependencies. To better align the graph with downstream reasoning tasks, CAMA further refines the MCG through iterative feedback derived from a selected subset of the question-solution pairs. In the reasoning stage, given a new question, CAMA dynamically extracts a task-relevant subgraph from the MCG, conditioned on both the question content and the LLM's intermediate reasoning trace. This subgraph, which encodes the most pertinent knowledge points and their causal dependencies, is then injected back into the LLM to guide its reasoning process. Empirical results on real-world datasets show that CAMA significantly improves LLM performance on challenging mathematical problems. Furthermore, our experiments demonstrate that structured guidance consistently outperforms unstructured alternatives, and that incorporating asymmetric causal relationships yields greater improvements than using symmetric associations alone.
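The subgraph extraction step in the reasoning stage can be pictured with a small sketch. It assumes the graph is a list of directed (cause, effect) edges between knowledge points and that a set of relevant "seed" knowledge points has already been identified from the question and reasoning trace; the abstract does not describe how that selection works. The function name `extract_subgraph` and the example knowledge points are hypothetical.

```python
def extract_subgraph(edges, seeds):
    """Return the edges among the seed nodes and all of their ancestors.

    `edges` is a list of (cause, effect) pairs; `seeds` are the
    knowledge points assumed to be relevant to the current question.
    """
    # Map each node to its direct causes.
    parents = {}
    for cause, effect in edges:
        parents.setdefault(effect, set()).add(cause)

    # Walk backwards from the seeds to collect every ancestor.
    keep = set()
    stack = list(seeds)
    while stack:
        node = stack.pop()
        if node in keep:
            continue
        keep.add(node)
        stack.extend(parents.get(node, ()))

    # Keep only edges whose endpoints both survive, in original order.
    return [(c, e) for c, e in edges if c in keep and e in keep]

# Hypothetical knowledge points, for illustration only.
graph = [("fractions", "ratios"), ("ratios", "proportions"), ("geometry", "area")]
subgraph = extract_subgraph(graph, {"proportions"})
```

Following edges backwards is one way to use the asymmetry the abstract highlights: the direction of an edge decides which knowledge points count as prerequisites, which a symmetric association could not express.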
Frida Kahlo self-portrait sells for $55m, sets auction record for a female artist
A surrealist painting from the 1940s by Frida Kahlo has sold for $54.7m (£41.8m) - shattering the auction record for an artwork by a female artist. The painting went for more than 1,000 times its original auction price in 1980, after a tense bidding battle between two collectors, according to the Sotheby's auction house. The auction also broke the previous record for the highest amount paid for a Kahlo portrait, which sold for $34.9 million in 2021. The work - titled El sueño (la cama), which is translated to The dream (The bed) - depicts Kahlo asleep in a canopy bed beneath a skeleton entwined with dynamite. It marks one of the Mexican artist's most psychologically charged self-portraits, Sotheby's said, and was painted during a turbulent chapter in Kahlo's life - the year her former lover was assassinated and shortly after her divorce and remarriage.
A Derivation Details
ELBO objective (3) presented in the main text. Firstly, the latent variables have very different meanings. Another important contribution of the paper is the generalization of deep CAMA to generic measurement data. We also performed experiments using different DNN network architectures. Figure 15 shows the performance against different shifts.
addition, we will make the code publicly available, together with the paper
We thank the reviewers for their time and insightful comments on our paper. We respond to each reviewer (R) below. R1's comments cover only part of contribution (2), R3 also pointed out "The proposed fine-tuning phase to learn unseen M R1. Decomposition necessity and CVAE comparisons: We emphasise that CVAE and other mentioned work can R2. Comparisons to IRM: First, IRM only considers single modality data. R3. (Adversarial) data augmentation: Deep CAMA also benefits from adversarial training (Figure 10).
Feature Engineering for Agents: An Adaptive Cognitive Architecture for Interpretable ML Monitoring
Bravo-Rocca, Gusseppe, Liu, Peini, Guitart, Jordi, Carrillo-Larco, Rodrigo M, Dholakia, Ajay, Ellison, David
Monitoring Machine Learning (ML) models in production environments is crucial, yet traditional approaches often yield verbose, low-interpretability outputs that hinder effective decision-making. We propose a cognitive architecture for ML monitoring that applies feature engineering principles to agents based on Large Language Models (LLMs), significantly enhancing the interpretability of monitoring outputs. Central to our approach is a Decision Procedure module that simulates feature engineering through three key steps: Refactor, Break Down, and Compile. The Refactor step improves data representation to better capture feature semantics, allowing the LLM to focus on salient aspects of the monitoring data while reducing noise and irrelevant information. Break Down decomposes complex information for detailed analysis, and Compile integrates sub-insights into clear, interpretable outputs. This process leads to a more deterministic planning approach, reducing dependence on LLM-generated planning, which can sometimes be inconsistent and overly general. The combination of feature engineering-driven planning and selective LLM utilization results in a robust decision support system, capable of providing highly interpretable and actionable insights. Experiments using multiple LLMs demonstrate the efficacy of our approach, achieving significantly higher accuracy compared to various baselines across several domains.
Online Continual Learning For Interactive Instruction Following Agents
Kim, Byeonghwi, Seo, Minhyuk, Choi, Jonghyun
In learning an embodied agent executing daily tasks via language directives, the literature largely assumes that the agent learns all training data at the beginning. We argue that such a learning scenario is less realistic since a robotic agent is supposed to learn the world continuously as it explores and perceives it. To take a step towards a more realistic embodied agent learning scenario, we propose two continual learning setups for embodied agents: learning new behaviors (Behavior Incremental Learning, Behavior-IL) and new environments (Environment Incremental Learning, Environment-IL). For the tasks, previous 'data prior' based continual learning methods maintain logits for the past tasks. However, the stored information is often insufficiently learned information and requires task boundary information, which might not always be available. Here, we propose to update them based on confidence scores without task boundary information during training (i.e., task-free) in a moving average fashion, named Confidence-Aware Moving Average (CAMA). In the proposed Behavior-IL and Environment-IL setups, our simple CAMA outperforms prior state of the art in our empirical validations by noticeable margins. To create more realistic agents, challenging benchmarks (Shridhar et al., 2020; Padmakumar et al., 2022) require all of these tasks to complete complex tasks based on language directives. However, most embodied AI literature assumes that all training data are available from the outset, but this may be unrealistic as agents may encounter novel behaviors or environments after deployment. To learn new behaviors and environments, continual learning might be necessary for post-deployment. To learn new tasks, one may finetune the agents. But the finetuned agents would suffer from catastrophic forgetting that loses previously learned knowledge (McCloskey & Cohen, 1989; Ratcliff, 1990).
To mitigate such forgetting, (Powers et al., 2022) introduced a continual reinforcement learning framework that incrementally updates agents for new tasks and evaluates their knowledge of current and past tasks. However, this operates in a simplified task setup of (Shridhar et al., 2020), excluding natural language understanding and object localization.
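The idea of updating stored logits "based on confidence scores ... in a moving average fashion" can be sketched as follows. The abstract does not give the update rule, so the form used here (a blend weight equal to a base rate times the softmax probability of the true label) is an assumption, as are the names `confidence_weighted_update` and `base_rate`.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def confidence_weighted_update(stored, new, label, base_rate=0.5):
    """Move stored logits toward new logits, scaled by confidence.

    Confidence is taken to be the probability the new logits assign
    to the true label (an assumed definition for this sketch). No
    task boundary is needed: the update runs on every sample.
    """
    confidence = softmax(new)[label]
    weight = base_rate * confidence
    return [(1 - weight) * s + weight * n for s, n in zip(stored, new)]

stored = [0.0, 0.0, 0.0]
new = [10.0, 0.0, 0.0]
confident = confidence_weighted_update(stored, new, label=0)
unconfident = confidence_weighted_update(stored, new, label=1)
```

When the new logits agree with the label, the stored logits move almost `base_rate` of the way toward them; when they disagree, the stored logits barely change, so poorly learned predictions are not written into memory.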
Enhancing Multi-field B2B Cloud Solution Matching via Contrastive Pre-training
Chen, Haonan, Dou, Zhicheng, Hao, Xuetong, Tao, Yunhao, Song, Shiren, Sheng, Zhenli
Cloud solutions have gained significant popularity in the technology industry as they offer a combination of services and tools to tackle specific problems. However, despite their widespread use, the task of identifying appropriate company customers for a specific target solution to the sales team of a solution provider remains a complex business problem that existing matching systems have yet to adequately address. In this work, we study the B2B solution matching problem and identify two main challenges of this scenario: (1) the modeling of complex multi-field features and (2) the limited, incomplete, and sparse transaction data. To tackle these challenges, we propose a framework CAMA, which is built with a hierarchical
While there have been some studies focusing on designing effective matching systems [1, 18, 20, 23, 29, 32, 35], none of these works have explored the matching of cloud solutions and their customers, which holds significant business value. In Huawei Cloud, the scenario is manual-driven, wherein our model identifies a list of the top matching companies to the sales team associated with a specific solution. The sales team then manually reviews this list and proceeds with promoting the solution to those companies. This specific scenario can be considered a matching problem, with the primary goal being the identification of appropriate companies (customers) for the sales teams to target in their promotion efforts.