necessity
Triangulation as an Acceptance Rule for Multilingual Mechanistic Interpretability
Multilingual language models achieve strong aggregate performance yet often behave unpredictably across languages, scripts, and cultures. We argue that mechanistic explanations for such models should satisfy a \emph{causal} standard: claims must survive causal interventions and must \emph{cross-reference} across environments that perturb surface form while preserving meaning. We formalize \emph{reference families} as predicate-preserving variants and introduce \emph{triangulation}, an acceptance rule requiring necessity (ablating the circuit degrades the target behavior), sufficiency (patching activations transfers the behavior), and invariance (both effects remain directionally stable and of sufficient magnitude across the reference family). To supply candidate subgraphs, we adopt automatic circuit discovery and \emph{accept or reject} those candidates by triangulation. We ground triangulation in causal abstraction by casting it as an approximate transformation score over a distribution of interchange interventions, connect it to the pragmatic interpretability agenda, and present a comparative experimental protocol across multiple model families, language pairs, and tasks. Triangulation provides a falsifiable standard for mechanistic claims that filters spurious circuits passing single-environment tests but failing cross-lingual invariance.
- North America > United States > Washington > King County > Seattle (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
On the Necessity of Collaboration for Online Model Selection with Decentralized Data
We consider online model selection with decentralized data over $M$ clients, and study the necessity of collaboration among clients. Previous work proposed various federated algorithms without demonstrating their necessity, while we answer the question from a novel perspective of computational constraints. We prove lower bounds on the regret, and propose a federated algorithm and analyze the upper bound. Our results show (i) collaboration is unnecessary in the absence of computational constraints on clients; (ii) collaboration is necessary if the computational cost on each client is limited to $o(K)$, where $K$ is the number of candidate hypothesis spaces. We clarify the unnecessary nature of collaboration in previous federated algorithms for distributed online multi-kernel learning, and improve the regret bounds at a smaller computational and communication cost. Our algorithm relies on three new techniques including an improved Bernstein's inequality for martingale, a federated online mirror descent framework, and decoupling model selection and prediction, which might be of independent interest.
Ferrari: Federated Feature Unlearning via Optimizing Feature Sensitivity
The advent of Federated Learning (FL) highlights the practical necessity for the'right to be forgotten' for all clients, allowing them to request data deletion from the machine learning model's service provider. This necessity has spurred a growing demand for Federated Unlearning (FU). Feature unlearning has gained considerable attention due to its applications in unlearning sensitive, backdoor, and biased features. Existing methods employ the influence function to achieve feature unlearning, which is impractical for FL as it necessitates the participation of other clients, if not all, in the unlearning process. Furthermore, current research lacks an evaluation of the effectiveness of feature unlearning. To address these limitations, we define feature sensitivity in evaluating feature unlearning according to Lipschitz continuity. This metric characterizes the model output's rate of change or sensitivity to perturbations in the input feature. We then propose an effective federated feature unlearning framework called Ferrari, which minimizes feature sensitivity. Extensive experimental results and theoretical analysis demonstrate the effectiveness of Ferrari across various feature unlearning scenarios, including sensitive, backdoor, and biased features.
A Broader Impact As brand-new models, the vulnerability of ViTs to adversarial samples motivates us to upgrade their
Our approaches may contribute to a safer use of ViTs in the real world. We have shown the necessity of gradient clipping (GC) for ViTs in Section 3.2. In this section, we evaluate the proposed method on ImageNet-1K, the most commonly used large-scale dataset. We apply the most popular threat model on ImageNet-1K, i.e., setting the perturbation PGD-5 with the step size 2/ 255 to craft adversarial examples on the fly during training. In Table 7, our method improves both natural accuracy and robustness by notable margins.
Causal Sufficiency and Necessity Improves Chain-of-Thought Reasoning
Yu, Xiangning, Wang, Zhuohan, Yang, Linyi, Li, Haoxuan, Liu, Anjie, Xue, Xiao, Wang, Jun, Yang, Mengyue
Chain-of-Thought (CoT) prompting plays an indispensable role in endowing large language models (LLMs) with complex reasoning capabilities. However, CoT currently faces two fundamental challenges: (1) Sufficiency, which ensures that the generated intermediate inference steps comprehensively cover and substantiate the final conclusion; and (2) Necessity, which identifies the inference steps that are truly indispensable for the soundness of the resulting answer. We propose a causal framework that characterizes CoT reasoning through the dual lenses of sufficiency and necessity. Incorporating causal Probability of Sufficiency and Necessity allows us not only to determine which steps are logically sufficient or necessary to the prediction outcome, but also to quantify their actual influence on the final reasoning outcome under different intervention scenarios, thereby enabling the automated addition of missing steps and the pruning of redundant ones. Extensive experimental results on various mathematical and commonsense reasoning benchmarks confirm substantial improvements in reasoning efficiency and reduced token usage without sacrificing accuracy. Our work provides a promising direction for improving LLM reasoning performance and cost-effectiveness.
- Asia > China > Tianjin Province > Tianjin (0.04)
- Asia > China > Hong Kong (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
WebGraphEval: Multi-Turn Trajectory Evaluation for Web Agents using Graph Representation
Qian, Yaoyao, Wang, Yuanli, Zhang, Jinda, Zong, Yun, Chen, Meixu, Zhou, Hanhan, Huang, Jindan, Zeng, Yifan, Hu, Xinyu, Song, Chan Hee, Zhang, Danqing
Current evaluation of web agents largely reduces to binary success metrics or conformity to a single reference trajectory, ignoring the structural diversity present in benchmark datasets. We present WebGraphEval, a framework that abstracts trajectories from multiple agents into a unified, weighted action graph. This representation is directly compatible with benchmarks such as WebArena, leveraging leaderboard runs and newly collected trajectories without modifying environments. The framework canonically encodes actions, merges recurring behaviors, and applies structural analyses including reward propagation and success-weighted edge statistics. Evaluations across thousands of trajectories from six web agents show that the graph abstraction captures cross-model regularities, highlights redundancy and inefficiency, and identifies critical decision points overlooked by outcome-based metrics. By framing web interaction as graph-structured data, WebGraphEval establishes a general methodology for multi-path, cross-agent, and efficiency-aware evaluation of web agents.
- North America > United States > Texas (0.04)
- North America > United States > Oregon (0.04)
- North America > United States > Ohio (0.04)
- North America > United States > Minnesota (0.04)
Direct Preference Optimization with Unobserved Preference Heterogeneity: The Necessity of Ternary Preferences
Chidambaram, Keertana, Seetharaman, Karthik Vinary, Syrgkanis, Vasilis
Reinforcement Learning from Human Feedback (RLHF) has become central to aligning large language models with human values, typically by first learning a reward model from preference data which is then used to update the model with reinforcement learning. Recent alternatives such as Direct Preference Optimization (DPO) simplify this pipeline by directly optimizing on preferences. However, both approaches often assume uniform annotator preferences and rely on binary comparisons, overlooking two key limitations: the diversity of human evaluators and the limitations of pairwise feedback. In this work, we address both these issues. First, we connect preference learning in RLHF with the econometrics literature and show that binary comparisons are insufficient for identifying latent user preferences from finite user data and infinite users, while (even incomplete) rankings over three or more responses ensure identifiability. Second, we introduce methods to incorporate heterogeneous preferences into alignment algorithms. We develop an Expectation-Maximization adaptation of DPO that discovers latent annotator types and trains a mixture of LLMs accordingly. Then we propose an aggregation algorithm using a min-max regret fairness criterion to produce a single generative policy with equitable performance guarantees. Together, these contributions establish a theoretical and algorithmic framework for fairness and personalization for diverse users in generative model alignment.
- North America > Mexico (0.04)
- Asia > Pakistan (0.04)
- Asia > Indonesia (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)
The Algorithmic Regulator
The regulator theorem states that, under certain conditions, any optimal controller must embody a model of the system it regulates, grounding the idea that controllers embed, explicitly or implicitly, internal models of the controlled. This principle underpins neuroscience and predictive brain theories like the Free-Energy Principle or Kolmogorov/Algorithmic Agent theory. However, the theorem is only proven in limited settings. Here, we treat the deterministic, closed, coupled world-regulator system $(W,R)$ as a single self-delimiting program $p$ via a constant-size wrapper that produces the world output string~$x$ fed to the regulator. We analyze regulation from the viewpoint of the algorithmic complexity of the output, $K(x)$. We define $R$ to be a \emph{good algorithmic regulator} if it \emph{reduces} the algorithmic complexity of the readout relative to a null (unregulated) baseline $\varnothing$, i.e., \[ Δ= K\big(O_{W,\varnothing}\big) - K\big(O_{W,R}\big) > 0. \] We then prove that the larger $Δ$ is, the more world-regulator pairs with high mutual algorithmic information are favored. More precisely, a complexity gap $Δ> 0$ yields \[ \Pr\big((W,R)\mid x\big) \le C\,2^{\,M(W{:}R)}\,2^{-Δ}, \] making low $M(W{:}R)$ exponentially unlikely as $Δ$ grows. This is an AIT version of the idea that ``the regulator contains a model of the world.'' The framework is distribution-free, applies to individual sequences, and complements the Internal Model Principle. Beyond this necessity claim, the same coding-theorem calculus singles out a \emph{canonical scalar objective} and implicates a \emph{planner}. On the realized episode, a regulator behaves \emph{as if} it minimized the conditional description length of the readout.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (4 more...)
Gender Bias in Large Language Models for Healthcare: Assignment Consistency and Clinical Implications
Liu, Mingxuan, Ke, Yuhe, Zhu, Wentao, Mertens, Mayli, Ning, Yilin, Liao, Jingchi, Hong, Chuan, Ting, Daniel Shu Wei, Peng, Yifan, Bitterman, Danielle S., Ong, Marcus Eng Hock, Liu, Nan
The integration of large language models (LLMs) into healthcare holds promise to enhance clinical decision-making, yet their susceptibility to biases remains a critical concern. Gender has long influenced physician behaviors and patient outcomes, raising concerns that LLMs assuming human-like roles, such as clinicians or medical educators, may replicate or amplify gender-related biases. Using case studies from the New England Journal of Medicine Challenge (NEJM), we assigned genders (female, male, or unspecified) to multiple open-source and proprietary LLMs. We evaluated their response consistency across LLM-gender assignments regarding both LLM-based diagnosis and models' judgments on the clinical relevance or necessity of patient gender. In our findings, diagnoses were relatively consistent across LLM genders for most models. However, for patient gender's relevance and necessity in LLM-based diagnosis, all models demonstrated substantial inconsistency across LLM genders, particularly for relevance judgements. Some models even displayed a systematic female-male disparity in their interpretation of patient gender. These findings present an underexplored bias that could undermine the reliability of LLMs in clinical practice, underscoring the need for routine checks of identity-assignment consistency when interacting with LLMs to ensure reliable and equitable AI-supported clinical care.
- Asia > Singapore > Central Region > Singapore (0.05)
- Europe > Belgium > Flanders > Antwerp Province > Antwerp (0.05)
- North America > United States > North Carolina > Durham County > Durham (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)