Reason-KE++: Aligning the Process, Not Just the Outcome, for Faithful LLM Knowledge Editing
Wu, Yuchen, Ding, Liang, Shen, Li, Tao, Dacheng
Aligning Large Language Models (LLMs) to be faithful to new knowledge in complex, multi-hop reasoning tasks is a critical, yet unsolved, challenge. We find that SFT-based methods such as Reason-KE, while state-of-the-art, suffer from a "faithfulness gap": they optimize for format mimicry rather than sound reasoning. This gap enables the LLM's powerful parametric priors to override new contextual facts, resulting in critical factual hallucinations (e.g., incorrectly inferring "Houston" from "NASA" despite an explicit edit). To solve this core LLM alignment problem, we propose Reason-KE++, an SFT+RL framework that instills process-level faithfulness. Its core is a Stage-aware Reward mechanism that provides dense supervision for intermediate reasoning steps (e.g., decomposition quality and sub-answer correctness). Crucially, we identify that naive outcome-only RL is a deceptive trap for LLM alignment: it collapses reasoning integrity (e.g., 19.00% hop accuracy) while superficially boosting final accuracy. Our process-aware framework sets a new state-of-the-art of 95.48% on MQUAKE-CF-3k (+5.28%), demonstrating that for complex tasks, aligning the reasoning process is essential for building trustworthy LLMs.
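The Stage-aware Reward described above can be pictured as a weighted sum of per-stage scores rather than a single outcome score. Below is a minimal sketch, assuming a reasoning trace already parsed into stages; the stage names follow the abstract, while the weights, data format, and exact-match scoring are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a stage-aware reward for RL fine-tuning: dense supervision
# per reasoning stage instead of rewarding only the final answer.
# Weights, parsing format, and scoring below are assumptions.

def stage_aware_reward(trace: dict, gold: dict,
                       w_decomp=0.3, w_sub=0.4, w_final=0.3) -> float:
    """Score a parsed reasoning trace stage by stage.

    trace/gold: {"decomposition": [str], "sub_answers": [str], "final": str}
    """
    # Decomposition reward: fraction of gold sub-questions recovered.
    hits = sum(q in trace["decomposition"] for q in gold["decomposition"])
    r_decomp = hits / max(len(gold["decomposition"]), 1)

    # Sub-answer correctness: per-hop exact match against the edited facts.
    pairs = zip(trace["sub_answers"], gold["sub_answers"])
    r_sub = sum(a == b for a, b in pairs) / max(len(gold["sub_answers"]), 1)

    # Outcome reward: final-answer exact match.
    r_final = float(trace["final"] == gold["final"])

    return w_decomp * r_decomp + w_sub * r_sub + w_final * r_final
```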
Consensus Sampling for Safer Generative AI
Kalai, Adam Tauman, Kalai, Yael Tauman, Zamir, Or
Many approaches to AI safety rely on inspecting model outputs or activations, yet certain risks are inherently undetectable by inspection alone. We propose a complementary, architecture-agnostic approach that enhances safety through the aggregation of multiple generative models, with the aggregated model inheriting its safety from the safest subset of a given size among them. Specifically, we present a consensus sampling algorithm that, given $k$ models and a prompt, achieves risk competitive with the average risk of the safest $s$ of the $k$ models, where $s$ is a chosen parameter, while abstaining when there is insufficient agreement between them. The approach leverages the models' ability to compute output probabilities, and we bound the probability of abstention when sufficiently many models are safe and exhibit adequate agreement. The algorithm is inspired by the provable copyright protection algorithm of Vyas et al. (2023). It requires some overlap among safe models, offers no protection when all models are unsafe, and may accumulate risk over repeated use. Nonetheless, our results provide a new, model-agnostic approach for AI safety by amplifying safety guarantees from an unknown subset of models within a collection to that of a single reliable model.
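To make the flavor of the algorithm concrete, here is a toy rejection sampler over explicit discrete distributions: propose from the uniform mixture of the k models and accept in proportion to a consensus score (here, the mean of the s smallest per-model probabilities, which never exceeds the mixture probability), abstaining after too many rejections. This is an illustrative sketch, not the authors' exact procedure or its guarantees.

```python
import random

def consensus_sample(dists, s, max_rounds=50, rng=random):
    """Toy consensus sampler. dists: list of k dicts mapping outputs to
    probabilities. Unnormalized target: the mean of the s smallest model
    probabilities of each output, so mass survives only where at least
    s models assign it probability. Returns None (abstain) after
    max_rounds rejections."""
    k = len(dists)
    support = set().union(*dists)
    mixture = {y: sum(d.get(y, 0.0) for d in dists) / k for y in support}
    for _ in range(max_rounds):
        # Propose from the uniform mixture of all k models.
        y = rng.choices(list(mixture), weights=mixture.values())[0]
        probs = sorted(d.get(y, 0.0) for d in dists)
        consensus = sum(probs[:s]) / s              # mean of s smallest
        if rng.random() < consensus / mixture[y]:   # ratio is always <= 1
            return y
    return None  # abstain: insufficient agreement among models
```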
Robust Knowledge Editing via Explicit Reasoning Chains for Distractor-Resilient Multi-Hop QA
Wu, Yuchen, Ding, Liang, Shen, Li, Tao, Dacheng
Large language models (LLMs) encode vast amounts of world knowledge but remain static once trained, making the timely integration of emerging facts prohibitively expensive via full retraining. Knowledge-editing techniques have thus emerged to inject or overwrite specific facts into LLMs, yet they either over-rely on superficial cues or incur complex, iterative pipelines that collapse under noisy, multi-hop conditions. We introduce Reason-KE, an end-to-end reasoning-chain-based editing framework that steers a pretrained LLM through four structured stages (fact acknowledgment, relevance determination, selective application, and final reasoning) to filter distractors in a single pass. Trained on MQuAKE-CF with up to four irrelevant facts, Reason-KE elevates Qwen2.5-7B's multi-hop QA accuracy to 90.2% while suffering merely a 6.3% drop under heavy distraction and <1% when answers are leaked. Our quantitative analysis confirms Reason-KE's resilience and efficiency, establishing a new state-of-the-art for reliable LLM knowledge updates.
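One way the four-stage structure might be rendered is as a prompt template that forces the model through the stages in order. The stage names are the paper's; the tag wording and layout below are assumptions.

```python
# Sketch of the four-stage output structure described in the abstract.
STAGES = ["fact acknowledgment", "relevance determination",
          "selective application", "final reasoning"]

def build_prompt(question: str, edited_facts: list[str]) -> str:
    facts = "\n".join(f"- {fact}" for fact in edited_facts)  # may include distractors
    steps = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(STAGES))
    return (
        f"New facts (some may be irrelevant):\n{facts}\n\n"
        f"Question: {question}\n"
        f"Answer in four labeled stages:\n{steps}\n"
    )
```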
Distributional Reduction: Unifying Dimensionality Reduction and Clustering with Gromov-Wasserstein Projection
Van Assel, Hugues, Vincent-Cuaz, Cédric, Courty, Nicolas, Flamary, Rémi, Frossard, Pascal, Vayer, Titouan
Unsupervised learning aims to capture the underlying structure of potentially large and high-dimensional datasets. Traditionally, this involves using dimensionality reduction methods to project data onto interpretable spaces or organizing points into meaningful clusters. In practice, these methods are used sequentially, without guaranteeing that the clustering aligns well with the conducted dimensionality reduction. In this work, we offer a fresh perspective: that of distributions. Leveraging tools from optimal transport, particularly the Gromov-Wasserstein distance, we unify clustering and dimensionality reduction into a single framework called distributional reduction. This allows us to jointly address clustering and dimensionality reduction with a single optimization problem. Through comprehensive experiments, we highlight the versatility and interpretability of our method and show that it outperforms existing approaches across a variety of image and genomics datasets.
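Under strong simplifications, the joint objective can be sketched as an alternating scheme: solve a Gromov-Wasserstein problem (here via the POT library) between the data similarity matrix and a small reduced similarity matrix, then update the reduced matrix with the standard square-loss barycentric step. The paper's actual formulation optimizes embeddings and target supports directly; treat this as an illustration, not their solver.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

def distributional_reduction(X, n_clusters, n_iter=20, seed=0):
    """Toy sketch of joint reduction + clustering with Gromov-Wasserstein:
    alternately (i) compute the GW coupling between the data similarity C_x
    and the current reduced similarity C_z, and (ii) refit C_z with the
    square-loss barycentric fixed-point step."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    p = np.full(n, 1.0 / n)                    # uniform mass on data points
    q = np.full(n_clusters, 1.0 / n_clusters)  # mass on the reduced support
    C_x = ot.dist(X, X)                        # pairwise squared distances
    C_z = rng.random((n_clusters, n_clusters))
    C_z = (C_z + C_z.T) / 2                    # symmetric random init
    for _ in range(n_iter):
        T = ot.gromov.gromov_wasserstein(C_x, C_z, p, q, 'square_loss')
        C_z = T.T @ C_x @ T / np.outer(q, q)   # barycentric update of C_z
    labels = T.argmax(axis=1)                  # hard cluster assignment
    return C_z, labels
```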
Adaptive Machine Unlearning
Gupta, Varun, Jung, Christopher, Neel, Seth, Roth, Aaron, Sharifi-Malvajerdi, Saeed, Waites, Chris
Data deletion algorithms aim to remove the influence of deleted data points from trained models at a cheaper computational cost than fully retraining those models. However, for sequences of deletions, most prior work in the non-convex setting gives valid guarantees only for sequences that are chosen independently of the models that are published. If people choose to delete their data as a function of the published models (because they don't like what the models reveal about them, for example), then the update sequence is adaptive. In this paper, we give a general reduction from deletion guarantees against adaptive sequences to deletion guarantees against non-adaptive sequences, using differential privacy and its connection to max information. Combined with ideas from prior work which give guarantees for non-adaptive deletion sequences, this leads to extremely flexible algorithms able to handle arbitrary model classes and training methodologies, giving strong provable deletion guarantees for adaptive deletion sequences. We show in theory how prior work for non-convex models fails against adaptive deletion sequences, and use this intuition to design a practical attack against the SISA algorithm of Bourtoule et al. [2021] on CIFAR-10, MNIST, and Fashion-MNIST.
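For intuition, here is a toy SISA-style sharded trainer: deletion retrains only the shard containing the deleted point, and the assignment of points to shards is randomized. The paper's reduction additionally requires that whatever is published be a differentially private function of the shard models (e.g., noisy aggregation) so adaptive deletion requests cannot exploit which shard a point landed in; that step is only noted in a comment here, and `train_fn` and the callable models are placeholders.

```python
import random

class SISA:
    """Toy SISA-style sharded trainer. The DP publication step required by
    the paper's adaptive-to-non-adaptive reduction is elided; predictions
    below are a plain (deterministic) majority vote."""
    def __init__(self, data, n_shards, train_fn, seed=0):
        rng = random.Random(seed)
        self.train_fn = train_fn
        # Secret random assignment of points to shards.
        self.shards = [[] for _ in range(n_shards)]
        for x in data:
            self.shards[rng.randrange(n_shards)].append(x)
        self.models = [train_fn(s) for s in self.shards]

    def delete(self, x):
        for i, shard in enumerate(self.shards):
            if x in shard:
                shard.remove(x)
                self.models[i] = self.train_fn(shard)  # retrain one shard only
                return

    def predict(self, query):
        votes = [m(query) for m in self.models]
        return max(set(votes), key=votes.count)
```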
Verifiable RNN-Based Policies for POMDPs Under Temporal Logic Constraints
Carr, Steven, Jansen, Nils, Topcu, Ufuk
Recurrent neural networks (RNNs) have emerged as an effective representation of control policies in sequential decision-making problems. However, a major drawback in the application of RNN-based policies is the difficulty in providing formal guarantees on the satisfaction of behavioral specifications, e.g., safety and/or reachability. By integrating techniques from formal methods and machine learning, we propose an approach to automatically extract a finite-state controller (FSC) from an RNN, which, when composed with a finite-state system model, is amenable to existing formal verification tools. Specifically, we introduce an iterative modification to the so-called quantized bottleneck insertion technique to create an FSC as a randomized policy with memory. For the cases in which the resulting FSC fails to satisfy the specification, verification generates diagnostic information. We utilize this information to either adjust the amount of memory in the extracted FSC or perform focused retraining of the RNN. While generally applicable, we detail the resulting iterative procedure in the context of policy synthesis for partially observable Markov decision processes (POMDPs), which is known to be notoriously hard. The numerical experiments show that the proposed approach outperforms traditional POMDP synthesis methods by three orders of magnitude while remaining within 2% of optimal benchmark values.
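The quantized-bottleneck idea can be illustrated in a few lines: snapping each RNN hidden unit to a small set of discrete levels makes the reachable memory finite, so the policy induces a finite-state controller. The level set and where the bottleneck sits are assumptions; the paper's procedure additionally iterates on memory size and performs focused retraining.

```python
import numpy as np

def quantize_hidden(h, levels=(-1.0, 0.0, 1.0)):
    """Toy quantized-bottleneck step: snap each hidden unit of an RNN state
    to the nearest of a few discrete levels. With d units and 3 levels the
    memory takes at most 3**d values, yielding a finite-state controller.
    The level set and bottleneck placement are assumptions."""
    levels = np.asarray(levels)
    idx = np.abs(h[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx]

# Each distinct quantized vector is one FSC memory node:
h = np.tanh(np.random.randn(4))    # stand-in for an RNN hidden state
state = tuple(quantize_hidden(h))  # hashable FSC memory state
```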
Recognizing Plans by Learning Embeddings from Observed Action Distributions
Zha, Yantian, Li, Yikang, Gopalakrishnan, Sriram, Li, Baoxin, Kambhampati, Subbarao
Recent advances in visual activity recognition have raised the possibility of applications such as automated video surveillance. Effective approaches for such problems however require the ability to recognize the plans of agents from video information. Although traditional plan recognition algorithms depend on access to sophisticated planning domain models, one recent promising direction involves learning approximated (or shallow) domain models directly from the observed activity sequences (DUP). One limitation is that such approaches expect observed action sequences as inputs. In many cases involving vision/sensing from raw data, there is considerable uncertainty about the specific action at any given time point. The most we can expect in such cases is probabilistic information about the action at that point. The input will then be sequences of such observed action distributions. In this work, we address the problem of constructing an effective data interface that allows a plan recognition module to directly handle such observation distributions. Such an interface works like a bridge between the low-level perception module and the high-level plan recognition module. We propose two approaches. The first involves resampling the distribution sequences to single action sequences, from which we could learn an action affinity model based on learned action (word) embeddings for plan recognition. The second is to directly learn action distribution embeddings by our proposed Distr2vec (distribution to vector) model, to construct an affinity model for plan recognition.
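The core of the second approach can be sketched simply: embed an observed action distribution as the probability-weighted average of per-action embedding vectors, rather than committing to one sampled action. How Distr2vec actually trains those embeddings is beyond this sketch; the action names and vectors below are illustrative assumptions.

```python
import numpy as np

def distribution_embedding(dist, action_vecs):
    """Sketch of the input side of a Distr2vec-style model: represent an
    observed action *distribution* as the probability-weighted average of
    per-action embeddings. action_vecs maps action name -> vector; both the
    names and the vectors are placeholders here."""
    return sum(p * action_vecs[a] for a, p in dist.items())

# One timestep of noisy perception: 70% "pick_up", 30% "push".
action_vecs = {"pick_up": np.array([1.0, 0.0]), "push": np.array([0.0, 1.0])}
z = distribution_embedding({"pick_up": 0.7, "push": 0.3}, action_vecs)
```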