Goto

Collaborating Authors

 Performance Analysis


Theoretically Grounded Framework for LLM Watermarking: A Distribution-Adaptive Approach

Neural Information Processing Systems

Watermarking has emerged as a crucial method to distinguish AI-generated text from human-created text. Current watermarking approaches often lack formal optimality guarantees or address the scheme and detector design separately. In this paper, we introduce a novel, unified theoretical framework for watermarking Large Language Models (LLMs) that jointly optimizes both the watermarking scheme and detector. Our approach aims to maximize detection performance while maintaining control over the worst-case false positive rate (FPR) and distortion on text quality. We derive closed-form optimal solutions for this joint design and characterize the fundamental trade-off between watermark detectability and distortion. Notably, we reveal that the optimal watermarking schemes should be adaptive to the LLM's generative distribution. Building on our theoretical insights, we propose a distortion-free, distribution-adaptive watermarking algorithm (DAWA) that leverages a surrogate model for model-agnosticism and efficiency. Experiments on Llama2-13B and Mistral-8$\times$7B models confirm the effectiveness of our approach, particularly at ultra-low FPRs. Our code is available at \url{https://github.com/yepengliu/DAWA}.


BlockScan: Detecting Anomalies in Blockchain Transactions

Neural Information Processing Systems

We propose BlockScan, a customized Transformer for anomaly detection in blockchain transactions. Unlike existing methods that rely on rule-based systems or directly apply off-the-shelf large language models (LLMs), BlockScan introduces a series of customized designs to effectively model the unique data structure of blockchain transactions. First, a blockchain transaction is multi-modal, containing blockchain-specific tokens, texts, and numbers. We design a novel modularized tokenizer to handle these multi-modal inputs, balancing the information across different modalities. Second, we design a customized masked language modeling mechanism for pretraining the Transformer architecture, incorporating RoPE embedding and FlashAttention for handling longer sequences. Finally, we design a novel anomaly detection method based on the model outputs.


RayFusion: Ray Fusion Enhanced Collaborative Visual Perception

Neural Information Processing Systems

Collaborative visual perception methods have gained widespread attention in the autonomous driving community in recent years due to their ability to address sensor limitation problems. However, the absence of explicit depth information often makes it difficult for camera-based perception systems, e.g., 3D object detection, to generate accurate predictions. To alleviate the ambiguity in depth estimation, we propose RayFusion, a ray-based fusion method for collaborative visual perception. Using ray occupancy information from collaborators, RayFusion reduces redundancy and false positive predictions along camera rays, enhancing the detection performance of purely camera-based collaborative perception systems. Comprehensive experiments show that our method consistently outperforms existing state-of-the-art models, substantially advancing the performance of collaborative visual perception. Our code will be made publicly available.


AdaDetectGPT: Adaptive Detection of LLM-Generated Text with Statistical Guarantees

Neural Information Processing Systems

We study the problem of determining whether a piece of text has been authored by a human or by a large language model (LLM). Existing state of the art logits-based detectors make use of statistics derived from the log-probability of the observed text evaluated using the distribution function of a given source LLM. However, relying solely on log probabilities can be sub-optimal. In response, we introduce AdaDetectGPT -- a novel classifier that adaptively learns a witness function from training data to enhance the performance of logits-based detectors. We provide statistical guarantees on its true positive rate, false positive rate, true negative rate and false negative rate. Extensive numerical studies show AdaDetectGPT nearly uniformly improves the state-of-the-art method in various combination of datasets and LLMs, and the improvement can reach up to 37\%.


FedRACE: A Hierarchical and Statistical Framework for Robust Federated Learning

Neural Information Processing Systems

Integrating large pre-trained models into federated learning (FL) can significantly improve generalization and convergence efficiency. A widely adopted strategy freezes the pre-trained backbone and fine-tunes a lightweight task head, thereby reducing computational and communication costs. However, this partial fine-tuning paradigm introduces new security risks, making the system vulnerable to poisoned updates and backdoor attacks. To address these challenges, we propose FedRACE, a unified framework for robust FL with partially frozen models. FedRACE comprises two core components: HStat-Net, a hierarchical network that refines frozen features into compact, linearly separable representations; and DevGuard, a server-side mechanism that detects malicious clients by evaluating statistical deviance in class-level predictions modeling generalized linear models (GLMs). DevGuard further incorporates adaptive thresholding based on theoretical misclassification bounds and employs randomized majority voting to enhance detection reliability. We implement FEDRACE on the FedScale platform and evaluate it on CIFAR-100, Food-101, and Tiny ImageNet under diverse attack scenarios. FedRACE achieves a true positive rate of up to 99.3% with a false positive rate below 1.2%, while preserving model accuracy and improving generalization.


One Head to Rule Them All: Amplifying LVLM Safety through a Single Critical Attention Head

Neural Information Processing Systems

Large Vision-Language Models (LVLMs) have demonstrated impressive capabilities in tasks requiring multimodal understanding. However, recent studies indicate that LVLMs are more vulnerable than LLMs to unsafe inputs and prone to generating harmful content. Existing defense strategies primarily include fine-tuning, input sanitization, and output intervention. Although these approaches provide a certain level of protection, they tend to be resource-intensive and struggle to effectively counter sophisticated attack techniques. To tackle such issues, we propose One-head Defense (Oh Defense), a novel yet simple approach utilizing LVLMs' internal safety capabilities. Through systematic analysis of the attention mechanisms, we discover that LVLMs' safety capabilities are concentrated within specific attention heads that respond differently to safe or unsafe inputs. Further exploration reveals that a single critical attention head can effectively serve as a safety guard, providing a strong discriminative signal that amplifies the model's inherent safety capabilities. Hence, the Oh Defense requires no additional training or external modules, making it computationally efficient while effectively reactivating suppressed safety mechanisms. Extensive experiments across diverse LVLM architectures and unsafe datasets validate our approach, i.e., the Oh Defense achieves near-perfect defense success rates (> 98\%) for unsafe inputs while maintaining low false positive rates (< 5\%) for safe content.


On the SAC-BL Algorithm for Anomaly Detection

Neural Information Processing Systems

Visual anomaly detection is significant in safety-critical and reliability-sensitive scenarios. Prior studies mainly emphasize the design and training of scoring functions, while little effort has been devoted to constructing decision rules based on these score functions. A recent work Ma et al. (2025b) highlights this issue and proposes the SAC-BL algorithm to address it. This method consists of a strong anomaly constraint (SAC) network and a betting-like (BL) algorithm serving as the decision rule. The SAC-BL algorithm can control the false discovery rate (FDR).


Reliably detecting model failures in deployment without labels

Neural Information Processing Systems

The distribution of data changes over time; models operating in dynamic environments need retraining. But knowing when to retrain, without access to labels, is an open challenge since some, but not all shifts degrade model performance. This paper formalizes and addresses the problem of post-deployment deterioration (PDD) monitoring. We propose D3M, a practical and efficient monitoring algorithm based on the disagreement of predictive models, achieving low false positive rates under non-deteriorating shifts and provides sample complexity bounds for high true positive rates under deteriorating shifts. Empirical results on both standard benchmark and a real-world large-scale internal medicine dataset demonstrate the effectiveness of the framework and highlight its viability as an alert mechanism for high-stakes machine learning pipelines.


America Has a Pangram Problem

The Atlantic - Technology

AI-detection tools are getting better. Basically every recent, high-profile accusation of someone passing off AI-generated writing as their own has started in the same way: with a tool called Pangram. In March, when a horror novel from a major publishing house was pulled just days before its scheduled U.S. release date, it was in part because Pangram, an AI-detection program, had identified the text as AI-generated. Other people have fed text into Pangram to suggest that chatbots have been used to write articles in major newspapers including, multiple short stories awarded a prestigious literary prize, and most recently, significant chunks of Pope Leo XIV's encyclical warning about the dangers of AI. The tool is also used by universities to vet student work and scientific associations to scan research papers.


The Good, the Bad, and the Ugly of Markov Boundary for Tabular Prediction

arXiv.org Machine Learning

Under standard graphical assumptions, the Markov boundary of a target variable is the smallest set of features that renders every other feature redundant. Once the boundary is observed, the target is conditionally independent of the rest of the table. This is a tempting object for tabular prediction, since it names exactly the columns a model should need. Yet modern regressors are still trained on the full feature set. We ask whether the Markov boundary is genuinely useful for prediction on SCM3K, a 3,450-task synthetic SCM benchmark with feature counts from 40 to 1000 and six SCM families, evaluated with six regressors. The answer is more nuanced than the theory suggests. Restricting a regressor to the oracle boundary often improves prediction substantially, and the improvement grows as the feature space becomes larger and sparser. But the natural pipeline of recovering the boundary with causal discovery and training on the recovered mask does not deliver. Existing estimators exhaust the compute budget before reaching the regime where the boundary helps most, and even where they run they rarely beat the full feature set. We trace this to three causes. Discovery optimizes structural recovery rather than prediction. False negatives and false positives carry sharply asymmetric predictive cost. The exact boundary is only one of many feature sets that beat all features. We then develop what these facts imply for prediction-aligned feature selection and for tabular models that learn to use causal structure.