GeoAda: Efficiently Finetune Geometric Diffusion Models with Equivariant Adapters

Neural Information Processing Systems

Geometric diffusion models have shown remarkable success in molecular dynamics and structure generation. However, efficiently fine-tuning them for downstream tasks with varying geometric controls remains underexplored. In this work, we propose an SE(3)-equivariant adapter framework (GeoAda) that enables flexible and parameter-efficient fine-tuning for controlled generative tasks without modifying the original model architecture. GeoAda introduces a structured adapter design: control signals are first encoded through coupling operators, then processed by a trainable copy of selected base model layers, and finally projected back via decoupling operators followed by an equivariant zero-initialized convolution. By fine-tuning only these lightweight adapter modules, GeoAda preserves the model's geometric consistency while mitigating overfitting and catastrophic forgetting. We theoretically prove that the proposed adapters maintain SE(3)-equivariance, ensuring that the geometric inductive biases of the pretrained diffusion model remain intact during adaptation. We demonstrate the wide applicability of GeoAda across diverse geometric control types, including frame control, global control, and subgraph control, and across a broad range of application domains such as particle dynamics, molecular dynamics, human motion prediction, and molecule generation. Empirical results show that GeoAda achieves state-of-the-art fine-tuning performance while preserving original task accuracy, whereas other baselines experience significant performance degradation due to overfitting and catastrophic forgetting.


LLM Query Scheduling with Prefix Reuse and Latency Constraints

Neural Information Processing Systems

The efficient deployment of large language models (LLMs) in online settings requires optimizing inference performance under stringent latency constraints, particularly the time-to-first-token (TTFT) and time-per-output-token (TPOT). This paper focuses on the query scheduling problem for LLM inference with prefix reuse, a technique that leverages shared prefixes across queries to reduce computational overhead. Our work reveals previously unknown limitations of the existing first-come-first-serve (FCFS) and longest-prefix-match (LPM) scheduling strategies with respect to satisfying latency constraints. We present a formal theoretical framework for LLM query scheduling under RadixAttention, a prefix reuse mechanism that stores and reuses intermediate representations in a radix tree structure. Our analysis establishes the NP-hardness of the scheduling problem with prefix reuse under TTFT constraints and proposes a novel scheduling algorithm, $k$-LPM, which generalizes existing methods by balancing prefix reuse and fairness in query processing. Theoretical guarantees demonstrate that $k$-LPM achieves improved TTFT performance under realistic traffic patterns captured by a data generative model. Empirical evaluations in a realistic serving setting validate our findings, showing significant reductions in P99 TTFT compared to baseline methods.
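
One way to picture how a scheduler could interpolate between FCFS and LPM is sketched below. This is only an illustration under our own assumptions, not the paper's definition of $k$-LPM: we assume each cycle makes `k` longest-prefix-match picks followed by one first-come-first-serve pick, and we replace the radix tree with a plain list of already-served queries. The names `k_lpm_order`, `cached_prefix_len`, and `common_prefix_len` are hypothetical.

```python
from typing import List, Sequence


def common_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    """Number of leading tokens shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def cached_prefix_len(tokens: Sequence[str], cache: List[Sequence[str]]) -> int:
    """Longest prefix of `tokens` shared with any already-served query.

    Assumption: a flat list stands in for the radix-tree prefix cache.
    """
    return max((common_prefix_len(tokens, c) for c in cache), default=0)


def k_lpm_order(queries: List[Sequence[str]], k: int) -> List[int]:
    """Return the service order (indices into `queries`, listed by arrival).

    Assumed rule: in each cycle of k + 1 picks, the first k picks take the
    pending query with the longest cached prefix (ties go to the earliest
    arrival) and the last pick takes the oldest pending query.
    k = 0 reduces to FCFS; a very large k behaves like pure LPM.
    """
    pending = list(range(len(queries)))  # kept in arrival order
    cache: List[Sequence[str]] = []
    order: List[int] = []
    step = 0
    while pending:
        if step % (k + 1) < k:
            # LPM pick: maximize cached prefix length, break ties by arrival.
            chosen = max(
                pending,
                key=lambda i: (cached_prefix_len(queries[i], cache), -i),
            )
        else:
            # FCFS pick: oldest pending query.
            chosen = pending[0]
        pending.remove(chosen)
        cache.append(queries[chosen])
        order.append(chosen)
        step += 1
    return order
```

In this toy version, the FCFS pick bounds how long an old query with no cache hit can be bypassed, which is the fairness side of the trade-off the abstract describes.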


Interpreting Emergent Features in Deep Learning-based Side-channel Analysis

Neural Information Processing Systems

Side-channel analysis (SCA) poses a real-world threat by exploiting unintentional physical signals to extract secret information from secure devices. Evaluation labs also use the same techniques to certify device security. In recent years, deep learning has emerged as a prominent method for SCA, achieving state-of-the-art attack performance at the cost of interpretability. Understanding how neural networks extract secrets is crucial for security evaluators aiming to defend against such attacks, as only by understanding the attack can one propose better countermeasures. In this work, we apply mechanistic interpretability to neural networks trained for SCA, revealing $\textit{how}$ models exploit $\textit{what}$ leakage in side-channel traces. We focus on sudden jumps in performance to reverse engineer learned representations, ultimately recovering secret masks and moving the evaluation process from black-box to white-box. Our results show that mechanistic interpretability can scale to realistic SCA settings, even when relevant inputs are sparse, model accuracies are low, and side-channel protections prevent standard input interventions.


Global Convergence for Average Reward Constrained MDPs with Primal-Dual Actor Critic Algorithm

Neural Information Processing Systems

This paper investigates infinite-horizon average reward Constrained Markov Decision Processes (CMDPs) under general parametrized policies with smooth and bounded policy gradients. We propose a Primal-Dual Natural Actor-Critic algorithm that adeptly manages constraints while ensuring a high convergence rate. In particular, our algorithm achieves global convergence and constraint violation rates of $\tilde{\mathcal{O}}(1/\sqrt{T})$ over a horizon of length $T$ when the mixing time, $\tau_{\mathrm{mix}}$, is known to the learner.
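
For orientation, a generic primal-dual natural policy gradient scheme for a constrained average-reward problem can be written as follows. This is a textbook-style sketch, not the paper's exact algorithm: the symbols $J_r$, $J_c$, $F(\theta)$, the step sizes $\alpha, \beta$, the dual bound $\Lambda$, and the sign convention $J_c(\theta) \ge 0$ are all assumptions introduced here.

$$
\begin{aligned}
&\max_{\theta}\; J_r(\theta) \quad \text{s.t.} \quad J_c(\theta) \ge 0,
\qquad L(\theta, \lambda) = J_r(\theta) + \lambda\, J_c(\theta), \\
&\theta_{k+1} = \theta_k + \alpha\, F(\theta_k)^{-1} \nabla_{\theta} L(\theta_k, \lambda_k), \\
&\lambda_{k+1} = \Pi_{[0, \Lambda]}\big(\lambda_k - \beta\, J_c(\theta_k)\big).
\end{aligned}
$$

Under this convention the dual variable $\lambda$ grows whenever the constraint is violated ($J_c(\theta_k) < 0$), which increases the weight of the constraint term in the next primal step.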


Anthropic Says It's Taking Claude Fable 5 Offline to Comply With US Government Order

WIRED

"The government believes it has become aware of a method of bypassing, or 'jailbreaking' Fable 5," the company said in a blog post. Anthropic says it's disabling two AI models it launched earlier this week, Claude Fable 5 and Mythos 5, to comply with an export control directive it received Friday afternoon from the US government citing national security concerns. The unprecedented incident marks the latest flashpoint between Anthropic and the Trump administration. While the company says the order asked it to suspend access to "any foreign national, whether inside or outside the United States, including foreign national Anthropic employees," it has removed access for all of its customers to ensure compliance. Earlier this year, Trump's Department of Defense labeled Anthropic a "supply chain risk" after the Claude-maker sought to draw red lines over how the US military could use its technology.


To Distill or Decide? Understanding the Algorithmic Trade-off in Partially Observable RL

Neural Information Processing Systems

Partial observability is a notorious challenge in reinforcement learning (RL), due to the need to learn complex, history-dependent policies. Recent empirical successes have used privileged expert distillation -- which leverages availability of latent state information during training (e.g., from a simulator) to learn and imitate the optimal latent, Markovian policy -- to disentangle the task of "learning to see" from "learning to act". While expert distillation is more computationally efficient than RL without latent state information, it also has well-documented failure modes. In this paper -- through a simple but instructive theoretical model, and controlled experiments on challenging simulated locomotion tasks -- we investigate the algorithmic trade-off between privileged expert distillation and standard RL without privileged information.


BecomingLit: Relightable Gaussian Avatars with Hybrid Neural Shading

Neural Information Processing Systems

We introduce BecomingLit, a novel method for reconstructing relightable, high-resolution head avatars that can be rendered from novel viewpoints at interactive rates. To this end, we propose a new low-cost light stage capture setup, tailored specifically towards capturing faces. Using this setup, we collect a novel dataset consisting of diverse multi-view sequences of numerous subjects under varying illumination conditions and facial expressions. By leveraging our new dataset, we introduce a new relightable avatar representation based on 3D Gaussian primitives that we animate with a parametric head model and an expression-dependent dynamics module. We propose a new hybrid neural shading approach, combining a neural diffuse BRDF with an analytical specular term. Our method reconstructs disentangled materials from our dynamic light stage recordings and enables all-frequency relighting of our avatars with both point lights and environment maps. In addition, our avatars can easily be animated and controlled from monocular videos. We validate our approach in extensive experiments on our dataset, where we consistently outperform existing state-of-the-art methods in relighting and reenactment by a significant margin.


Topology-Aware Conformal Prediction for Stream Networks

Neural Information Processing Systems

Existing approaches either neglect dependencies, leading to overly conservative predictions, or rely solely on data-driven estimations, failing to capture the rich topological structure of the network. To address these challenges, we propose Spatio-Temporal Adaptive Conformal Inference (STACI), a novel framework that integrates network topology and temporal dynamics into the conformal prediction framework. STACI introduces a topology-aware nonconformity score that respects directional flow constraints and dynamically adjusts prediction sets to account for temporal distributional shifts. We provide theoretical guarantees on the validity of our approach and demonstrate its superior performance on both synthetic and real-world datasets. Our results show that STACI effectively balances prediction efficiency and coverage, outperforming existing conformal prediction methods for stream networks.
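
As background for the conformal prediction framework that STACI builds on, the sketch below shows plain split conformal prediction with an absolute-residual nonconformity score. It is not STACI: it ignores network topology and temporal shift entirely, and the function names `conformal_quantile` and `predict_interval` are hypothetical.

```python
import math
from typing import List, Tuple


def conformal_quantile(scores: List[float], alpha: float) -> float:
    """Calibration quantile used by split conformal prediction.

    `scores` are nonconformity scores on a held-out calibration set,
    here assumed to be absolute residuals |y - y_hat|. The returned value
    is the ceil((n + 1) * (1 - alpha))-th smallest score, or infinity when
    the calibration set is too small for the requested coverage level.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


def predict_interval(point_prediction: float, q: float) -> Tuple[float, float]:
    """Symmetric prediction interval around a point forecast."""
    return (point_prediction - q, point_prediction + q)
```

Given exchangeable calibration and test data, intervals built this way cover the true value with probability at least `1 - alpha`. A topology-aware method would replace the absolute residual with a score that accounts for dependencies between network locations.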


FlowMixer: A Depth-Agnostic Neural Architecture for Interpretable Spatiotemporal Forecasting

Neural Information Processing Systems

We introduce FlowMixer, a single-layer neural architecture that leverages constrained matrix operations to model structured spatiotemporal patterns with enhanced interpretability. FlowMixer incorporates non-negative matrix mixing layers within a reversible mapping framework--applying transforms before mixing and their inverses afterward. This shape-preserving design enables a Kronecker-Koopman eigenmodes framework that bridges statistical learning with dynamical systems theory, providing interpretable spatiotemporal patterns and facilitating direct algebraic manipulation of prediction horizons without retraining. The architecture's semi-group property enables this single layer to mathematically represent any depth through composition, eliminating depth search entirely. Extensive experiments across diverse domains demonstrate FlowMixer's long-horizon forecasting capabilities while effectively modeling physical phenomena such as chaotic attractors and turbulent flows. Our results achieve performance matching state-of-the-art methods while offering superior interpretability through directly extractable eigenmodes. This work suggests that architectural constraints can simultaneously maintain competitive performance and enhance mathematical interpretability in neural forecasting systems.


AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders

Neural Information Processing Systems

Speculative Decoding (SD) accelerates large language model inference by employing a small draft model to generate predictions, which are then verified by a larger target model. The effectiveness of SD hinges on the alignment between these models, which is typically enhanced by Knowledge Distillation (KD). However, conventional KD methods aim to minimize the KL divergence between the draft and target models across all tokens, a goal that is misaligned with the true objective of SD, which is to maximize token acceptance rate. Therefore, draft models often struggle to fully assimilate the target model's knowledge due to capacity constraints, leading to suboptimal performance. To address this challenge, we propose AdaSPEC, a novel method that incorporates selective token filtering into the KD process. AdaSPEC utilizes a reference model to identify and filter out difficult-to-fit tokens, enabling the distillation of a draft model that better aligns with the target model on simpler tokens. This approach improves the overall token acceptance rate without compromising generation quality. We evaluate AdaSPEC across diverse tasks, including arithmetic reasoning, instruction-following, coding, and summarization, using model configurations of 31M/1.4B