Accurate KV Cache Eviction via Anchor Direction Projection for Efficient LLM Inference

Neural Information Processing Systems

Key-Value (KV) cache eviction--which retains the KV pairs of the most important tokens while discarding less important ones--is a critical technique for optimizing both memory usage and inference latency in large language models (LLMs). However, existing approaches often rely on simple heuristics--such as attention weights--to measure token importance, overlooking the spatial relationships between token value states in the vector space. This often leads to suboptimal token selection and thus to performance degradation. To tackle this problem, we propose a novel method, AnDPro (Anchor Direction Projection), which introduces a projection-based scoring function to measure token importance more accurately. Specifically, AnDPro operates in the space of value vectors and leverages the projections of these vectors onto an "Anchor Direction"--the direction of the pre-eviction output--to measure token importance and guide more accurate token selection. Experiments on 16 datasets from the LongBench benchmark demonstrate that AnDPro can maintain 96.07% of the full-cache accuracy using only a 3.44% KV cache budget, reducing the KV cache budget size by 46.0% without compromising quality compared to previous state-of-the-art methods.
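
The projection idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, and weighting each projection by the token's attention weight is an assumption made here for illustration, since the abstract does not spell out the exact scoring function. The pre-eviction output is taken to be the attention-weighted sum of value vectors, and its normalized direction serves as the anchor.

```python
import math

def anchor_projection_scores(attn_weights, values):
    """Score each cached token by projecting its attention-weighted value
    vector onto the direction of the pre-eviction output (the anchor).
    The attention weighting is an illustrative assumption."""
    dim = len(values[0])
    # Pre-eviction output: attention-weighted sum of all value vectors.
    output = [sum(a * v[d] for a, v in zip(attn_weights, values))
              for d in range(dim)]
    norm = math.sqrt(sum(x * x for x in output))
    if norm == 0.0:
        # No well-defined anchor direction; every token scores zero.
        return [0.0] * len(values)
    anchor = [x / norm for x in output]
    # Signed contribution of each token along the anchor direction.
    return [a * sum(v[d] * anchor[d] for d in range(dim))
            for a, v in zip(attn_weights, values)]

def select_tokens(attn_weights, values, budget):
    """Return the indices (in original order) of the `budget` best tokens."""
    scores = anchor_projection_scores(attn_weights, values)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])

# Token 1 has the second-largest attention weight, but its value vector
# points against the output direction, so the projection score ranks it last.
attn = [0.5, 0.3, 0.2]
vals = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
kept = select_tokens(attn, vals, budget=2)
```

In this toy case, ranking by attention weight alone would keep tokens 0 and 1, whereas the projection-based ranking keeps tokens 0 and 2. Under the assumed scoring, the scores sum to the norm of the pre-eviction output, so each score reads as that token's share of the output along the anchor direction.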


Efficient k-Sparse Band-Limited Interpolation with Improved Approximation Ratio

Neural Information Processing Systems

We consider the task of interpolating a k-sparse band-limited signal from a small collection of noisy time-domain samples. Exploiting a new analytic framework for hierarchical frequency decomposition that performs systematic noise cancellation, we give the first polynomial-time algorithm with a provable (3 + √2 + ε)-approximation guarantee for continuous interpolation. Our method breaks the long-standing C > 100 barrier set by the best previous algorithms, sharply reducing the gap to optimal recovery and establishing a new state of the art for high-accuracy band-limited interpolation. We also give a refined "shrinking-range" variant that achieves a (√2 + ε + c)-approximation on any sub-interval (1 - c)T for some c ∈ (0, 1), which gives even higher interpolation accuracy.


Future-Aware End-to-End Driving: Bidirectional Modeling of Trajectory Planning and Scene Evolution

Neural Information Processing Systems

End-to-end autonomous driving methods aim to directly map raw sensor inputs to future driving actions such as planned trajectories, bypassing traditional modular pipelines. While these approaches have shown promise, they often operate under a one-shot paradigm that relies heavily on the current scene context, potentially underestimating the importance of scene dynamics and their temporal evolution. This limitation restricts the model's ability to make informed and adaptive decisions in complex driving scenarios. We propose a new perspective: the future trajectory of an autonomous vehicle is closely intertwined with the evolving dynamics of its environment, and conversely, the vehicle's own future states can influence how the surrounding scene unfolds. Motivated by this bidirectional relationship, we introduce SeerDrive, a novel end-to-end framework that jointly models future scene evolution and trajectory planning in a closed-loop manner. Our method first predicts future bird's-eye view (BEV) representations to anticipate the dynamics of the surrounding scene, then leverages this foresight to generate future-context-aware trajectories. Two key components enable this: (1) future-aware planning, which injects predicted BEV features into the trajectory planner, and (2) iterative scene modeling and vehicle planning, which refines both future scene prediction and trajectory generation through collaborative optimization. Extensive experiments on the NAVSIM and nuScenes benchmarks show that SeerDrive significantly outperforms existing state-of-the-art methods.


Multiscale guidance of protein structure prediction with heterogeneous cryo-EM data

Neural Information Processing Systems

Protein structure prediction models are now capable of generating accurate 3D structural hypotheses from sequence alone. However, they routinely fail to capture the conformational diversity of dynamic biomolecular complexes, often requiring heuristic MSA subsampling approaches for generating alternative states. In parallel, cryo-electron microscopy (cryo-EM) has emerged as a powerful tool for imaging near-native structural heterogeneity, but is challenged by arduous pipelines to transform raw experimental data into atomic models. Here, we bridge the gap between these modalities, combining cryo-EM density maps with the rich sequence and biophysical priors learned by protein structure prediction models. Our method, CryoBoltz, guides the sampling trajectory of a pretrained biomolecular structure prediction model using both global and local structural constraints derived from density maps, driving predictions towards conformational states consistent with the experimental data. We demonstrate that this flexible yet powerful inference-time approach allows us to build atomic models into heterogeneous cryo-EM maps across a variety of dynamic biomolecular systems including transporters and antibodies.


Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving on Inequalities

Neural Information Processing Systems

LLM-based formal proof assistants (e.g., in Lean) hold great promise for automating mathematical discovery. But beyond syntactic correctness, do these systems truly understand mathematical structure as humans do? We investigate this question in the context of mathematical inequalities--specifically, the prover's ability to recognize that a given problem simplifies by applying a known inequality such as AM-GM. In particular, we are interested in the ability to do this in a compositional setting, where multiple inequalities must be applied as part of a solution. We introduce Ineq-Comp, a benchmark built from elementary inequalities through systematic transformations, including variable duplication, algebraic rewriting, and multi-step composition.


Geometric Algebra-Enhanced Bayesian Flow Network for RNA Inverse Design

Neural Information Processing Systems

With the development of biotechnology, RNA therapies have shown great potential. However, unlike for proteins, the sequences corresponding to a single RNA three-dimensional structure are more abundant. Most existing RNA design methods take into account only the secondary structure of RNA, or are capable of generating only a limited number of candidate sequences. To address these limitations, we propose a geometric-algebra-enhanced Bayesian Flow Network for the inverse design of RNA, called RBFN. RBFN uses a Bayesian Flow Network to model the distribution of nucleotide sequences in RNA, enabling the generation of more reasonable RNA sequences. Meanwhile, considering the greater flexibility of RNA conformations, we utilize geometric algebra to enhance the modeling of RNA three-dimensional structure, facilitating a better understanding of RNA structural properties. In addition, we propose a new time-step distribution sampling to address the scarcity of RNA structure data and the limitation that there are only four types of nucleic acids. Evaluation on the single-state fixed-backbone re-design benchmark and the multi-state fixed-backbone benchmark indicates that RBFN can outperform existing RNA design methods in various RNA design tasks, enabling effective RNA sequence design.


Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL

Neural Information Processing Systems

Large language models (LLMs) excel in tasks like question answering and dialogue, but complex tasks requiring interaction, such as negotiation and persuasion, require additional long-horizon reasoning and planning. Reinforcement learning (RL) fine-tuning can enable such planning in principle, but suffers from drawbacks that hinder scalability. In particular, multi-turn RL training incurs high memory and computational costs, which are exacerbated when training LLMs as policies. Furthermore, the largest LLMs do not expose the APIs necessary to be trained in such a manner. As a result, modern methods to improve the reasoning of LLMs rely on sophisticated prompting mechanisms rather than RL fine-tuning. To remedy this, we propose a novel approach that uses goal-conditioned value functions to guide the reasoning of LLM agents and that scales even to large API-based models. These value functions predict how a task will unfold given an action, allowing the LLM agent to evaluate multiple possible outcomes, both positive and negative, to plan effectively. In addition, these value functions are trained over reasoning steps rather than full actions, yielding a concise and lightweight module that facilitates decision-making in multi-turn interactions.


Towards Accurate Time Series Forecasting via Implicit Decoding

Neural Information Processing Systems

The recent boom in time series models has produced remarkable forecasting performance. However, these methods often place greater focus on modelling the historical series more effectively, largely neglecting the forecasting phase, which generates long-term forecasts by separately predicting multiple time points. Given that real-world time series typically consist of various long- and short-term dynamics, independent predictions over individual time points may fail to express complex underlying patterns and can lead to a lack of global views. To address these issues, this work explores new perspectives from the forecasting phase and proposes a novel Implicit Forecaster (IF) as an additional decoding module. Inspired by decomposition forecasting, IF adopts a more nuanced approach by implicitly predicting constituent waves represented by their frequency, amplitude, and phase, thereby accurately forming the time series. Extensive experimental results from multiple real-world datasets show that IF can consistently boost mainstream time series models, achieving state-of-the-art forecasting performance.
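
To make the idea of forming a forecast from constituent waves concrete, here is a minimal sketch. It is not the authors' Implicit Forecaster: how IF predicts the wave parameters is not described in the abstract, so the triples below are simply given as inputs, and the function name, the sine parameterization, and the `offset` term are assumptions for illustration. The point it shows is that a handful of (frequency, amplitude, phase) triples jointly determine every forecast step, rather than each step being predicted independently.

```python
import math

def compose_forecast(waves, horizon, offset=0.0):
    """Form a forecast of `horizon` steps by summing sinusoids.

    `waves` is a list of (frequency, amplitude, phase) triples, with
    frequency in cycles per time step and phase in radians. The triples
    stand in for parameters a decoder would predict (hypothetical).
    """
    return [
        offset + sum(a * math.sin(2.0 * math.pi * f * t + p)
                     for f, a, p in waves)
        for t in range(horizon)
    ]

# One slow wave plus one faster, weaker wave shifted by a quarter period.
waves = [(0.25, 2.0, 0.0), (0.5, 0.5, math.pi / 2)]
forecast = compose_forecast(waves, horizon=8)
```

Because all steps share the same small set of wave parameters, changing one amplitude or phase alters the whole forecast coherently, which is the global view that independent per-step prediction lacks.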


TROVE: Discovering Error-Inducing Static Feature Biases in Temporal Vision-Language Models

Neural Information Processing Systems

Vision-language models (VLMs) have made great strides in addressing temporal understanding tasks, which involve characterizing visual changes across a sequence of images. However, recent works have suggested that when making predictions, VLMs may rely on static feature biases, such as background or object features, rather than dynamic visual changes. Static feature biases are a type of shortcut and can contribute to systematic prediction errors on downstream tasks; as a result, identifying and characterizing error-inducing static feature biases is critical prior to real-world model deployment. Existing approaches for identifying such systematic failure modes in trained models (i) are typically designed for non-temporal settings and (ii) are challenging to evaluate in temporal settings due to the lack of quantitative evaluation frameworks. In this work, we address these challenges by introducing TROVE, an automated approach for discovering error-inducing static feature biases learned by temporal VLMs. Given a trained VLM and an annotated validation dataset associated with a downstream classification task, TROVE extracts candidate static features from the dataset and scores each feature by (i) the effect of the feature on classification errors as well as (ii) the extent to which the VLM relies on the feature when making predictions. In order to quantitatively evaluate TROVE, we introduce an evaluation framework consisting of 101 trained temporal VLMs paired with ground-truth annotations for learned static feature biases. We use this framework to demonstrate that TROVE can accurately identify error-inducing static feature biases in VLMs, achieving a 28.6% improvement over the closest baseline. Finally, we apply TROVE to 7 off-the-shelf VLMs and 2 temporal understanding tasks, surfacing previously unknown static feature biases and demonstrating that knowledge of learned biases can aid in improving model performance at test time.


0e4b12a79106789483fe6746702f4cb0-Paper-Conference.pdf

Neural Information Processing Systems

As large language models (LLMs) continue to advance, their capacity to function effectively across a diverse range of languages has shown marked improvement. Preliminary studies observe that the hidden activations of LLMs often resemble English, even when responding to non-English prompts. This has led to the widespread assumption that LLMs may "think" in English.