Goto

Collaborating Authors

 Industry


PRESTO: Preimage-Informed Instruction Optimization for Prompting Black-Box LLMs

Neural Information Processing Systems

Large language models (LLMs) have achieved remarkable success across diverse domains, due to their strong instruction-following capabilities. This has led to increasing interest in optimizing instructions for black-box LLMs, whose internal parameters are inaccessible but widely used due to their strong performance. To optimize instructions for black-box LLMs, recent methods employ white-box LLMs to generate candidate instructions from optimized soft prompts. However, white-box LLMs often map different soft prompts to the same instruction, leading to redundant queries. While previous studies regarded this many-to-one mapping as a structure that hinders optimization efficiency, we reinterpret it as a useful prior knowledge that can accelerate the optimization.


TIME: AMulti-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenarios

Neural Information Processing Systems

Temporal reasoning is pivotal for Large Language Models (LLMs) to comprehend the real world. However, existing works neglect the real-world challenges for temporal reasoning: (1) intensive temporal information, (2) fast-changing event dynamics, and (3) complex temporal dependencies in social interactions. To bridge this gap, we propose a multi-level benchmark TIME, designed for temporal reasoning in real-world scenarios. TIME consists of 38,522 QA pairs, covering 3 levels with 11 fine-grained sub-tasks. This benchmark encompasses 3 sub-datasets reflecting different real-world challenges: TIME-WIKI, TIME-NEWS, and TIMEDIAL. We conduct extensive experiments on reasoning models and non-reasoning models. And we conducted an in-depth analysis of temporal reasoning performance across diverse real-world scenarios and tasks, and summarized the impact of test-time scaling on temporal reasoning capabilities. Additionally, we release TIME-LITE, a human-annotated subset to foster future research and standardized evaluation in temporal reasoning.


UniGTE: Unified Graph-Text Encoding for Zero-Shot Generalization across Graph Tasks and Domains

Neural Information Processing Systems

Generalizing to unseen graph tasks without task-specific supervision is challenging: conventional graph neural networks are typically tied to a fixed label space, while large language models (LLMs) struggle to capture graph structure. We introduce UniGTE, an instruction-tuned encoder-decoder framework that unifies structural and semantic reasoning. The encoder augments a pretrained autoregressive LLM with learnable alignment tokens and a structure-aware graph-text attention mechanism, enabling it to attend jointly to a tokenized graph and a natural-language task prompt while remaining permutation-invariant to node order.



Protocols for Verifying Smooth Strategies in Bandits and Games

Neural Information Processing Systems

We study protocols for verifying approximate optimality of strategies in multiarmed bandits and normal-form games. As the number of actions available to each player is often large, we seek protocols where the number of queries to the utility oracle is sublinear in the number of actions. We prove that such verification is possible for sufficiently smooth strategies that do not put too much probability mass on any specific action and provide protocols for verifying that a smooth policy for a multi-armed bandit is close to optimal. Our verification protocols require provably fewer arm queries than learning. Furthermore, we show how to use cryptographic tools to reduce the communication cost of our protocols. We complement our protocol by proving a nearly tight lower bound on the query complexity of verification in our settings. As an application, we use our bandit verification protocol to build a protocol for verifying approximate optimality of a strong smooth Nash equilibrium, with sublinear query complexity.


Military AINeeds Technically-Informed Regulation to Safeguard AIResearch and its Applications

Neural Information Processing Systems

Military weapon systems and command-and-control infrastructure augmented by artificial intelligence (AI) have seen rapid development and deployment in recent years. However, the sociotechnical impacts of AI on combat systems, military decision-making, and the norms of warfare have been understudied. We focus on a specific subset of lethal autonomous weapon systems (LAWS) that use AI for targeting or battlefield decisions. We refer to this subset as AI-powered lethal autonomous weapon systems (AI-LAWS) and argue that they introduce novel risks--including unanticipated escalation, poor reliability in unfamiliar environments, and erosion of human oversight--all of which threaten both military effectiveness and the openness of AI research. These risks cannot be addressed by high-level policy alone; effective regulation must be grounded in the technical behavior of AI models. We argue that AI researchers must be involved throughout the regulatory lifecycle. Thus, we propose a clear, behavior-based definition of AILAWS--systems that introduce unique risks through their use of modern AI--as a foundation for technically grounded regulation, given that existing frameworks do not distinguish them from conventional LAWS. Using this definition, we propose several technically-informed policy directions and invite greater participation from the AI research community in military AI policy discussions.


Incentive-Aware Dynamic Resource Allocation under Long-Term Cost Constraints

Neural Information Processing Systems

Motivated by applications such as cloud platforms allocating GPUs to users or governments deploying mobile health units across competing regions, we study the constrained dynamic allocation of a reusable resource to a group of strategic agents. Our objective is to simultaneously (i) maximize social welfare, (ii) satisfy multidimensional long-term cost constraints, and (iii) incentivize truthful reporting. We begin by numerically evaluating primal-dual methods widely used in constrained online optimization and find them to be highly fragile in strategic settings - agents can easily manipulate their reports to distort future dual updates for future gain. To address this vulnerability, we develop an incentive-aware framework that makes primal-dual methods robust to strategic behavior. Our primal-side design combines epoch-based lazy updates - discouraging agents from distorting dual updates - with dual-adjust pricing and randomized exploration techniques that extract approximately truthful signals for learning. On the dual side, we design a novel online learning subroutine to resolve a circular dependency between actions and predictions; this makes our mechanism achieve eO( T)social welfare regret (where T is the number of allocation rounds), satisfies all cost constraints, and ensures incentive alignment. This eO( T) performance matches that of non-strategic allocation approaches while additionally exhibiting robustness to strategic agents.



Bootstrap Your Uncertainty: Adaptive Robust Classification Driven by Optimal-Transport

Neural Information Processing Systems

Distributionally Robust Optimization (DRO) offers a promising framework by optimizing worst-case performance over a set of candidate distributions, referred to as the uncertainty set. However, the efficacy of DRO heavily depends on the design of the uncertainty set, and existing methods often perform suboptimally due to an inappropriate or inflexible uncertainty set. In this work, we first propose a novel perspective that casts entropy-regularized Wasserstein DRO as a dynamic process of distributional exploration and semantic alignment, both driven by optimal transport (OT). This unified viewpoint yields two key new techniques: semantic calibration, which bootstraps semantically meaningful transport costs via inverse OT, and adaptive refinement, which adjusts uncertainty set using OT-driven feedback. Together, these components form an exploration-and-feedback system, where the transport costs and uncertainty set evolve jointly during training, enabling the model to better adapt to potential distribution shifts. Moreover, we provide an in-depth analysis of this adaptive process and prove theoretical guarantees of convergence. Finally, we present our experimental results across diverse distribution shift scenarios, which demonstrate that our approach significantly outperforms existing methods, achieving state-ofthe-art robustness.


ESCORT: Efficient Stein-variational and Sliced Consistency-Optimized Temporal Belief Representation for POMDPs

Neural Information Processing Systems

In Partially Observable Markov Decision Processes (POMDPs), maintaining and updating belief distributions over possible underlying states provides a principled way to summarize action-observation history for effective decision-making under uncertainty. As environments grow more realistic, belief distributions develop complexity that standard mathematical models cannot accurately capture, creating a fundamental challenge in maintaining representational accuracy. Despite advances in deep learning and probabilistic modeling, existing POMDP belief approximation methods fail to accurately represent complex uncertainty structures such as high-dimensional, multi-modal belief distributions, resulting in estimation errors that lead to suboptimal agent behaviors. To address this challenge, we present ESCORT (Efficient Stein-variational and sliced ConsistencyOptimized Representation for Temporal beliefs), a particle-based framework for capturing complex, multi-modal distributions in high-dimensional belief spaces. ESCORT extends SVGD with two key innovations: correlation-aware projections that model dependencies between state dimensions, and temporal consistency constraints that stabilize updates while preserving correlation structures. This approach retains SVGD's attractive-repulsive particle dynamics while enabling accurate modeling of intricate correlation patterns. Unlike particle filters prone to degeneracy or parametric methods with fixed representational capacity, ESCORT dynamically adapts to belief landscape complexity without resampling or restrictive distributional assumptions. We demonstrate ESCORT's effectiveness through extensive evaluations on both POMDP domains and synthetic multi-modal distributions of varying dimensionality, where it consistently outperforms state-of-theart methods in terms of belief approximation accuracy and downstream decision quality.