Goto

Collaborating Authors

 Genre



Low-Rank Graphon Learning for Networks

Neural Information Processing Systems

Graphons offer a powerful framework for modeling large-scale networks, yet estimation remains challenging. We propose a novel approach that leverages a low-rank additive representation, yielding both a low-rank connection probability matrix and a low-rank graphon-two goals rarely achieved jointly. Our method resolves identification issues and enables an efficient sequential algorithm based on subgraph counts and interpolation. We establish consistency and demonstrate strong empirical performance in terms of computational efficiency and estimation accuracy through simulations and data analysis.


ResponseRank: Data-Efficient Reward Modeling through Preference Strength Learning

Neural Information Processing Systems

Binary choices, as often used for reinforcement learning from human feedback (RLHF), convey only the direction of a preference. A person may choose apples over oranges and bananas over grapes, but which preference is stronger? Strength is crucial for decision-making under uncertainty and generalization of preference models, but hard to measure reliably. Metadata such as response times and interannotator agreement can serve as proxies for strength, but are often noisy and confounded. We propose ResponseRank to address the challenge of learning from noisy strength signals. Our method uses relative differences in proxy signals to rank responses to pairwise comparisons by their inferred preference strength. To control for systemic variation, we compare signals only locally within carefully constructed strata. This enables robust learning of utility differences consistent with strengthderived rankings while making minimal assumptions about the strength signal. Our contributions are threefold: (1) ResponseRank, a novel method that robustly learns preference strength by leveraging locally valid relative strength signals; (2) empirical evidence of improved sample efficiency and robustness across diverse tasks: synthetic preference learning (with simulated response times), language modeling (with annotator agreement), and RL control tasks (with simulated episode returns); and (3) the Pearson Distance Correlation (PDC), a novel metric that isolates cardinal utility learning from ordinal accuracy.


Conditional Forecasts and Proper Scoring Rules for Reliable and Accurate Performative Predictions

Neural Information Processing Systems

Performative predictions are forecasts which influence the outcomes they aim to predict, undermining the existence of correct forecasts and standard methods of elicitation and estimation. We show that conditioning forecasts on covariates that separate them from the outcome renders the target distribution forecast-invariant, guaranteeing well-posedness of the forecasting problem. However, even under this condition, classical proper scoring rules fail to elicit correct forecasts. We prove a general impossibility result and identify two solutions: (i) in decision-theoretic settings, elicitation of correct and incentive-compatible forecasts is possible if forecasts are separating; (ii) scoring with unbiased estimates of the divergence between the forecast and the induced distribution of the target variable yields correct forecasts. Applying these insights to parameter estimation, conditional forecasts and proper scoring rules enable performatively stable estimation of performatively correct parameters, resolving the issues raised by Perdomo et al. (2020). Our results expose fundamental limits of classical forecast evaluation and offer new tools for reliable and accurate forecasting in performative settings.


Incremental Sequence Classification with Temporal Consistency

Neural Information Processing Systems

We address the problem of incremental sequence classification, where predictions are updated as new elements in the sequence are revealed. Drawing on temporaldifference learning from reinforcement learning, we identify a temporal-consistency condition that successive predictions should satisfy. We leverage this condition to develop a novel loss function for training incremental sequence classifiers. Through a concrete example, we demonstrate that optimizing this loss can offer substantial gains in data efficiency. We apply our method to text classification tasks and show that it improves predictive accuracy over competing approaches on several benchmark datasets. We further evaluate our approach on the task of verifying large language model generations for correctness in grade-school math problems. Our results show that models trained with our method are better able to distinguish promising generations from unpromising ones after observing only a few tokens.


Wukong's 72 Transformations: High-fidelity Textured 3DMorphing via Flow Models

Neural Information Processing Systems

We present WUKONG, a novel training-free framework for high-fidelity textured 3D morphing that takes a pair of source and target prompts (image or text) as input. Unlike conventional methods--which rely on manual correspondence matching and deformation trajectory estimation (limiting generalization and requiring costly preprocessing)--WUKONG leverages the generative prior of flow-based transformers to produce high-fidelity 3D transitions with rich texture details. To ensure smooth shape transitions, we exploit the inherent continuity of flow-based generative processes and formulate morphing as an optimal transport barycenter problem. We further introduce a sequential initialization strategy to prevent abrupt geometric distortions and preserve identity coherence. For faithful texture preservation, we propose a similarity-guided semantic consistency mechanism that selectively retains high-frequency details and enables precise control over blending dynamics. This empowers WUKONG to support both global texture transitions and identitypreserving texture morphing, catering to diverse generation needs. Extensive quantitative and qualitative evaluations demonstrate that WUKONG significantly outper-Corresponding author.


Conformal Inference under High-Dimensional Covariate Shifts via Likelihood-Ratio Regularization

Neural Information Processing Systems

We consider the problem of conformal prediction under covariate shift. Given labeled data from a source domain and unlabeled data from a covariate shifted target domain, we seek to construct prediction sets with valid marginal coverage in the target domain. Most existing methods require estimating the unknown likelihood ratio function, which can be prohibitive for high-dimensional data such as images. To address this challenge, we introduce the likelihood ratio regularized quantile regression (LR-QR) algorithm, which combines the pinball loss with a novel choice of regularization in order to construct a threshold function without directly estimating the unknown likelihood ratio. We show that the LR-QR method has coverage at the desired level in the target domain, up to a small error term that we can control. Our proofs draw on a novel analysis of coverage via stability bounds from learning theory. Our experiments demonstrate that the LR-QR algorithm outperforms existing methods on high-dimensional prediction tasks, including a regression task for the Communities and Crime dataset, an image classification task from the WILDS repository, and an LLM question-answering task on the MMLU benchmark.


ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation

Neural Information Processing Systems

While humans effortlessly draw visual objects and shapes by adaptively allocating attention based on their complexity, existing multimodal large language models (MLLMs) remain constrained by rigid token representations. Bridging this gap, we propose ALTo, an adaptive-length tokenizer for autoregressive mask generation. To achieve this, a novel token length predictor is designed, along with a length regularization term and a differentiable token chunking strategy.


Cameras as Relative Positional Encoding

Neural Information Processing Systems

Transformers are increasingly prevalent for multiview computer vision tasks, where geometric relationships between viewpoints are critical for 3D perception. To leverage these relationships, multiview transformers must use camera geometry to ground visual tokens in 3D space. In this work, we compare techniques for conditioning transformers on cameras: token-level raymap encodings, attentionlevel relative pose encodings, and a new relative encoding--Projective Positional Encoding (PRoPE)--that captures complete camera frustums, both intrinsics and extrinsics, as a relative positional encoding. Our experiments begin by showing how relative conditioning methods improve performance in feedforward novel view synthesis, with further gains from PRoPE. This holds across settings: scenes with both shared and varying intrinsics, when combining token-and attention-level conditioning, and for generalization to inputs with out-of-distribution sequence lengths and camera intrinsics. We then verify that these benefits persist for different tasks, stereo depth estimation and discriminative spatial cognition, as well as larger model sizes. Code is available on our project webpage2.


Pseudo-Riemannian Graph Transformer

Neural Information Processing Systems

Many real-world graphs exhibit diverse and complex topological structures that are not well captured by geometric manifolds with uniform global curvature, such as hyperbolic or spherical spaces. Recently, there has been growing interest in embedding graphs into pseudo-Riemannian manifolds, which generalize both hyperbolic and spherical geometries. However, existing approaches face three significant limitations, including the ineffective pseudo-Riemannain framework, the shallow architectures, and the absence of clear guideline for selecting suitable pseudo-Riemannian manifolds. To address these issues, we introduce a novel diffeomorphic framework for graph embedding that aligns with the nature of pseudo-Riemannian manifolds. Subsequently, we propose the pseudo-Riemannian Graph Transformer for learning representations of complex graph structures. Our diffeomorphic framework in pseudo-Riemannian geometry enables the principled definitions of core transformer components, including linear attention, residual connection, and layer normalization. Finally, we develop a lightweight space searching algorithm to automatically identify the most suitable pseudo-Riemannian manifold for an input graph. Extensive experiments on diverse real-world graphs demonstrate that our model consistently outperforms other baselines in both node classification and link prediction tasks.