Generalized Top-k Mallows Model for Ranked Choices

Neural Information Processing Systems

The classic Mallows model is a foundational tool for modeling user preferences. However, it has limitations in capturing real-world scenarios, where users often focus only on a limited set of preferred items and are indifferent to the rest. To address this, extensions such as the top-k Mallows model have been proposed, aligning better with practical applications. In this paper, we address several challenges related to the generalized top-k Mallows model, with a focus on analyzing buyer choices. Our key contributions are: (1) a novel sampling scheme tailored to generalized top-k Mallows models, (2) an efficient algorithm for computing choice probabilities under this model, and (3) an active learning algorithm for estimating the model parameters from observed choice data. These contributions provide new tools for analysis and prediction in critical decision-making scenarios. We present a rigorous mathematical analysis of the performance of our algorithms. Furthermore, through extensive experiments on synthetic and real-world data, we demonstrate the scalability and accuracy of our proposed methods, and we compare the predictive power of the Mallows model for top-k lists with that of the simpler Multinomial Logit model.
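For readers unfamiliar with the classic Mallows model, the following is a minimal sketch of the well-known repeated insertion procedure for sampling a full ranking, followed by a naive truncation to the first k items. It is not the paper's sampling scheme for the generalized top-k model. The function names, the dispersion parameter `phi`, and the truncation step are assumptions made for illustration only.

```python
import random


def sample_mallows(reference, phi, rng=random):
    """Draw one ranking from a classic Mallows model by repeated insertion.

    reference: the central ranking (list of items, best first).
    phi: dispersion in [0, 1]; 0 always returns the reference,
         1 gives a uniformly random permutation.
    """
    ranking = []
    for i, item in enumerate(reference):
        # Inserting the i-th item at position j (0..i) creates i - j
        # inversions relative to the reference, so its weight is phi^(i - j).
        weights = [phi ** (i - j) for j in range(i + 1)]
        j = rng.choices(range(i + 1), weights=weights)[0]
        ranking.insert(j, item)
    return ranking


def sample_top_k(reference, phi, k, rng=random):
    """Hypothetical helper: keep only the first k items of a full sample."""
    return sample_mallows(reference, phi, rng)[:k]
```

A call such as `sample_top_k(list("abcdef"), 0.5, 3)` returns a list of three distinct items; smaller values of `phi` concentrate samples around the reference ranking.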


ArchPower: Dataset for Architecture-Level Power Modeling of Modern CPU Design

Neural Information Processing Systems

Power is the primary design objective of large-scale integrated circuits (ICs), especially for complex modern processors (i.e., CPUs). Accurate CPU power evaluation requires designers to go through the whole time-consuming IC implementation process, easily taking months. At the early design stage (e.g., architecture-level), classical power models are notoriously inaccurate. Recently, ML-based architecture-level power models have been proposed to boost accuracy, but the data availability is a severe challenge. Currently, there is no open-source dataset for this important ML application.


Parameter Dynamics of Online Machine Learning and Test-time Adaptation

Neural Information Processing Systems

Pre-trained models based on deep neural networks hold strong potential for cross-domain adaptability. However, this potential is often impeded in online machine learning (OML) settings, where the breakdown of the independent and identically distributed (i.i.d.) assumption leads to unstable adaptation. While recent advances in test-time adaptation (TTA) have addressed aspects of this challenge under unsupervised learning, most existing methods focus exclusively on unsupervised objectives and overlook the risks posed by non-i.i.d.


6294a235c0b80f0a2b224375c546c750-Paper-Conference.pdf

Neural Information Processing Systems

Text-to-Image (T2I) diffusion models [11, 41, 38, 43, 8, 7, 25], trained on large-scale datasets, have achieved remarkable success in generating high-quality, semantically aligned images from natural language prompts. While language-based control offers intuitive and flexible guidance, it often lacks the precision needed for fine-grained visual control, such as specific object positions, shapes, or scene layouts. To overcome this, recent works [19, 35, 28, 58, 27, 39, 59, 53] incorporate explicit spatial signals, such as edge maps, depth maps, and segmentation masks, to control diffusion models. To enable spatial control while preserving the generative quality of pre-trained diffusion models, existing methods typically employ control adapters [58, 35, 28] that inject spatial signals into a frozen T2I model. However, these adapters are usually trained independently for each spatial control task, requiring substantial computational resources and extensive labeled data for each new task. Alternatively, reusing pre-trained multi-task adapters, either directly [39, 53] or with minimal updates [59], struggles to generalize to tasks that differ from their training distribution and often shows poor adaptability.


Causal Discovery and Inference through Next-Token Prediction

Neural Information Processing Systems

Deep neural networks have been criticized as fundamentally statistical systems that fail to capture causal structure and perform causal reasoning. Here we demonstrate that a GPT-style transformer trained for next-token prediction can simultaneously discover instances of linear Gaussian structural causal models (SCMs) and learn to answer counterfactual queries about those SCMs. First, we show that the network generalizes to counterfactual queries about SCMs for which it has seen interventional data but not any examples of counterfactual inference. The network must, thus, have successfully composed discovered causal structures with a learned counterfactual inference algorithm. Second, we decode the implicit "mental" SCM from the network's residual stream activations and manipulate it using gradient descent with predictable effects on the network's output. Our results suggest that statistical prediction may be sufficient to drive the emergence of internal causal models and causal inference capacities in deep neural networks.
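To make the notion of a counterfactual query on a linear Gaussian SCM concrete, here is a minimal sketch of the standard abduction, action, prediction procedure. It illustrates the kind of query discussed above, not the transformer or the decoding method from the paper. The variable names, the edge weights, and the dictionary-based encoding of the SCM are all hypothetical.

```python
def counterfactual(weights, order, observed, intervention):
    """Answer a counterfactual query in a linear SCM with additive noise.

    weights: {child: {parent: coefficient}} describing the linear mechanisms.
    order: variables listed in a topological order of the causal graph.
    observed: {variable: value} for one fully observed unit.
    intervention: {variable: value} to force in the counterfactual world.
    """
    # Abduction: recover each exogenous noise term from the observation.
    noise = {}
    for v in order:
        parents = weights.get(v, {})
        noise[v] = observed[v] - sum(w * observed[p] for p, w in parents.items())

    # Action and prediction: override intervened variables, then propagate
    # the recovered noise through the unchanged mechanisms.
    result = {}
    for v in order:
        if v in intervention:
            result[v] = intervention[v]
        else:
            parents = weights.get(v, {})
            result[v] = noise[v] + sum(w * result[p] for p, w in parents.items())
    return result


# Hypothetical three-variable chain with a direct X -> Z edge.
weights = {"Y": {"X": 2.0}, "Z": {"X": 1.0, "Y": -0.5}}
order = ["X", "Y", "Z"]
observed = {"X": 1.0, "Y": 2.5, "Z": 0.0}
cf = counterfactual(weights, order, observed, {"X": 3.0})
```

With an empty intervention the function reproduces the observation, which is a quick sanity check that the abduction step is consistent with the mechanisms.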


Symmetry-Preserving Conformer Ensemble Networks for Molecular Representation Learning

Neural Information Processing Systems

Molecular representation learning has emerged as a promising approach for modeling molecules with deep learning in chemistry and beyond. While 3D geometric models effectively capture molecular structure, they typically process single static conformers, overlooking the inherent flexibility and dynamics of molecules. In reality, many molecular properties depend on distributions of thermodynamically accessible conformations rather than single structures. Recent works show that learning from conformer ensembles can improve molecular representations, but existing approaches either produce unphysical structures through averaging or require restrictive molecular alignment. In this paper, we propose Symmetry-Preserving Conformer Ensemble networks (SPiCE), which introduce two key innovations: (1) geometric mixture-of-experts for selective processing of scalar and vector features, and (2) hierarchical ensemble encoding that combines ensemble-level representation with cross-conformer integration. Crucially, SPiCE ensures physically meaningful representations by maintaining joint equivariance to geometric transformations of individual conformers and conformer permutations. Extensive experiments demonstrate that SPiCE consistently outperforms existing conformer ensemble methods and state-of-the-art structural aggregation models across quantum mechanical and biological property prediction tasks.


RSCC: A Large-Scale Remote Sensing Change Caption Dataset for Disaster Events

Neural Information Processing Systems

Remote sensing is critical for disaster monitoring, yet existing datasets lack temporal image pairs and detailed textual annotations. While single-snapshot imagery dominates current resources, it fails to capture dynamic disaster impacts over time. To address this gap, we introduce the Remote Sensing Change Caption (RSCC) dataset, a large-scale benchmark comprising 62,351 pre-/post-disaster image pairs (spanning earthquakes, floods, wildfires, and more) paired with rich, human-like change captions. By bridging the temporal and semantic divide in remote sensing data, RSCC enables robust training and evaluation of vision-language models for disaster-aware bi-temporal understanding. Our results highlight RSCC's ability to facilitate detailed disaster-related analysis, paving the way for more accurate, interpretable, and scalable vision-language applications in remote sensing.


Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs

Neural Information Processing Systems

We propose a new algorithm for fine-tuning large language models using reinforcement learning. Tapered Off-Policy REINFORCE (TOPR) uses an asymmetric, tapered variant of importance sampling to speed up learning while maintaining stable learning dynamics, even without the use of KL regularization. TOPR can be applied in a fully offline fashion, allows the handling of positive and negative examples in a unified framework, and benefits from the implementational simplicity that is typical of Monte Carlo algorithms. We demonstrate the effectiveness of our approach with a series of experiments on the GSM8K and MATH reasoning benchmarks, finding performance gains when training a model both for solution generation and as a generative verifier, and on a learning-to-search task that uses the model as a query expander. We show that properly leveraging positive and negative examples alike in the off-policy regime simultaneously increases test-time accuracy and training data efficiency, all the while avoiding the "wasted inference" that comes with discarding negative examples. We find that this advantage persists over multiple iterations of training and can be amplified by dataset curation techniques, enabling us to match 70B-parameter model performance with 8B language models. As a corollary to this work, we find that REINFORCE's baseline parameter plays an important and unexpected role in defining dataset composition in the presence of negative examples, and is consequently critical in driving off-policy performance.
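The abstract does not spell out the form of the asymmetric, tapered importance weights. The sketch below shows one plausible reading, stated purely as an assumption: samples with non-negative reward receive weight 1, while samples with negative reward receive the importance ratio clipped to at most 1. The function names, the clipping bounds, and the sign convention of the surrogate loss are illustrative and should not be taken as the authors' exact definition.

```python
import math


def tapered_weight(logp_policy, logp_behavior, reward):
    """Assumed asymmetric weight for one sampled sequence.

    logp_policy: log-probability of the sequence under the current policy.
    logp_behavior: log-probability under the policy that generated the data.
    reward: scalar reward (after any baseline has been subtracted).
    """
    if reward >= 0:
        # Assumption: positive examples are not reweighted.
        return 1.0
    # Assumption: negative examples use the importance ratio, capped at 1,
    # so their influence fades as the policy moves away from them.
    ratio = math.exp(logp_policy - logp_behavior)
    return min(1.0, ratio)


def surrogate_loss(samples):
    """Average REINFORCE-style surrogate over (logp_policy, logp_behavior, reward).

    In an autodiff framework the weight would be treated as a constant,
    so the gradient is -weight * reward * grad(logp_policy).
    """
    total = 0.0
    for logp_policy, logp_behavior, reward in samples:
        weight = tapered_weight(logp_policy, logp_behavior, reward)
        total += -weight * reward * logp_policy
    return total / len(samples)
```

Under this reading, a negative example that the current policy already assigns very low probability contributes almost nothing to the update, which is one way the taper could keep learning stable without KL regularization.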



Evaluating LLM-Contaminated Crowdsourcing Data Without Ground Truth

Neural Information Processing Systems

The recent success of generative AI highlights the crucial role of high-quality human feedback in building trustworthy AI systems. However, the increasing use of large language models (LLMs) by crowdsourcing workers poses a significant challenge: datasets intended to reflect human input may be compromised by LLM-generated responses. Existing LLM detection approaches often rely on high-dimensional training data such as text, making them unsuitable for structured annotation tasks like multiple-choice labeling. In this work, we investigate the potential of peer prediction, a mechanism that evaluates the information within workers' responses, to mitigate LLM-assisted cheating in crowdsourcing, with a focus on annotation tasks.