Goto

Collaborating Authors

 Genre


2526c5e8110bc6bc8b462ba95198161e-Paper-Conference.pdf

Neural Information Processing Systems

After pre-training, large language models are aligned with human preferences based on pairwise comparisons. State-of-the-art alignment methods (such as PPO-based RLHF and DPO) are built on the assumption of aligning with a single preference model, despite being deployed in settings where users have diverse preferences. As a result, it is not even clear that these alignment methods produce models that satisfy users on average -- a minimal requirement for pluralistic alignment. Drawing on social choice theory and modeling users' comparisons through individual BradleyTerry (BT) models, we introduce an alignment method's distortion: the worst-case ratio between the optimal achievable average utility, and the average utility of the learned policy. The notion of distortion helps draw sharp distinctions between alignment methods: Nash Learning from Human Feedback achieves the minimax optimal distortion of (12+o(1)) ฮฒ (for the BT temperature ฮฒ), robustly across utility distributions, distributions of comparison pairs, and permissible KL divergences from the reference policy. RLHF and DPO, by contrast, suffer (1 o(1)) ฮฒ distortion already without a KL constraint, and eโ„ฆ(ฮฒ) or even unbounded distortion in the full setting, depending on how comparison pairs are sampled.



Planning with Quantized Opponent Models

Neural Information Processing Systems

Planning under opponent uncertainty is a fundamental challenge in multi-agent environments, where an agent must act while inferring the hidden policies of its opponents. Existing type-based methods rely on manually defined behavior classes and struggle to scale, while model-free approaches are sample-inefficient and lack a principled way to incorporate uncertainty into planning. We propose Quantized Opponent Models (QOM), which learn a compact catalog of opponent types via a quantized autoencoder and maintain a Bayesian belief over these types online. This posterior supports both a belief-weighted meta-policy and a Monte-Carlo planning algorithm that directly integrates uncertainty, enabling real-time belief updates and focused exploration. Experiments show that QOM achieves superior performance with lower search cost, offering a tractable and effective solution for belief-aware planning.


Towards Provable Emergence of In-Context Reinforcement Learning

Neural Information Processing Systems

Typically, a modern reinforcement learning (RL) agent solves a task by updating its neural network parameters to adapt its policy to the task. Recently, it has been observed that some RL agents can solve a wide range of new out-of-distribution tasks without parameter updates after pretraining on some task distribution. When evaluated in a new task, instead of making parameter updates, the pretrained agent conditions its policy on additional input called the context, e.g., the agent's interaction history in the new task. The agent's performance increases as the information in the context increases, with the agent's parameters fixed. This phenomenon is typically called in-context RL (ICRL). The pretrained parameters of the agent network enable the remarkable ICRL phenomenon.


MMPB: It's Time for Multi-Modal Personalization

Neural Information Processing Systems

Visual personalization is essential in user-facing AI systems such as smart homes and healthcare, where aligning model behavior with user-centric concepts is critical. However, recent large Vision-Language Models (VLMs), despite their broad applicability, remain underexplored in their ability to adapt to individual users. In this paper, we introduce MMPB, the first extensive benchmark for evaluating VLMs on personalization. MMPB comprises 10k image-query pairs and includes 111 personalizable concepts across four categories: humans, animals, objects, and characters, with the human category enriched with preference-grounded queries.


WhAM: Towards ATranslative Model of Sperm Whale Vocalization

Neural Information Processing Systems

Sperm whales communicate in short sequences of clicks known as codas. We present WhAM (Whale Acoustics Model), the first transformer-based model capable of generating synthetic sperm whale codas from any audio prompt. WhAM is built by finetuning VampNet, a masked acoustic token model pretrained on musical audio, using 10k coda recordings collected over the past two decades. Through iterative masked token prediction, WhAM generates high-fidelity synthetic codas that preserve key acoustic features of the source recordings. We evaluate WhAM's synthetic codas using Frรฉchet Audio Distance and through perceptual studies with expert marine biologists. On downstream classification tasks including rhythm, social unit, and vowel classification, WhAM's learned representations achieve strong performance, despite being trained for generation rather than classification.


SPINT: Spatial Permutation-Invariant Neural Transformer for Consistent Intracortical Motor Decoding

Neural Information Processing Systems

Intracortical Brain-Computer Interfaces (iBCI) decode behavior from neural population activity to restore motor functions and communication abilities in individuals with motor impairments. A central challenge for long-term iBCI deployment is the nonstationarity of neural recordings, where the composition and tuning profiles of the recorded populations are unstable across recording sessions. Existing approaches attempt to address this issue by explicit alignment techniques; however, they rely on fixed neural identities and require test-time labels or parameter updates, limiting their generalization across sessions and imposing additional computational burden during deployment. In this work, we address the problem of cross-session nonstationarity in long-term iBCI systems and introduce SPINT a Spatial Permutation-Invariant Neural Transformer framework for behavioral decoding that operates directly on unordered sets of neural units. Central to our approach is a novel context-dependent positional embedding scheme that dynamically infers unit-specific identities, enabling flexible generalization across recording sessions. SPINT supports inference on variable-size populations and allows fewshot, gradient-free adaptation using a small amount of unlabeled data from the test session. We evaluate SPINT on three multi-session datasets from the FALCON Benchmark, covering continuous motor decoding tasks in human and non-human primates. SPINT demonstrates robust cross-session generalization, outperforming existing zero-shot and few-shot unsupervised baselines while eliminating the need for test-time alignment and fine-tuning. Our work contributes an initial step toward a robust and scalable neural decoding framework for long-term iBCI applications.


OpenMMEgo: Enhancing Egocentric Understanding for LMMs with Open Weights and Data

Neural Information Processing Systems

Recent advances in large multimodal models have significantly advanced video comprehension, yet their performance remains limited in first-person scenarios. The interactive nature of egocentric videos is critical for applications like embodied intelligence, but introduces complex visual contexts that conventional models struggle to capture. To bridge this gap, we introduce OpenMMEgo with innovations across three dimensions: data, model, and training strategy. To provide rich spatiotemporal visual knowledge, we curate a large-scale, high-quality dataset named OME10M, comprising over 8.2M egocentric video QA pairs synthesized from Ego4D series. We also establish OMEBench, a comprehensive benchmark for rigorous egocentric understanding assessment. To alleviate the frequent viewpoint shifts inherent in egocentric videos, we implement semantic-aware visual token compression. Further, a curriculum learning strategy is complemented to foster stable learning across various data complexities. OpenMMEgo consistently improves the performance of LMMs on egocentric benchmarks without sacrificing general video understanding performance.


NFIG: Multi-Scale Autoregressive Image Generation via Frequency Ordering

Neural Information Processing Systems

Autoregressive models have achieved significant success in image generation. However, unlike the inherent hierarchical structure of image information in the spectral domain, standard autoregressive methods typically generate pixels sequentially in a fixed spatial order. To better leverage this spectral hierarchy, we introduce NextFrequency Image Generation (NFIG). NFIG is a novel framework that decomposes the image generation process into multiple frequency-guided stages.


Nemotron-CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

Neural Information Processing Systems

Pre-training datasets are typically collected from web content and lack inherent domain divisions. For instance, widely used datasets like Common Crawl do not include explicit domain labels, while manually curating labeled datasets such as The Pile is labor-intensive. Consequently, identifying an optimal pre-training data mixture remains a challenging problem, despite its significant benefits for pre-training performance. To address these challenges, we propose CLusteringbased Iterative Data Mixture Bootstrapping (Nemotron-CLIMB), an automated framework that discovers, evaluates, and refines data mixtures in a pre-training setting. Specifically, Nemotron-CLIMB embeds and clusters large-scale datasets in a semantic space and then iteratively searches for optimal mixtures using a smaller proxy model and a predictor. This strategy enables effective domain adaptation without relying solely on curated data. When continuously trained on 400B tokens with this mixture, our 1B model exceeds the state-of-the-art Llama-3.2-1B by 2.0%. Moreover, we observe that optimizing for a specific domain (e.g., Social Sciences) yields a 5% improvement over random sampling.