Goto

Collaborating Authors

 Technology


Mechanism Design for LLM Fine-tuning with Multiple Reward Models

Neural Information Processing Systems

Fine-tuning large language models (LLMs) to aggregate multiple preferences has attracted considerable research attention. With aggregation algorithms advancing, a potential economic scenario arises where fine-tuning services are provided to agents with different preferences. In this context, agents may benefit from strategically misreporting their preferences, but this could harm the aggregation performance. This paper addresses such incentive issues by framing it as a mechanism design problem: an LLM provider determines the fine-tuning objective (training rule) and the pricing scheme (payment rule) for agents. We primarily focus on training rules that maximize social welfare subject to certain regularizations, referred to as SW-Max rules.


Characterizing the Expressivity of Fixed-Precision Transformer Language Models

Neural Information Processing Systems

Transformer-based language models (LMs) have achieved widespread empirical success, but their theoretical expressive power remains only partially understood. In this work, we analyze a restricted idealization of fixed-precision transformers with strict future masking, soft attention, and no positional encodings. We establish that this class of models is exactly as expressive as a specific fragment of linear temporal logic that contains only a single temporal operator: the $\texttt{past}$ operator. We further connect this fragment to established classes in formal language theory, automata theory, and algebra, yielding a unified framework for understanding transformer expressivity under this idealization. Finally, we present empirical results that align closely with our theory: transformers trained on languages within their characterized expressive capacity generalize reliably across sequence lengths, while they consistently fail to generalize on languages beyond it.


OmniZoom: A Universal Plug-and-Play Paradigm for Cross-Device Smooth Zoom Interpolation

Neural Information Processing Systems

Dual-camera smartphones suffer from geometric and photometric inconsistencies during zoom transitions, primarily due to disparities in intrinsic/extrinsic parameters and divergent image processing pipelines between the two cameras. Existing interpolation methods struggle to effectively address this issue, constrained by the lack of ground-truth datasets and motion ambiguity in dynamic scenarios. To overcome these challenges, we propose OmniZoom, a universal plug-and-play paradigm for cross-device smooth zoom interpolation. Specifically, we present a novel cross-device virtual data generation method utilizing 3D Gaussian Splatting. This method tackles data scarcity by decoupling geometric features via spatial transition modeling and correcting photometric variations with dynamic color adaptation. It is further enhanced by cross-domain consistency learning for device-agnostic semantic alignment.


Learn and Ensemble Bridge Adapters for Multi-domain Task Incremental Learning

Neural Information Processing Systems

Multi-domain task incremental learning (MTIL) demands models to master domain-specific expertise while preserving generalization capabilities. Inspired by human lifelong learning, which relies on revisiting, aligning, and integrating past experiences, we propose a Learning and Ensembling Bridge Adapters (LEBA) framework. To facilitate cohesive knowledge transfer across domains, specifically, we propose a continuous-domain bridge adaptation module, leveraging the distribution transfer capabilities of Schrödinger bridge for stable progressive learning. To strengthen memory consolidation, we further propose a progressive knowledge ensemble strategy that revisits past task representations via a diffusion model and dynamically integrates historical adapters. For efficiency, LEBA maintains a compact adapter pool through similarity-based selection and employs learnable weights to align replayed samples with current task semantics. Together, these components effectively mitigate catastrophic forgetting and enhance generalization across tasks.


LC-Opt: Benchmarking Reinforcement Learning and Agentic AI for End-to-End Liquid Cooling Optimization in Data Centers

Neural Information Processing Systems

Liquid cooling is critical for thermal management in high-density data centers with the rising AI workloads. However, machine learning-based controllers are essential to unlock greater energy efficiency and reliability, promoting sustainability.


Listwise Preference Diffusion Optimization for User Behavior Trajectories Prediction

Neural Information Processing Systems

Forecasting multi-step user behavior trajectories requires reasoning over structured preferences across future actions, a challenge overlooked by traditional sequential recommendation. This problem is critical for applications such as personalized commerce and adaptive content delivery, where anticipating a user's complete action sequence enhances both satisfaction and business outcomes. We identify an essential limitation of existing paradigms: their inability to capture global, listwise dependencies among sequence items. To address this, we formulate User Behavior Trajectory Prediction (UBTP) as a new task setting that explicitly models longterm user preferences. We introduce Listwise Preference Diffusion Optimization (LPDO), a diffusion-based training framework that directly optimizes structured preferences over entire item sequences. LPDO incorporates a Plackett-Luce supervision signal and derives a tight variational lower bound aligned with listwise ranking likelihoods, enabling coherent preference generation across denoising steps and overcoming the independent-token assumption of prior diffusion methods. To rigorously evaluate multi-step prediction quality, we propose the task-specific metric: Sequential Match (SeqMatch), which measures exact trajectory agreement, and adopt Perplexity (PPL), which assesses probabilistic fidelity. Extensive experiments on real-world user behavior benchmarks demonstrate that LPDO consistently outperforms state-of-the-art baselines, establishing a new benchmark for structured preference learning with diffusion models.


Think Only When You Need with Large Hybrid-Reasoning Models

Neural Information Processing Systems

Recent Large Reasoning Models (LRMs) have shown substantially improved reasoning capabilities over traditional Large Language Models (LLMs) by incorporating extended thinking processes prior to producing final responses. However, excessively lengthy thinking introduces substantial overhead in terms of token consumption and latency, which is unnecessary for simple queries. In this work, we introduce Large Hybrid-Reasoning Models (LHRMs), the first kind of model capable of adaptively determining whether to perform reasoning based on the contextual information of user queries. To achieve this, we propose a two-stage training pipeline comprising Hybrid Fine-Tuning (HFT) as a cold start, followed by online reinforcement learning with the proposed Hybrid Group Policy Optimization (HGPO) to implicitly learn to select the appropriate reasoning mode. Furthermore, we introduce a metric called Hybrid Accuracy to quantitatively assess the model's capability for hybrid reasoning. Extensive experimental results show that LHRMs can adaptively perform hybrid reasoning on queries of varying difficulty and type. It outperforms existing LRMs and LLMs in reasoning and general capabilities while significantly improving efficiency. Together, our work advocates for a reconsideration of the appropriate use of extended reasoning processes and provides a solid starting point for building hybrid reasoning systems.


Vinci: Deep Thinking in Text-to-Image Generation using Unified Model with Reinforcement Learning

Neural Information Processing Systems

With the continuous development of large language models and reasoning chain technologies, the potential of deep reasoning based on reinforcement learning has shown remarkable promise in multi-task scenarios. However, existing unified models have yet to achieve end-to-end integration in image generation and understanding tasks, limiting the model's self-reflection ability and the realization of cross-modal reasoning chains. To address this, we propose Vinic, a novel framework designed to enable interleaved image generation and understanding through deep reasoning capabilities. We leverage a small amount of multimodal chain-of-thought (MCoT) data for cold-start and employ reinforcement learning to guide the integration of image generation and understanding tasks. Additionally, we introduce a momentum-based reward function, which dynamically adjusts the reward distribution by considering historical improvements, ensuring the stability of the model across multiple generations. Experimental results demonstrate that integrating MCoT can achieve a +22% improvement over the base model on Geneval, effectively enhancing both image generation quality and instruction alignment capabilities.


Inference with correlated priors using sisters cells

Neural Information Processing Systems

A common view of sensory processing is as probabilistic inference of latent causes from receptor activations. Standard approaches often assume these causes are \textit{a priori} independent, yet real-world generative factors are typically correlated. Representing such structured priors in neural systems poses architectural challenges, particularly when direct interactions between units representing latent causes are biologically implausible or computationally expensive. Inspired by the architecture of the olfactory bulb, we propose a novel circuit motif that enables inference with correlated priors without requiring direct interactions among latent cause units. The key insight lies in using \textit{sister cells}: neurons receiving shared receptor input but connected differently to local interneurons.


Superposition Yields Robust Neural Scaling

Neural Information Processing Systems

The success of today's large language models (LLMs) depends on the observation that larger models perform better. However, the origin of this neural scaling law, that loss decreases as a power law with model size, remains unclear. We propose that representation superposition, meaning that LLMs represent more features than they have dimensions, can be a key contributor to loss and cause neural scaling. Based on Anthropic's toy model, we use weight decay to control the degree of superposition, allowing us to systematically study how loss scales with model size. When superposition is weak, the loss follows a power law only if data feature frequencies are power-law distributed.