Goto

Collaborating Authors

 question


Adaptive Latent-Space Constraints in Personalized Federated Learning

Neural Information Processing Systems

Federated learning (FL) is an effective and widely used approach to training deep learning models on decentralized datasets held by distinct clients. FL also strengthens both security and privacy protections for training data. Common challenges associated with statistical heterogeneity between distributed datasets have spurred significant interest in personalized FL (pFL) methods, where models combine aspects of global learning with local modeling specific to each client's unique characteristics. This work investigates the efficacy of theoretically supported, adaptive MMD measures in pFL, primarily focusing on the Ditto framework, a state-ofthe-art technique for distributed data heterogeneity. The use of such measures significantly improves model performance across a variety of tasks, especially those with pronounced feature heterogeneity. Additional experiments demonstrate that such measures are directly applicable to other pFL techniques and yield similar improvements across a number of datasets. Finally, the results motivate the use of constraints tailored to the various kinds of heterogeneity expected in FL systems.


74d056577c164d45f713ce297a4872df-Paper-Conference.pdf

Neural Information Processing Systems

Video generative models can be regarded as world simulators due to their ability to capture dynamic, continuous changes inherent in real-world environments. These models integrate high-dimensional information across visual, temporal, spatial, and causal dimensions, enabling predictions of subjects in various status. A natural and valuable research direction is to explore whether a fully trained video generative model in high-dimensional space can effectively support lowerdimensional tasks such as controllable image generation. In this work, we propose a paradigm for video-to-image knowledge compression and task adaptation, termed Dimension-Reduction Attack (DRA-Ctrl), which utilizes the strengths of video models, including long-range context modeling and flatten full-attention, to perform various generation tasks.


MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems

Neural Information Processing Systems

The sparse Mixture-of-Experts (MoE) architecture is increasingly favored for scaling Large Language Models (LLMs) efficiently, but it depends on heterogeneous compute and memory resources. These factors jointly affect system Cost, Accuracy, and Performance (CAP), making trade-offs inevitable. Existing benchmarks often fail to capture these trade-offs accurately, complicating practical deployment decisions. To address this, we introduce MoE-CAP, a benchmark specifically designed for MoE systems. Our analysis reveals that achieving an optimal balance across CAP is difficult with current hardware; MoE systems typically optimize two of the three dimensions at the expense of the third--a dynamic we term the MoE-CAP trade-off. To visualize this, we propose the CAP Radar Diagram. We further introduce sparsity-aware performance metrics--Sparse Memory Bandwidth Utilization (S-MBU) and Sparse Model FLOPS Utilization (S-MFU)--to enable accurate performance benchmarking of MoE systems across diverse hardware platforms and deployment scenarios.


Comparing Uniform Price and Discriminatory Multi-Unit Auctions through Regret Minimization

Neural Information Processing Systems

Repeated multi-unit auctions, where a seller allocates multiple identical items over many rounds, are common mechanisms in electricity markets and treasury auctions. We compare the two predominant formats: uniform-price and discriminatory auctions, focusing on the perspective of a single bidder learning to bid against stochastic adversaries. We characterize the learning difficulty in each format, showing that the regret scales similarly for both auction formats under both fullinformation and bandit feedback, as Θ( T)and Θ(T2/3), respectively. However, analysis beyond worst-case regret reveals structural differences: uniform-price auctions may admit faster learning rates, with regret scaling as Θ( T)in settings where discriminatory auctions remain at Θ(T2/3). Finally, we provide a specific analysis for auctions in which the other participants are symmetric and have unitdemand, and show that in these instances, a similar regret rate separation appears.


Exploring Tradeoffs through Mode Connectivity for Multi-Task Learning

Neural Information Processing Systems

Nowadays deep models are required to be versatile due to the increasing realistic needs. Multi-task learning (MTL) offers an efficient way for this purpose to learn multiple tasks simultaneously with a single model. However, prior MTL solutions often focus on resolving conflicts and imbalances during optimization, which may not outperform simple linear scalarization strategies [Xin et al., 2022]. Instead of altering the optimization trajectory, this paper leverages mode connectivity to efficiently approach the Pareto front and identify the desired trade-off point. Unlike Pareto Front Learning (PFL), which aims to align with the entire Pareto front, we focus on effectively and efficiently exploring optimal trade-offs. However, three challenges persist: (1) the low-loss path can neither fully traverse trade-offs nor align with user preference due to its randomness, (2) commonly adopted Bézier curves in mode connectivity are ill-suited to navigating the complex loss landscapes of deep models, and (3) poor scalability to large-scale task scenarios. To address these challenges, we adopt non-uniform rational B-Splines (NURBS) to model mode connectivity, allowing for more flexible and precise curve optimization. Additionally, we introduce an order-aware objective to explore task loss tradeoffs and employ a task grouping strategy to enhance scalability under massive task scenarios. Extensive experiments on key MTL datasets demonstrate that our proposed method, EXTRA(EXplore TRAde-offs), effectively identifies the desired point on the Pareto front and achieves state-of-the-art performance.


Quantization-Free Autoregressive Action Transformer

Neural Information Processing Systems

Current transformer-based imitation learning approaches introduce discrete action representations and train an autoregressive transformer decoder on the resulting latent code. However, the initial quantization breaks the continuous structure of the action space thereby limiting the capabilities of the generative model. We propose a quantization-free method instead that leverages Generative InfiniteVocabulary Transformers (GIVT) as a direct, continuous policy parametrization for autoregressive transformers.


Tracking and Understanding Object Transformations

Neural Information Processing Systems

Real-world objects frequently undergo state transformations. From an apple being cut into pieces to a butterfly emerging from its cocoon, tracking through these changes is important for understanding real-world objects and dynamics. However, existing methods often lose track of the target object after transformation, due to significant changes in object appearance. To address this limitation, we introduce the task of Track Any State: tracking objects through transformations while detecting and describing state changes, accompanied by a new benchmark dataset, VOST-TAS. To tackle this problem, we present TubeletGraph, a zero-shot system that recovers missing objects after transformation and maps out how object states are evolving over time. TubeletGraph first identifies potentially overlooked tracks, and determines whether they should be integrated based on semantic and proximity priors. Then, it reasons about the added tracks and generates a state graph describing each observed transformation. TubeletGraph achieves state-of-the-art tracking performance under transformations, while demonstrating deeper understanding of object transformations and promising capabilities in temporal grounding and semantic reasoning for complex object transformations. Code, additional results, and the benchmark dataset are available at https://tubelet-graph.github.io.


COLA: Towards Efficient Multi-Objective Reinforcement Learning with Conflict Objective Regularization in Latent Space

Neural Information Processing Systems

Many real-world control problems require continual policy adjustments to balance multiple objectives, which requires the acquisition of high-quality policies to cover diverse preferences. Multi-Objective Reinforcement Learning (MORL) provides a general framework to solve such problems. However, current MORL methods suffer from high sample complexity, primarily due to the neglect of efficient knowledge sharing and conflicts in optimization with different preferences. To this end, this paper introduces a novel framework, Conflict Objective Regularization in Latent Space (COLA). To enable efficient knowledge sharing, COLA establishes a shared latent representation space for common knowledge, which can avoid redundant learning under different preferences. Besides, COLA introduces a regularization term for the value function to mitigate the negative effects of conflicting preferences on the value function approximation, thereby improving the accuracy of value estimation. The experimental results across various multi-objective continuous control tasks demonstrate the significant superiority of COLA over the state-of-the-art MORL baselines. Code is available at https://github.com/yeshenpy/COLA.


Learning with Restricted Boltzmann Machines: Asymptotics of AMP and GD in High Dimensions

Neural Information Processing Systems

The Restricted Boltzmann Machine (RBM) is one of the simplest generative neural networks capable of learning input distributions. Despite its simplicity, the analysis of its performance in learning from the training data is only well understood in cases that essentially reduce to singular value decomposition of the data. Here, we consider the limit of a large dimension of the input space and a constant number of hidden units. In this limit, we simplify the standard RBM training objective into a form that is equivalent to the multi-index model with non-separable regularization. This opens a path to analyze training of the RBM using methods that are established for multi-index models, such as Approximate Message Passing (AMP) and its state evolution, and the analysis of Gradient Descent (GD) via the dynamical mean-field theory. We then give rigorous asymptotics of the training dynamics of RBMs on data generated by the spiked covariance model as a prototype of a structure suitable for unsupervised learning. We show in particular that RBMs reach the optimal computational weak recovery threshold, aligning with the Baik-Ben Arous-Péché (BBP) transition, in the spiked covariance model.


Riemannian Consistency Model

Neural Information Processing Systems

Consistency models are a class of generative models that enable few-step generation for diffusion and flow matching models. While consistency models have achieved promising results on Euclidean domains like images, their applications to Riemannian manifolds remain challenging due to the curved geometry. In this work, we propose the Riemannian Consistency Model (RCM), which, for the first time, enables few-step consistency modeling while respecting the intrinsic manifold constraint imposed by the Riemannian geometry. Leveraging the covariant derivative and exponential-map-based parameterization, we derive the closed-form solutions for both discrete-and continuous-time training objectives for RCM. We then demonstrate theoretical equivalence between the two variants of RCM: Riemannian consistency distillation (RCD) that relies on a teacher model to approximate the marginal vector field, and Riemannian consistency training (RCT) that utilizes the conditional vector field for training. We further propose a simplified training objective that eliminates the need for the complicated differential calculation. Finally, we provide a unique kinematics perspective for interpreting the RCM objective, offering new theoretical angles.