

Accelerating Diffusion Models with Parallel Sampling: Inference at Sub-Linear Time Complexity Haoxuan Chen

Neural Information Processing Systems

Diffusion models have become a leading method for generative modeling of both image and scientific data. As these models are costly to train and evaluate, reducing the inference cost for diffusion models remains a major goal. Inspired by the recent empirical success in accelerating diffusion models via the parallel sampling technique [1], we propose to divide the sampling process into O(1) blocks with parallelizable Picard iterations within each block. Rigorous theoretical analysis reveals that our algorithm achieves Õ(poly log d) overall time complexity, marking the first implementation with provable sub-linear complexity w.r.t. the data dimension d. Our analysis is based on a generalized version of Girsanov's theorem and is compatible with both the SDE and probability flow ODE implementations. Our results shed light on the potential of fast and efficient sampling of high-dimensional data on fast-evolving modern large-memory GPU clusters.
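To make the parallel-in-time structure concrete, here is a minimal NumPy sketch of Picard iterations on one block, assuming a generic drift function f (for diffusion sampling this would be built from the learned score network). It illustrates only the idea that all drift evaluations within a block are independent per iteration and can be batched on a GPU; it is not the paper's algorithm, discretization, or step-size schedule.

```python
import numpy as np

def picard_block(f, x0, t0, dt, K, num_iters):
    """Parallel-in-time Picard iterations over one block of K steps.

    f(x, t): drift of the (probability-flow) ODE, assumed given.
    x0: 1-D state entering the block.
    Each iteration evaluates f at all K grid points of the block; these
    evaluations are mutually independent and can run as one batch, which
    is where the wall-clock speedup comes from.
    """
    ts = t0 + dt * np.arange(K + 1)              # time grid of this block
    xs = np.repeat(x0[None, :], K + 1, axis=0)   # initial guess: constant path
    for _ in range(num_iters):
        # independent drift evaluations (batchable on a GPU)
        drifts = np.stack([f(xs[j], ts[j]) for j in range(K)])
        # cumulative rectangle-rule integral from t0 to each grid point
        integral = np.concatenate([np.zeros((1,) + x0.shape),
                                   np.cumsum(drifts * dt, axis=0)])
        xs = x0[None, :] + integral              # Picard update of the whole path
    return xs[-1]                                # state at the end of the block
```

A full sampler in this spirit would chain a constant number of such blocks, passing the output state of one block as the initial state of the next.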




Privacy Auditing with One (1) Training Run Thomas Steinke Matthew Jagielski

Neural Information Processing Systems

We propose a scheme for auditing differentially private machine learning systems with a single training run. This exploits the parallelism of being able to add or remove multiple training examples independently. We analyze this using the connection between differential privacy and statistical generalization, which avoids the cost of group privacy. Our auditing scheme requires minimal assumptions about the algorithm and can be applied in the black-box or white-box setting. We demonstrate the effectiveness of our framework by applying it to DP-SGD, where we can achieve meaningful empirical privacy lower bounds by training only one model. In contrast, standard methods would require training hundreds of models.
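The following toy sketch shows the single-run auditing flavor described above: randomly include each of several canary examples, train once, guess each canary's membership from a per-canary score, and convert guessing accuracy into an epsilon lower bound. The conversion used here is a crude randomized-response heuristic, not the paper's tighter analysis, and the score/threshold choices are assumptions for illustration.

```python
import numpy as np

def audit_one_run(canary_scores, included, conf_slack=0.0):
    """Toy single-run privacy audit (a simplification, not the paper's bound).

    included[i]: True if canary i was independently placed in the training set.
    canary_scores[i]: a per-canary statistic from the single trained model
    (e.g., negative loss), where higher means "looks like a member".
    We guess membership by thresholding at the median score and map the
    guessing accuracy to epsilon via the crude bound acc <= e^eps / (1 + e^eps).
    """
    scores = np.asarray(canary_scores, dtype=float)
    included = np.asarray(included, dtype=bool)
    guesses = scores > np.median(scores)          # guess "member" for high scores
    acc = np.mean(guesses == included) - conf_slack
    acc = np.clip(acc, 0.5, 1.0 - 1e-6)           # accuracy below 1/2 is uninformative
    return np.log(acc / (1.0 - acc))              # heuristic epsilon lower bound
```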


Uniform Last-Iterate Guarantee for Bandits and Reinforcement Learning

Neural Information Processing Systems

Existing metrics for reinforcement learning (RL) such as regret, PAC bounds, or uniform-PAC [Dann et al., 2017], typically evaluate the cumulative performance, while allowing the agent to play an arbitrarily bad policy at any finite time t. Such behavior can be highly detrimental in high-stakes applications. This paper introduces a stronger metric, the uniform last-iterate (ULI) guarantee, capturing both the cumulative and instantaneous performance of RL algorithms. Specifically, ULI characterizes the instantaneous performance by ensuring that the per-round suboptimality of the played policy is bounded by a function that is monotonically decreasing w.r.t. the number of rounds.
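One plausible way to formalize the guarantee sketched above (the notation here is assumed, not taken verbatim from the paper) is a high-probability, all-rounds bound on the per-round suboptimality:

```latex
% Delta_t: suboptimality of the policy played at round t
\Pr\left[\, \forall t \ge 1:\ \Delta_t \le F(t) \,\right] \ge 1 - \delta,
\qquad F \text{ monotonically decreasing in } t .
```

Because the bound holds pointwise at every round, it controls instantaneous performance, while summing F(t) over rounds recovers a cumulative, regret-style bound.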


Post-hoc Estimators for Learning to Defer to an Expert

Neural Information Processing Systems

Many practical settings allow a classifier to defer predictions to one or more costly experts. For example, the learning to defer paradigm allows a classifier to defer to a human expert, at some monetary cost. Similarly, the adaptive inference paradigm allows a base model to defer to one or more large models, at some computational cost. The goal in these settings is to learn classification and deferral mechanisms to optimise a suitable accuracy-cost tradeoff. To achieve this, a central issue studied in prior work is the design of a coherent loss function for both mechanisms. In this work, we demonstrate that existing losses can underfit the training set when there is a non-trivial deferral cost, owing to an implicit application of a high level of label smoothing. To resolve this, we propose two post-hoc estimators that fit a deferral function on top of a base model, either by threshold correction, or by learning when the base model's error rate exceeds the cost of deferring to the expert. Both approaches are equipped with theoretical guarantees, and empirically yield effective accuracy-cost tradeoffs on learning to defer and adaptive inference benchmarks.
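The following sketch illustrates the post-hoc flavor of such a deferral rule: keep the base model's frozen predictions and defer only when its estimated error probability exceeds a cutoff. The error estimate (1 minus the max softmax probability) and the cutoff choice are assumptions for illustration, not the paper's exact estimators.

```python
import numpy as np

def deferral_decision(base_probs, deferral_cost, threshold=None):
    """Post-hoc deferral rule fitted on top of a fixed base model (illustrative).

    base_probs: softmax output of the base model for one example.
    We estimate the base model's error probability as 1 - max_k p_k and defer
    to the costly expert when that estimate exceeds the cutoff. `threshold`,
    if given, plays the role of a correction tuned on held-out data instead
    of using the raw deferral cost directly.
    """
    est_error = 1.0 - float(np.max(base_probs))
    cutoff = deferral_cost if threshold is None else threshold
    return est_error > cutoff   # True => defer to the expert

# With cost 0.2: a confident prediction is kept, an uncertain one is deferred.
print(deferral_decision(np.array([0.90, 0.05, 0.05]), 0.2))  # False (predict)
print(deferral_decision(np.array([0.55, 0.25, 0.20]), 0.2))  # True  (defer)
```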


CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models Junho Kim Hyun Jun Kim Yeon Ju Kim Yong Man Ro

Neural Information Processing Systems

Large Multi-modal Models (LMMs) have recently demonstrated remarkable abilities in visual context understanding and coherent response generation. However, alongside these advancements, the issue of hallucinations has emerged as a significant challenge, producing erroneous responses that are unrelated to the visual contents. In this paper, we introduce a novel contrastive-based decoding method, COuntering DEscription Contrastive Decoding (CODE), which leverages self-generated descriptions as contrasting references during the decoding phase of LMMs to address hallucination issues. CODE utilizes comprehensive descriptions from the model itself as a visual counterpart to correct and improve response alignment with the actual visual content. By dynamically adjusting the information flow and distribution of next-token predictions in the LMM's vocabulary, CODE enhances the coherence and informativeness of generated responses. Extensive experiments demonstrate that our method significantly reduces hallucinations and improves cross-modal consistency across various benchmarks and cutting-edge LMMs. Our method provides a simple yet effective decoding strategy that can be integrated into existing LMM frameworks without additional training.
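As a rough illustration of the contrastive-decoding idea behind such methods, the sketch below combines two sets of next-token logits: one conditioned on the actual image and one conditioned only on the model's self-generated description used as the contrasting reference. The simple linear combination and the alpha parameter are assumptions for illustration; CODE itself adapts the adjustment dynamically at each decoding step.

```python
import numpy as np

def contrastive_next_token_logits(logits_visual, logits_description, alpha=0.5):
    """Basic contrastive combination of two next-token logit vectors (illustrative).

    logits_visual: logits when the LMM conditions on the actual image.
    logits_description: logits when it conditions only on its own self-generated
    description (the contrasting reference). Tokens favored by the description
    but not supported by the image are pushed down, which is intended to
    suppress hallucinated content.
    """
    lv = np.asarray(logits_visual, dtype=float)
    ld = np.asarray(logits_description, dtype=float)
    return (1.0 + alpha) * lv - alpha * ld
```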