Understanding temperature tuning in energy-based models

Fields, Peter W, Ngampruetikorn, Vudtiwat, Schwab, David J, Palmer, Stephanie E

arXiv.org Artificial Intelligence

Energy-based models trained on evolutionary data can now generate novel protein sequences with custom functions [38]. A crucial, yet poorly understood, step in these successes is the use of an artificially low sampling "temperature" to produce functional sequences from the trained model. This adjustment is often the deciding factor between generating functional enzymes and inert polypeptides. A fundamental question arises as to what necessitates temperature tuning and what it reveals about the space of functional proteins and the limits of models trained on finite data. Temperature tuning is a heuristic applied broadly across machine learning to improve training [16, 33, 34], generalization and generative performance [14, 45, 47, 48], and energy-landscape dynamics for memory retrieval [35]. It follows the basic intuition that one can navigate the trade-off between fidelity (producing believable, high-probability outputs at low temperature) and diversity (exploring a wide range of novel outputs at high temperature). Despite its widespread use, this practice lacks a principled, quantitative explanation and has not been systematically connected to known issues of the fitting procedure--particularly fundamental limits of the learning process, such as biases introduced by training on finite data [5, 9, 10, 21, 22, 41].
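The temperature heuristic this abstract examines can be made concrete with a toy sketch (illustrative only; the energy function and sampler below are placeholders, not the authors' protein models): sampling from p_T(x) ∝ exp(-E(x)/T) with a Metropolis chain, where lowering T concentrates samples on low-energy, high-probability configurations.

```python
import math
import random

def energy(x):
    # Toy energy: rewards agreement between neighboring "residues".
    return -sum(1.0 for a, b in zip(x, x[1:]) if a == b)

def metropolis_sample(temperature, n=20, steps=5000, seed=0):
    """Sample from p_T(x) ~ exp(-E(x)/T) with single-site Metropolis flips."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    e = energy(x)
    for _ in range(steps):
        i = rng.randrange(n)
        x[i] ^= 1                       # propose a flip
        e_new = energy(x)
        if rng.random() < math.exp(-(e_new - e) / temperature):
            e = e_new                   # accept
        else:
            x[i] ^= 1                   # reject: undo the flip
    return x, e

# Lower temperature concentrates samples on low-energy configurations.
_, e_low = metropolis_sample(temperature=0.3)
_, e_high = metropolis_sample(temperature=2.0)
```

The same reweighting applies to a trained model: dividing the learned energy by T < 1 sharpens the distribution around its modes, which is the "fidelity" end of the fidelity-diversity trade-off described above.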


Towards Irreversible Machine Unlearning for Diffusion Models

Yuan, Xun, Zhao, Zilong, Li, Jiayu, Pasikhani, Aryan, Gope, Prosanta, Sikdar, Biplab

arXiv.org Artificial Intelligence

Diffusion models are renowned for their state-of-the-art performance in generating synthetic images. However, concerns related to safety, privacy, and copyright highlight the need for machine unlearning, which can make diffusion models forget specific training data and prevent the generation of sensitive or unwanted content. Current machine unlearning methods for diffusion models are primarily designed for conditional diffusion models and focus on unlearning specific data classes or features. Among these methods, finetuning-based machine unlearning methods are recognized for their efficiency and effectiveness; they update the parameters of pre-trained diffusion models by minimizing carefully designed loss functions. However, in this paper, we propose a novel attack named Diffusion Model Relearning Attack (DiMRA), which can reverse finetuning-based machine unlearning methods, exposing a significant vulnerability in this kind of technique. Without prior knowledge of the unlearned elements, DiMRA optimizes the unlearned diffusion model on an auxiliary dataset to reverse the unlearning, enabling the model to regenerate previously unlearned elements. To mitigate this vulnerability, we propose a novel machine unlearning method for diffusion models, termed Diffusion Model Unlearning by Memorization (DiMUM). Unlike traditional methods that focus on forgetting, DiMUM memorizes alternative data or features to replace the targeted unlearning data or features, preventing the generation of such elements. In our experiments, we demonstrate the effectiveness of DiMRA in reversing state-of-the-art finetuning-based machine unlearning methods for diffusion models, highlighting the need for more robust solutions. We extensively evaluate DiMUM, demonstrating its superior ability to preserve the generative performance of diffusion models while enhancing robustness against DiMRA.


Adaptive Margin RLHF via Preference over Preferences

Chittepu, Yaswanth, Singhal, Prasann, Durrett, Greg, Niekum, Scott

arXiv.org Artificial Intelligence

Margin-based optimization is fundamental to improving generalization and robustness in classification tasks. In the context of reward model learning from preferences within Reinforcement Learning from Human Feedback (RLHF), existing methods typically rely on no margins, fixed margins, or margins that are simplistic functions of preference ratings. However, such formulations often fail to account for the varying strengths of different preferences (for example, some preferences warrant larger margins between responses than others), or they rely on noisy margin information derived from ratings. We argue that modeling the strength of preferences can lead to better generalization and more faithful alignment. Furthermore, many existing methods that use adaptive margins assume access to accurate preference scores, which can be difficult for humans to provide reliably. We propose an approach that leverages preferences over preferences, that is, annotations indicating which of two preferences reflects a stronger distinction. We use this ordinal signal to infer adaptive margins on a per-datapoint basis. We introduce an extension to Direct Preference Optimization (DPO), DPO-PoP, that incorporates adaptive margins from preference-over-preference supervision, enabling improved discriminative and generative performance. Empirically, our method outperforms vanilla DPO, DPO with fixed margins, and DPO with ground-truth margins on the UltraFeedback dataset. Additionally, we show that there is a tradeoff between discriminative and generative performance: improving test classification accuracy, particularly by correctly labeling weaker preferences at the expense of stronger ones, can lead to a decline in generative quality. To navigate this tradeoff, we propose two sampling strategies for gathering preference-over-preference labels: one favoring discriminative performance and one favoring generative performance.
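The margin mechanism this abstract builds on can be sketched in a few lines (the additive-margin DPO loss is standard; the per-datapoint margin inference that defines DPO-PoP is not reproduced here, and the numeric inputs are placeholders):

```python
import math

def dpo_margin_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1, margin=0.0):
    """DPO loss for one preference pair with an additive margin.

    logp_w / logp_l: summed log-probs of the chosen / rejected response
    under the policy; ref_* are the same under the frozen reference model.
    Adaptive-margin schemes make `margin` a function of preference strength
    instead of a constant.
    """
    pi_logratio = logp_w - logp_l              # policy's log-preference for the winner
    ref_logratio = ref_logp_w - ref_logp_l     # reference model's log-preference
    z = beta * (pi_logratio - ref_logratio) - margin
    return math.log1p(math.exp(-z))            # -log(sigmoid(z))

# A larger margin demands a wider policy/reference gap before the loss is small.
loss_plain = dpo_margin_loss(-4.0, -6.0, -5.0, -5.0, margin=0.0)
loss_strong = dpo_margin_loss(-4.0, -6.0, -5.0, -5.0, margin=1.0)
```

With the same policy gap, the larger margin yields a larger loss, which is exactly how stronger preferences can be made to exert more pressure during training.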


Squeezed Diffusion Models

Singh, Jyotirmai, Khanna, Samar, Burgess, James

arXiv.org Artificial Intelligence

Diffusion models typically inject isotropic Gaussian noise, disregarding structure in the data. Motivated by the way quantum squeezed states redistribute uncertainty according to the Heisenberg uncertainty principle, we introduce Squeezed Diffusion Models (SDM), which scale noise anisotropically along the principal component of the training distribution. As squeezing enhances the signal-to-noise ratio in physics, we hypothesize that scaling noise in a data-dependent manner can better assist diffusion models in learning important data features. We study two configurations: (i) a Heisenberg diffusion model that compensates the scaling on the principal axis with inverse scaling on orthogonal directions and (ii) a standard SDM variant that scales only the principal axis. Counterintuitively, on CIFAR-10/100 and CelebA-64, mild antisqueezing, i.e., increasing variance on the principal axis, consistently improves FID by up to 15% and shifts the precision-recall frontier toward higher recall. Our results demonstrate that simple, data-aware noise shaping can deliver robust generative gains without architectural changes.
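The core operation, rescaling the noise component along the data's principal axis, is simple to sketch (a toy illustration of the idea; the paper's noise schedule and the `squeeze` factor here are assumptions, with squeeze > 1 corresponding to the "antisqueezing" the abstract reports helping):

```python
import numpy as np

def squeezed_noise(data, squeeze=1.15, rng=None):
    """Gaussian noise with variance rescaled along the data's top principal component."""
    rng = np.random.default_rng(rng)
    # Principal axis of the training distribution.
    cov = np.cov(data, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)
    v = eigvecs[:, -1]                          # eigenvector of the largest eigenvalue
    eps = rng.standard_normal(data.shape)
    # Rescale the component of the noise along v by `squeeze`;
    # orthogonal components are left isotropic.
    along = eps @ v
    return eps + (squeeze - 1.0) * np.outer(along, v)

rng = np.random.default_rng(0)
data = rng.standard_normal((5000, 2)) * np.array([3.0, 1.0])  # anisotropic toy data
noise = squeezed_noise(data, squeeze=1.5, rng=1)
```

The Heisenberg variant described above would additionally divide the orthogonal components by a compensating factor so the total uncertainty budget is preserved.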



We would like to thank all the reviewers for positive and constructive feedback

Neural Information Processing Systems

Reconstruction results (best seen when zoomed in). Figure 1: (a) Input on the left and reconstructed image on the right for CelebA HQ 256. Reconstruction: The reconstructed images in NVAE are indistinguishable from the training images (see Figure 1(a)). GANs are perhaps less prone to this, as they may drop modes without being penalized. Training curves: Figure 1 in the supplementary material demonstrates training stability with spectral regularization.


Selective Underfitting in Diffusion Models

Song, Kiwhan, Kim, Jaeyeon, Chen, Sitan, Du, Yilun, Kakade, Sham, Sitzmann, Vincent

arXiv.org Artificial Intelligence

Diffusion models have emerged as the principal paradigm for generative modeling across various domains. During training, they learn the score function, which in turn is used to generate samples at inference. This raises a basic yet unresolved question: which score do they actually learn? In principle, a diffusion model that matched the empirical score over the entire data space would simply reproduce the training data, failing to generate novel samples. Recent work addresses this question by arguing that diffusion models underfit the empirical score due to training-time inductive biases. In this work, we refine this perspective, introducing the notion of selective underfitting: instead of underfitting the score everywhere, better diffusion models more accurately approximate the score in certain regions of input space while underfitting it in others. We characterize these regions and design empirical interventions to validate our perspective. Our results establish that selective underfitting is essential for understanding diffusion models, yielding new, testable insights into their generalization and generative performance.
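The "empirical score" the abstract refers to has a closed form worth spelling out (a standard identity, sketched on toy data rather than anything from the paper): for the Gaussian-smoothed empirical distribution p_sigma(x) = (1/N) sum_i N(x; x_i, sigma^2 I), the score is a responsibility-weighted pull toward the training points.

```python
import numpy as np

def empirical_score(x, train, sigma):
    """Score grad_x log p_sigma(x) of the Gaussian-smoothed empirical distribution.

    A model matching this score everywhere would be pulled straight back to
    the training set, i.e. it would only reproduce training data.
    """
    d2 = ((train - x) ** 2).sum(axis=1)       # squared distance to each x_i
    logw = -d2 / (2 * sigma ** 2)
    w = np.exp(logw - logw.max())
    w /= w.sum()                              # softmax responsibilities
    return (w[:, None] * (train - x)).sum(axis=0) / sigma ** 2

train = np.array([[0.0, 0.0], [4.0, 0.0]])
# Near a training point at small sigma, the score points almost straight at it.
s = empirical_score(np.array([0.5, 0.0]), train, sigma=0.2)
```

Selective underfitting, as framed above, asks in which regions of input space the learned network tracks this quantity closely and in which it (usefully) deviates.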


No Alignment Needed for Generation: Learning Linearly Separable Representations in Diffusion Models

Yun, Junno, Alçalar, Yaşar Utku, Akçakaya, Mehmet

arXiv.org Artificial Intelligence

Efficient training strategies for large-scale diffusion models have recently emphasized the importance of improving discriminative feature representations in these models. A central line of work in this direction is representation alignment with features obtained from powerful external encoders, which improves the representation quality as assessed through linear probing. Alignment-based approaches show promise but depend on large pretrained encoders, which are computationally expensive to obtain. In this work, we propose an alternative regularization for training, based on promoting the Linear SEParability (LSEP) of intermediate layer representations. LSEP eliminates the need for an auxiliary encoder and representation alignment, while incorporating linear probing directly into the network's learning dynamics rather than treating it as a simple post-hoc evaluation tool. Our results demonstrate substantial improvements in both training efficiency and generation quality on flow-based transformer architectures such as SiTs, achieving an FID of 1.46 on the $256 \times 256$ ImageNet dataset.
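The general shape of such a regularizer can be sketched as a linear-probe cross-entropy added to the main objective (our reading of the abstract; the function names, weighting, and where the probe attaches are all illustrative assumptions, not the paper's formulation):

```python
import numpy as np

def linear_probe_loss(features, labels, W, b):
    """Cross-entropy of a linear classifier on intermediate-layer features."""
    logits = features @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def total_loss(diffusion_loss, features, labels, W, b, lam=0.1):
    # Main generative objective plus a linear-separability regularizer,
    # so separability shapes the learning dynamics rather than being
    # measured only post hoc.
    return diffusion_loss + lam * linear_probe_loss(features, labels, W, b)

# Perfectly separable features with an aligned probe incur almost no penalty.
features = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 1])
W = 5.0 * np.eye(2)
b = np.zeros(2)
probe = linear_probe_loss(features, labels, W, b)
combined = total_loss(1.0, features, labels, W, b)
```

In an alignment-based method, the probe term would instead compare features against an external encoder's output; dropping that dependency is the efficiency argument made above.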


LLM-Guided Ansätze Design for Quantum Circuit Born Machines in Financial Generative Modeling

Gujju, Yaswitha, Harang, Romain, Shibuya, Tetsuo

arXiv.org Artificial Intelligence

Quantum generative modeling using quantum circuit Born machines (QCBMs) shows promising potential for practical quantum advantage. However, discovering ansätze that are both expressive and hardware-efficient remains a key challenge, particularly on noisy intermediate-scale quantum (NISQ) devices. In this work, we introduce a prompt-based framework that leverages large language models (LLMs) to generate hardware-aware QCBM architectures. Prompts are conditioned on qubit connectivity, gate error rates, and hardware topology, while iterative feedback, including Kullback-Leibler (KL) divergence, circuit depth, and validity, is used to refine the circuits. We evaluate our method on a financial modeling task involving daily changes in Japanese government bond (JGB) interest rates. Our results show that the LLM-generated ansätze are significantly shallower and achieve superior generative performance compared to the standard baseline when executed on real IBM quantum hardware using 12 qubits. These findings demonstrate the practical utility of LLM-driven quantum architecture search and highlight a promising path toward robust, deployable generative models for near-term quantum devices.
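The KL-divergence feedback signal mentioned above is straightforward to compute from samples (toy distributions below, not the JGB interest-rate data; in the described loop, this score would be fed back into the LLM prompt alongside circuit depth and validity):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between a target distribution p and the distribution q
    estimated from a Born machine's measurement samples."""
    return sum(pi * math.log(pi / max(qi, eps))
               for pi, qi in zip(p, q) if pi > 0)

target = [0.5, 0.3, 0.2]          # desired distribution over outcomes
born_estimate = [0.45, 0.35, 0.2] # empirical distribution from circuit shots
score = kl_divergence(target, born_estimate)
```

A lower score indicates the sampled distribution is closer to the target, which is the signal an iterative refinement loop would try to drive down while keeping circuit depth small.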


MoE-Compression: How the Compression Error of Experts Affects the Inference Accuracy of MoE Model?

Ma, Songkai, Zhang, Zhaorui, Di, Sheng, Liu, Benben, Yu, Xiaodong, Lu, Xiaoyi, Wang, Dan

arXiv.org Artificial Intelligence

With the widespread application of Mixture-of-Experts (MoE) reasoning models in the field of large language models (LLMs), efficiently serving MoE models under limited GPU memory constraints has emerged as a significant challenge. Offloading non-activated experts to main memory is an efficient way to address this problem, but it introduces the overhead of transferring experts between GPU memory and main memory. This motivates exploring efficient approaches to compressing experts and analyzing how the compression error affects inference performance. To bridge this gap, we propose employing error-bounded lossy compression algorithms (such as SZ3 and CuSZp) to compress non-activated experts, thereby reducing data transfer overhead during MoE inference. We conduct extensive experiments across various benchmarks and present a comprehensive analysis of how compression-induced errors in different experts affect overall inference accuracy. The results indicate that experts in the shallow layers, which are primarily responsible for the attention mechanism and the transformation of input tokens into vector representations, exhibit minimal degradation in inference accuracy when subjected to bounded errors. In contrast, errors in the middle-layer experts, which are central to model reasoning, significantly impair inference accuracy. Interestingly, introducing bounded errors in the deep-layer experts, which are mainly responsible for instruction following and output integration, can sometimes lead to improvements in inference accuracy.
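The "error-bounded" guarantee at the center of this analysis can be illustrated with a minimal uniform quantizer (a generic sketch of the property only; SZ3 and CuSZp use far more sophisticated prediction and entropy coding, and their actual APIs differ):

```python
import numpy as np

def compress_expert(weights, error_bound):
    """Error-bounded uniform quantization of an expert's weight tensor.

    With step = 2 * error_bound, rounding guarantees every reconstructed
    value is within `error_bound` of the original.
    """
    step = 2.0 * error_bound
    codes = np.round(weights / step).astype(np.int32)  # compact integer codes
    return codes, step

def decompress_expert(codes, step):
    return codes.astype(np.float64) * step

w = np.random.default_rng(0).standard_normal(1000)     # stand-in expert weights
codes, step = compress_expert(w, error_bound=1e-2)
w_hat = decompress_expert(codes, step)
```

Tightening the bound for middle-layer experts while loosening it for shallow and deep layers would follow directly from the layer-sensitivity findings reported above.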