Saif, A F M
A First-order Generative Bilevel Optimization Framework for Diffusion Models
Xiao, Quan, Yuan, Hui, Saif, A F M, Liu, Gaowen, Kompella, Ramana, Wang, Mengdi, Chen, Tianyi
Diffusion models, which iteratively denoise data samples to synthesize high-quality outputs, have achieved empirical success across domains. However, optimizing these models for downstream tasks often involves nested bilevel structures, such as tuning hyperparameters for fine-tuning tasks or noise schedules in training dynamics, where traditional bilevel methods fail due to the infinite-dimensional probability space and prohibitive sampling costs. We formalize this challenge as a generative bilevel optimization problem and address two key scenarios: (1) fine-tuning pre-trained models via an inference-only lower-level solver paired with a sample-efficient gradient estimator for the upper level, and (2) training diffusion models from scratch with noise schedule optimization by reparameterizing the lower-level problem and designing a computationally tractable gradient estimator. Our first-order bilevel framework overcomes the incompatibility of conventional bilevel methods with diffusion processes, offering theoretical grounding and computational practicality. Experiments demonstrate that our method outperforms existing fine-tuning and hyperparameter search baselines.
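As a rough illustration of the nested structure this abstract describes, the generative bilevel problem can be sketched as follows; the symbols (upper-level variables \lambda such as reward weights or noise-schedule hyperparameters, lower-level model parameters \theta, downstream objective F, and diffusion training loss L) are illustrative placeholders rather than the paper's notation:

\[
\min_{\lambda}\; F\bigl(\lambda,\ p_{\theta^{\star}(\lambda)}\bigr)
\quad \text{s.t.} \quad
\theta^{\star}(\lambda) \in \arg\min_{\theta}\; L(\theta; \lambda),
\]

where p_{\theta^{\star}(\lambda)} is the sample distribution produced by the diffusion model trained (or fine-tuned) at the lower level; the difficulty noted in the abstract is that this distribution lives in a probability space and is accessible only through costly sampling.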
Bilevel Joint Unsupervised and Supervised Training for Automatic Speech Recognition
Cui, Xiaodong, Saif, A F M, Lu, Songtao, Chen, Lisha, Chen, Tianyi, Kingsbury, Brian, Saon, George
In this paper, we propose a bilevel joint unsupervised and supervised training (BL-JUST) framework for automatic speech recognition. Compared to the conventional pre-training and fine-tuning strategy, which is a disconnected two-stage process, BL-JUST optimizes an acoustic model so that it simultaneously minimizes both the unsupervised and supervised loss functions. Because BL-JUST seeks matched local optima of both loss functions, the acoustic representations learned by the acoustic model strike a good balance between being generic and task-specific. We solve the BL-JUST problem using penalty-based bilevel gradient descent and evaluate the trained deep neural network acoustic models on various datasets with a variety of architectures and loss functions. We show that BL-JUST can outperform the widely used pre-training and fine-tuning strategy and some other popular semi-supervised techniques.
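A minimal sketch of the penalty-based idea mentioned above, assuming a PyTorch-style setup; the model, the two losses, the dummy data, and the penalty weight gamma are illustrative placeholders, not the BL-JUST implementation:

# Minimal sketch of a penalty-based reformulation: the supervised (upper-level)
# loss and the unsupervised (lower-level) loss are combined into a single
# penalized objective and minimized jointly by gradient descent.
# All names (model, losses, gamma, data) are illustrative placeholders.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(40, 128), nn.ReLU(), nn.Linear(128, 10))

# Dummy batches standing in for labeled and unlabeled speech features.
x_labeled, y_labeled = torch.randn(32, 40), torch.randint(0, 10, (32,))
x_unlabeled = torch.randn(64, 40)

supervised_loss = nn.CrossEntropyLoss()

def unsupervised_loss(feats):
    # Placeholder self-supervised regularizer; an ASR system would use an
    # actual unsupervised objective here instead.
    return model(feats).pow(2).mean()

gamma = 0.1  # penalty weight coupling the two levels
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

for step in range(100):
    opt.zero_grad()
    loss = supervised_loss(model(x_labeled), y_labeled) \
           + gamma * unsupervised_loss(x_unlabeled)
    loss.backward()
    opt.step()

In penalty-based bilevel methods, the penalty weight is typically increased over training so that solutions of the penalized problem approach those of the original bilevel problem.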
Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization
Saif, A F M, Cui, Xiaodong, Shen, Han, Lu, Songtao, Kingsbury, Brian, Chen, Tianyi
In general, bilevel optimization problems are optimization problems where the feasible set is determined (in part) using the solution set of a second parametric optimization problem [10]. The problem determining the feasible set is generally called the lower-level problem, and the original optimization problem is called the upper-level problem [31, 29]. BL-JUST employs a lower- and upper-level optimization with an unsupervised loss and a supervised loss, respectively, leveraging recent advances in penalty-based bilevel optimization to solve this challenging ASR problem with affordable complexity and rigorous convergence guarantees. To evaluate BL-JUST, extensive experiments are conducted.
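Using standard notation rather than the paper's (the symbols below are illustrative), the two levels in BL-JUST can be sketched as:

\[
\min_{\theta}\; \mathcal{L}_{\mathrm{sup}}(\theta)
\quad \text{s.t.} \quad
\theta \in \arg\min_{\theta'}\; \mathcal{L}_{\mathrm{unsup}}(\theta'),
\]

where the unsupervised loss \mathcal{L}_{\mathrm{unsup}} defines the lower-level problem whose solution set constrains the upper-level (supervised) objective \mathcal{L}_{\mathrm{sup}}.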