SFBD Flow: A Continuous-Optimization Framework for Training Diffusion Models with Noisy Samples
Lu, Haoye, Lo, Darren, Yu, Yaoliang
Diffusion models achieve strong generative performance but often rely on large datasets that may include sensitive content. This challenge is compounded by the models' tendency to memorize training data, raising privacy concerns. SFBD (Lu et al., 2025) addresses this by training on corrupted data and using limited clean samples to capture local structure and improve convergence. However, its iterative denoising and fine-tuning loop requires manual coordination, making it burdensome to implement. We reinterpret SFBD as an alternating projection algorithm and introduce a continuous variant, SFBD flow, that removes the need for alternating steps. We further show its connection to consistency constraint-based methods, and demonstrate that its practical instantiation, Online SFBD, consistently outperforms strong baselines across benchmarks.
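The abstract contrasts SFBD's alternating denoise/fine-tune loop with a continuous variant that needs no alternating steps. The sketch below illustrates that idea in a deliberately tiny setting. Everything in it is an assumption, not the paper's Online SFBD:
- The "model" is a 1-D Gaussian N(mu, s^2) in place of a diffusion model.
- Denoising is exact Gaussian posterior sampling.
- `online_sfbd_toy` is a hypothetical name.

Each step denoises a fresh noisy minibatch with the current model and then moves the model's moments a small step toward that denoised minibatch. Denoising and fine-tuning are therefore interleaved at every step rather than run as separate rounds.

```python
import numpy as np

def online_sfbd_toy(clean, noisy, sigma, steps=2000, batch=512, lr=0.02, seed=0):
    """Hypothetical toy online loop; the model N(mu, s^2) is stored as moments (mu, m2)."""
    rng = np.random.default_rng(seed)
    mu, m2 = float(np.mean(clean)), float(np.mean(clean**2))  # pretrain on clean data
    for _ in range(steps):
        s2 = m2 - mu**2                               # current model variance
        y = rng.choice(noisy, size=batch)             # fresh noisy minibatch
        gain = s2 / (s2 + sigma**2)                   # posterior weight on the observation
        # Denoise with the current model: exact Gaussian posterior sample of x given y.
        x = mu + gain * (y - mu) + np.sqrt(gain) * sigma * rng.standard_normal(batch)
        # Nudge the moments toward the denoised minibatch instead of refitting fully.
        mu += lr * (x.mean() - mu)
        m2 += lr * ((x**2).mean() - m2)
    return float(mu), float(np.sqrt(m2 - mu**2))

# Synthetic data: 50 clean samples and 20,000 noisy ones with noise level sigma = 1.0.
data_rng = np.random.default_rng(1)
mu_true, s_true, sigma = 2.0, 1.5, 1.0
clean = mu_true + s_true * data_rng.standard_normal(50)
noisy = (mu_true + s_true * data_rng.standard_normal(20_000)
         + sigma * data_rng.standard_normal(20_000))
mu_est, s_est = online_sfbd_toy(clean, noisy, sigma)
print(f"estimate: mu={mu_est:.3f}, s={s_est:.3f}")
```

In this toy the estimate settles near the true (mu, s) = (2.0, 1.5). Fitting a Gaussian directly to the noisy samples would overestimate the spread: their standard deviation is about sqrt(1.5^2 + 1.0^2), roughly 1.80. Each update is a convex combination of valid (mean, second moment) pairs, so the model variance m2 - mu^2 stays non-negative.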
Stochastic Forward-Backward Deconvolution: Training Diffusion Models with Finite Noisy Datasets
Lu, Haoye, Wu, Qifan, Yu, Yaoliang
Recent diffusion-based generative models achieve remarkable results by training on massive datasets, yet this practice raises concerns about memorization and copyright infringement. A proposed remedy is to train exclusively on noisy versions of data with potential copyright issues, ensuring that the model never observes the original content. However, through the lens of deconvolution theory, we show that although it is theoretically feasible to learn the data distribution from noisy samples, the practical challenge of collecting sufficient samples makes successful learning nearly unattainable. To overcome this limitation, we propose to pretrain the model on a small fraction of clean data to guide the deconvolution process. Combined with our Stochastic Forward-Backward Deconvolution (SFBD) method, we attain an FID of 6.31 on CIFAR-10 with just 4% clean images (and 3.58 with 10%). Theoretically, we prove that SFBD guides the model to learn the true data distribution. The result also highlights the importance of pretraining on limited but clean data or, alternatively, on data from similar datasets. Empirical studies further support these findings and offer additional insights.
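The idea of pretraining on a few clean samples and then repeatedly denoising the noisy samples with the current model and refitting on the result can be shown with a toy analogue. This is a minimal sketch under loudly stated assumptions, not the authors' implementation:
- A 1-D Gaussian N(mu, s^2) stands in for the diffusion model.
- The corruption is additive Gaussian noise with a known level `sigma`.
- Denoising samples the model's exact posterior p(x | y).
- "Fine-tuning" means refitting the mean and standard deviation.

The function names `fit_gaussian`, `denoise` and `sfbd_toy` are hypothetical.

```python
import numpy as np

def fit_gaussian(x):
    """Toy 'fine-tuning': maximum-likelihood mean and standard deviation."""
    return float(np.mean(x)), float(np.std(x))

def denoise(y, mu, s, sigma, rng):
    """Sample x ~ p(x | y) for y = x + sigma * eps under the toy model x ~ N(mu, s^2)."""
    gain = s**2 / (s**2 + sigma**2)          # posterior weight on the observation
    post_mean = mu + gain * (y - mu)
    post_std = np.sqrt(gain) * sigma         # equals sqrt(s^2 sigma^2 / (s^2 + sigma^2))
    return post_mean + post_std * rng.standard_normal(y.shape)

def sfbd_toy(clean, noisy, sigma, rounds=50, seed=0):
    """Hypothetical toy loop: pretrain on clean data, then alternate denoise and refit."""
    rng = np.random.default_rng(seed)
    mu, s = fit_gaussian(clean)              # pretrain on the small clean set
    for _ in range(rounds):
        x_hat = denoise(noisy, mu, s, sigma, rng)   # denoise every noisy sample
        mu, s = fit_gaussian(x_hat)                 # refit on the denoised samples
    return mu, s

# Synthetic data: 50 clean samples and 20,000 noisy ones with noise level sigma = 1.0.
data_rng = np.random.default_rng(1)
mu_true, s_true, sigma = 2.0, 1.5, 1.0
clean = mu_true + s_true * data_rng.standard_normal(50)
noisy = (mu_true + s_true * data_rng.standard_normal(20_000)
         + sigma * data_rng.standard_normal(20_000))
mu_est, s_est = sfbd_toy(clean, noisy, sigma)
print(f"estimate: mu={mu_est:.3f}, s={s_est:.3f}")
```

In this Gaussian toy, the true parameters are a fixed point of the loop. If the model already equals the data distribution, posterior samples of the noisy data are distributed exactly like clean data, so refitting leaves the model unchanged. The clean samples only supply a reasonable starting point.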
Single-Frontier Bidirectional Search
Moldenhauer, Carsten (Universität zu Berlin) | Felner, Ariel (Ben-Gurion University) | Sturtevant, Nathan (University of Alberta) | Schaeffer, Jonathan (University of Alberta)
We introduce a new bidirectional search algorithm, Single-Frontier Bidirectional Search (SFBDS). Unlike traditional bidirectional search (BDS), which keeps two frontiers, SFBDS uses a single frontier. At each node, we can decide whether to search from start to goal or from goal to start, choosing the direction with the highest potential for minimizing the total work done. We provide a theoretical analysis, validated by experimental results, that explains when SFBDS will work.
Single-Frontier Bidirectional Search
Felner, Ariel (Ben-Gurion University) | Moldenhauer, Carsten (University of Berlin) | Sturtevant, Nathan (University of Alberta) | Schaeffer, Jonathan (University of Alberta)
On the surface, bidirectional search (BDS) is an attractive idea with the potential for significant asymptotic reductions in search effort. However, the results in practice often fall far short of expectations. We introduce a new bidirectional search algorithm, Single-Frontier Bidirectional Search (SFBDS). Unlike traditional BDS, which keeps two frontiers, SFBDS uses a single frontier. Each node in the tree can be seen as an independent task of finding the shortest path between the current start and the current goal. At a particular node, we can decide to search from start to goal or from goal to start, choosing the direction with the highest potential for minimizing the total work done. Theoretical results give insights into when this approach will work, and experimental data validate the algorithm across a broad range of domains.
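Treating each node as a (current start, current goal) pair and choosing one direction per node can be sketched in a few lines. This is a minimal illustration under assumptions that do not come from the papers:
- The graph is undirected with unit edge costs.
- The search is an iterative-deepening depth-first search over those pairs.
- A hypothetical jump policy expands whichever side currently has fewer neighbours. It is a placeholder for the direction-selection rules the authors analyse.

A shortest path between the current start and goal always leaves through a neighbour on either end. Expanding only one side per node therefore still finds a shortest path under an increasing depth bound.

```python
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]  # undirected, unit-cost adjacency lists

def sfbds(graph: Graph, start: str, goal: str, max_depth: int = 50) -> Optional[List[str]]:
    """Iterative-deepening DFS whose nodes are (current start, current goal) pairs."""

    def dfs(front: List[str], back: List[str], budget: int) -> Optional[List[str]]:
        s, g = front[-1], back[-1]
        if s == g:                        # the two ends met: stitch the path together
            return front + back[-2::-1]
        if budget == 0:
            return None
        # Hypothetical jump policy: expand the side with the smaller branching factor.
        if len(graph[s]) <= len(graph[g]):
            for nxt in graph[s]:          # search forward: the start side moves
                if nxt not in front:      # avoid cycles along this branch
                    found = dfs(front + [nxt], back, budget - 1)
                    if found is not None:
                        return found
        else:
            for nxt in graph[g]:          # search backward: the goal side moves
                if nxt not in back:
                    found = dfs(front, back + [nxt], budget - 1)
                    if found is not None:
                        return found
        return None

    for depth in range(max_depth + 1):    # increasing bound, so shortest paths come first
        path = dfs([start], [goal], depth)
        if path is not None:
            return path
    return None

graph: Graph = {
    "A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D", "E"],
    "D": ["B", "C", "F"], "E": ["C", "F"], "F": ["D", "E"], "G": [],
}
print(sfbds(graph, "A", "F"))
```

Only one side of the pair is expanded at each node, so the search keeps a single tree of pairs rather than two frontiers that must be checked against each other for a meeting point.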