Block Flow: Learning Straight Flow on Data Blocks

Wang, Zibin, Ouyang, Zhiyuan, Zhang, Xiangyun

arXiv.org Artificial Intelligence 

Diffusion generative models have emerged as a compelling family of paradigms capable of modeling data distributions through stochastic differential equations (SDEs) [1, 2, 3], and they have achieved remarkable success in many fields, including image generation [4, 5], video synthesis [6, 7], audio synthesis [8], and protein design [9]. The generative process is defined as the temporal inversion of a forward diffusion process in which data is progressively transformed into noise; this formulation enables training with a stationary loss function [10]. Moreover, diffusion models are not restricted by an invertibility constraint and can generate high-fidelity samples with great diversity, allowing them to be successfully applied to datasets of unprecedented scale [11, 12]. Continuous Normalizing Flows (CNFs), introduced in [13], can model arbitrary trajectories, including those represented by diffusion processes [14]. This approach is particularly appealing because it addresses the suboptimal alignment between noise and data in diffusion models by constructing a straight trajectory that directly connects them. Building on neural ordinary differential equations (ODEs), [15] propose flow matching to train CNFs, achieving empirically observed improvements in both training efficiency and inference speed over traditional diffusion models. A key drawback of diffusion and flow-matching models remains their high computational cost at inference: generating a single sample (e.g., an image) requires solving an ODE or SDE with a numerical solver that repeatedly evaluates the computationally expensive neural drift function.
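As a rough sketch (not the paper's implementation), the straight-trajectory flow-matching objective and the per-step cost of ODE sampling can be illustrated in a few lines of NumPy. All function names here are illustrative; in practice the velocity field would be a neural network, and each Euler step would incur one expensive network evaluation.

```python
import numpy as np

rng = np.random.default_rng(0)

def straight_path(x0, x1, t):
    # Linear interpolation between noise x0 and data x1: x_t = (1 - t) x0 + t x1.
    return (1.0 - t) * x0 + t * x1

def flow_matching_target(x0, x1):
    # For straight paths the target velocity is constant: dx_t/dt = x1 - x0.
    return x1 - x0

def fm_loss(v_pred, x0, x1):
    # Regress the model's predicted velocity onto the straight-path target.
    return np.mean((v_pred - flow_matching_target(x0, x1)) ** 2)

def euler_sample(v_fn, x0, n_steps=100):
    # Each step costs one evaluation of the drift v_fn -- the source of the
    # high inference cost when v_fn is a large neural network.
    x, dt = x0.copy(), 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        x = x + dt * v_fn(x, t)
    return x

# Toy check: a model that predicts the exact target velocity has zero loss,
# and because the trajectory is straight, even a single Euler step lands
# exactly on the data point.
x0 = rng.standard_normal(4)   # stands in for "noise"
x1 = rng.standard_normal(4)   # stands in for "data"
oracle_v = lambda x, t: x1 - x0
print(fm_loss(oracle_v(x0, 0.0), x0, x1))              # 0.0
print(np.allclose(euler_sample(oracle_v, x0, 1), x1))  # True
```

This toy example also shows why straight trajectories are attractive for inference speed: when the flow is exactly straight, the number of solver steps (and hence drift evaluations) can be reduced dramatically without integration error.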