On The Hidden Biases of Flow Matching Samplers

Lim, Soon Hoe

arXiv.org Machine Learning 

The main goal of generative modeling is to use finitely many samples from a distribution to construct a sampling scheme capable of generating new samples from the same distribution. Among existing families of generative models, flow matching (FM) [23, 24] is notable for its flexibility and simplicity. Given a target probability distribution, FM uses a parametric model (e.g., a neural network) to learn the velocity vector field that defines a deterministic, continuous transformation (a normalizing flow) transporting a source probability distribution (e.g., a standard Gaussian) to the target distribution.

While the population formulation of FM often exhibits appealing structure (sometimes even admitting gradient-field velocities), practical models are trained on finite datasets and therefore optimize empirical objectives. This empirical setting substantially alters the geometry of the learned velocity field and the energetic properties of the resulting sampler. These notes aim to clarify how empirical FM behaves, how it differs from its population counterpart, and what implicit biases arise in the learned sampling dynamics.

From now on, we assume that all probability distributions/measures of the random variables considered (except the empirical distribution) are absolutely continuous (i.e., they have densities with respect to the Lebesgue measure), in which case we shall abuse notation and use the same symbol to denote both a distribution and its density. To maintain the flow of the main text, we defer the discussion of related work and all proofs of the theoretical results to the appendix.
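To make the contrast between the population and the empirical setting concrete, the following is a minimal NumPy sketch (not the paper's own code, and under assumed linear interpolation paths x_t = (1 - t) z + t x_1 with a Gaussian source, as in rectified flow). For a finite dataset, the exact minimizer of the empirical FM objective with these paths has a closed form: a softmax-weighted average of the conditional velocities (x_1^i - x) / (1 - t). Integrating this field with an explicit Euler scheme transports Gaussian source samples onto the empirical distribution, i.e., onto the training points themselves, which illustrates one implicit bias of empirical FM samplers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "training set": three points in 2D standing in for a finite dataset.
data = np.array([[2.0, 0.0], [-1.0, 1.5], [0.0, -2.0]])

def empirical_velocity(x, t, data):
    """Exact minimizer of the empirical FM objective for linear paths
    x_t = (1 - t) z + t x1 with z ~ N(0, I): a softmax-weighted average
    of the conditional velocities (x1_i - x) / (1 - t), with weights
    proportional to N(x; t x1_i, (1 - t)^2 I)."""
    d2 = np.sum((x - t * data) ** 2, axis=1)        # ||x - t x1_i||^2
    logw = -d2 / (2.0 * (1.0 - t) ** 2)
    w = np.exp(logw - logw.max())                   # stable softmax
    w /= w.sum()
    return (w[:, None] * (data - x)).sum(axis=0) / (1.0 - t)

def sample(n_steps=200):
    """Draw one sample by Euler-integrating the empirical velocity field."""
    x = rng.standard_normal(2)                      # source sample z ~ N(0, I)
    dt = 1.0 / n_steps
    for k in range(n_steps):                        # t runs over [0, 1 - dt]
        t = k * dt
        x = x + dt * empirical_velocity(x, t, data)
    return x

x1 = sample()
# The exact empirical flow maps the Gaussian source onto the empirical
# (discrete) distribution, so the sample lands numerically on a data point.
dists = np.linalg.norm(data - x1, axis=1)
print(dists.min())
```

A learned, smoothly parametrized velocity field only approximates this exact empirical minimizer, so it does not memorize perfectly; the sketch isolates the bias built into the empirical objective itself, separate from approximation and optimization effects.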