TFG-Flow: Training-free Guidance in Multimodal Generative Flow

Haowei Lin, Shanda Li, Haotian Ye, Yiming Yang, Stefano Ermon, Yitao Liang, Jianzhu Ma

arXiv.org Artificial Intelligence 

Given an unconditional generative model and a predictor for a target property (e.g., a classifier), the goal of training-free guidance is to generate samples with desirable target properties without additional training. As a highly efficient technique for steering generative models toward flexible outcomes, training-free guidance has gained increasing attention in diffusion models. However, existing methods handle data only in continuous spaces, while many scientific applications involve both continuous and discrete data (referred to as multimodality). Another emerging trend is the growing use of the simple and general flow matching framework in building generative foundation models, where guided generation remains under-explored. To address this, we introduce TFG-Flow, a novel training-free guidance method for multimodal generative flow. TFG-Flow addresses the curse of dimensionality while maintaining unbiased sampling when guiding discrete variables. We validate TFG-Flow on four molecular design tasks and show that TFG-Flow has great potential in drug design by generating molecules with desired properties.

Recent advancements in generative foundation models have demonstrated their increasing power across a wide range of domains (Reid et al., 2024; Achiam et al., 2023; Abramson et al., 2024). In particular, diffusion-based foundation models, such as Stable Diffusion (Esser et al., 2024) and SORA (Brooks et al., 2024), have achieved significant success, catalyzing a new wave of applications in areas such as art and science. As these models become more prevalent, a critical question arises: how can we steer these foundation models to achieve specific properties during inference time?
One promising direction is classifier-based guidance (Dhariwal & Nichol, 2021) or classifier-free guidance (Ho & Salimans, 2022), which typically necessitate training a specialized model for each conditioning signal (e.g., a noise-conditional classifier or a text-conditional denoiser). This resource-intensive and time-consuming process greatly limits their applicability. Recently, there has been growing interest in training-free guidance for diffusion models, which allows users to steer the generation process using an off-the-shelf differentiable target predictor without requiring additional model training (Ye et al., 2024). A target predictor can be any classifier, loss, or energy function used to score the quality of the generated samples. Training-free guidance offers a flexible and efficient means of customizing generation, holding the potential to transform the field of generative AI.
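To make the idea of steering with an off-the-shelf predictor concrete, the following toy sketch nudges an unconditional sampler's updates with the gradient of a target predictor, with no extra training. Everything here is illustrative: `predictor_log_prob` is a stand-in quadratic predictor (not a real classifier), `uncond_drift` is a placeholder for a pretrained model's update, and the gradient is taken by finite differences to keep the sketch dependency-free.

```python
import numpy as np

rng = np.random.default_rng(0)

def predictor_log_prob(x, target=2.0):
    # Stand-in target predictor: score peaks when the sample mean hits `target`.
    return -np.sum((x.mean() - target) ** 2)

def predictor_grad(x, target=2.0, eps=1e-4):
    # Finite-difference gradient; in practice this would be autodiff
    # through a differentiable classifier, loss, or energy function.
    g = np.zeros_like(x)
    for i in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        g[i] = (predictor_log_prob(xp, target) - predictor_log_prob(xm, target)) / (2 * eps)
    return g

def guided_sampling(steps=200, dim=4, step_size=0.1, guidance_scale=1.0):
    x = rng.normal(size=dim)           # start from noise
    for _ in range(steps):
        uncond_drift = -0.01 * x       # placeholder for the pretrained model's update
        guide = guidance_scale * predictor_grad(x)  # steer toward the target property
        x = x + step_size * (uncond_drift + guide)
    return x

sample = guided_sampling()
print(sample.mean())  # close to the target value 2.0
```

The key property, mirrored in the sketch, is that the only ingredient beyond the unconditional model is the predictor's gradient at inference time; swapping in a different predictor changes the target without retraining anything.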