Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement
Neural Information Processing Systems
We explore the methodology and theory of reward-directed generation via conditional diffusion models. Directed generation aims to generate samples with desired properties, as measured by a reward function, and has broad applications in generative AI, reinforcement learning, and computational biology. We consider the common learning scenario in which the dataset consists mostly of unlabeled data, together with a small set of data carrying noisy reward labels. Our approach uses a reward function learned on the smaller labeled set as a pseudo-labeler for the unlabeled data. After pseudo-labeling, a conditional diffusion model (CDM) is trained on the data, and samples are generated by setting a target value a as the condition in the CDM.
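The pseudo-labeling step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the linear reward model fit by ridge regression, the synthetic Gaussian data, the function names `fit_ridge_reward` and `pseudo_label`, and the choice of target value as the 0.9 quantile of the pseudo-labels. The abstract does not specify any of these, and the conditional diffusion model itself is omitted.

```python
import numpy as np


def fit_ridge_reward(x_labeled, y_labeled, lam=1e-2):
    """Fit a linear reward model by ridge regression (illustrative choice only)."""
    d = x_labeled.shape[1]
    gram = x_labeled.T @ x_labeled + lam * np.eye(d)
    return np.linalg.solve(gram, x_labeled.T @ y_labeled)


def pseudo_label(theta, x_unlabeled):
    """Predict rewards for unlabeled samples with the learned reward model."""
    return x_unlabeled @ theta


# Synthetic stand-in data (hypothetical): a small noisily labeled set
# and a larger unlabeled set.
rng = np.random.default_rng(0)
d = 5
theta_true = rng.normal(size=d)
x_labeled = rng.normal(size=(200, d))
y_labeled = x_labeled @ theta_true + 0.1 * rng.normal(size=200)
x_unlabeled = rng.normal(size=(1000, d))

# Step 1: learn the reward function on the small labeled set.
theta_hat = fit_ridge_reward(x_labeled, y_labeled)

# Step 2: pseudo-label the unlabeled data.
y_pseudo = pseudo_label(theta_hat, x_unlabeled)

# Step 3 (not implemented here): the pairs (x_unlabeled, y_pseudo) would be
# the training set for a conditional diffusion model, and sampling would
# condition on a chosen target value a.
target_a = float(np.quantile(y_pseudo, 0.9))  # assumed way of picking a target
```

In this sketch the target value is taken from within the range of observed pseudo-labels; how far a target can be pushed beyond the training data is a question the sketch does not address.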
Jan-19-2025, 21:09:35 GMT