Reward-Directed Conditional Diffusion: Provable Distribution Estimation and Reward Improvement

Jan-19-2025, 21:09:35 GMT–Neural Information Processing Systems

We explore the methodology and theory of reward-directed generation via conditional diffusion models. Directed generation aims to generate samples with desired properties as measured by a reward function, which has broad applications in generative AI, reinforcement learning, and computational biology. We consider the common learning scenario where the dataset consists mostly of unlabeled data together with a small set of data carrying noisy reward labels. Our approach uses a reward function learned on the smaller labeled set as a pseudo-labeler for the unlabeled data. After pseudo-labeling, a conditional diffusion model (CDM) is trained on the data, and samples are generated by setting a target value a as the condition in the CDM.
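The pseudo-labeling step described above can be illustrated with a minimal sketch. It assumes a linear reward model fitted by ridge-regularized least squares, which the abstract does not specify; the names `fit_linear_reward`, `pseudo_label`, and `true_w`, the data sizes, and the noise level are all hypothetical. Training the conditional diffusion model itself is omitted: the sketch only produces the (sample, pseudo-label) pairs that such a model would be trained on.

```python
import random


def fit_linear_reward(xs, ys, ridge=1e-3, lr=0.1, steps=500):
    """Fit weights w by gradient descent on mean squared error + ridge * ||w||^2.

    Assumption: the reward is modeled as a linear function of the features.
    """
    d = len(xs[0])
    n = len(xs)
    w = [0.0] * d
    for _ in range(steps):
        # Gradient of the ridge penalty.
        grad = [2.0 * ridge * wj for wj in w]
        # Gradient of the mean squared error over the labeled set.
        for x, y in zip(xs, ys):
            err = sum(wj * xj for wj, xj in zip(w, x)) - y
            for j in range(d):
                grad[j] += 2.0 * err * x[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def pseudo_label(w, xs):
    """Predict a reward for every unlabeled sample with the learned weights."""
    return [sum(wj * xj for wj, xj in zip(w, x)) for x in xs]


rng = random.Random(0)
true_w = [1.5, -0.5]  # hypothetical ground-truth reward direction


def sample_x():
    return [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]


# Small labeled set with noisy reward labels.
labeled_x = [sample_x() for _ in range(50)]
labeled_y = [
    sum(wj * xj for wj, xj in zip(true_w, x)) + rng.gauss(0.0, 0.1)
    for x in labeled_x
]

# Large unlabeled set, labeled by the learned reward function.
unlabeled_x = [sample_x() for _ in range(500)]
w_hat = fit_linear_reward(labeled_x, labeled_y)
pseudo_y = pseudo_label(w_hat, unlabeled_x)

# (sample, pseudo-label) pairs: the training data a conditional model would use.
training_pairs = list(zip(unlabeled_x, pseudo_y))
```

In the full pipeline, a conditional generative model would be fitted to `training_pairs`, and generation would then condition on a chosen target reward value a.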

distribution estimation and reward improvement, provable distribution estimation, reward-directed conditional diffusion




Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)