Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
