Aligning Language Models Using Follow-up Likelihood as Reward Signal
