Post-Training Large Language Models via Reinforcement Learning from Self-Feedback

Open in new window