Offline Regularised Reinforcement Learning for Large Language Models Alignment
