Offline Regularised Reinforcement Learning for Large Language Models Alignment