Human Alignment of Large Language Models through Online Preference Optimisation