Online Preference Alignment for Language Models via Count-based Exploration