Adversarial Preference Optimization: Enhancing Your Alignment via RM-LLM Game