Active Learning for Direct Preference Optimization