Understanding Reference Policies in Direct Preference Optimization