Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO