Uncertainty-Penalized Direct Preference Optimization